Tuesday, September 27, 2011

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

Dryad was developed at Microsoft to make distributed computation easier to develop.  Large-scale services need large-scale computation, and Dryad provides both a simple programming model and a runtime system that delivers reliability, efficiency, and scalability.  Dryad forces the application developer to think about data parallelism while giving them flexibility to design the computation: developers can define an arbitrary DAG and choose the communication mechanism for each edge.  Because Dryad allows arbitrary DAGs, communication mechanisms, and input sizes and types, simpler models can be implemented on top of it.

Dryad tasks are coordinated by a job manager, which talks to the cluster nodes.  The job manager holds the DAG of the computation and dispatches vertices to the appropriate nodes; the vertices then communicate with each other over the data plane using the developer-specified communication mechanism, which can be files, shared memory, or TCP.  A Dryad DAG is defined through a C++ library: a new vertex is created by a static "factory," and edges are created through simple composition rules and graph merges.  Graphs distinguish input and output nodes, which ensures the composed graph stays acyclic.  Encapsulation groups several vertices into one process so they run on the same node.

When all of a vertex's inputs are complete, the vertex is added to the greedy scheduler's queue.  Similar to Google's MapReduce, tasks are re-executed on failure, and some are duplicated to take care of stragglers.  The runtime also performs local aggregation and partial aggregation optimizations to reduce network traffic and improve performance.

Experiments show that for a simple SQL query, Dryad outperformed SQLServer even on one machine.  The Dryad versions had nearly linear speed-up as the cluster size increased, and the in-memory version was about twice as fast as the NTFS version.  For data-mining workloads, the communication topology had to be optimized to improve scalability.
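The graph-building ideas above (factory-created vertices, replication of a stage, composition rules for edges, and a greedy scheduler that runs a vertex once all its inputs finish) can be illustrated with a small toy sketch.  This is not Dryad's real C++ API; the `Graph` class, method names, and scheduler below are hypothetical stand-ins that loosely mimic the composition operators described in the paper:

```python
from collections import defaultdict, deque

class Graph:
    """Toy stand-in for Dryad's graph builder (not the real C++ API)."""
    def __init__(self):
        self.edges = defaultdict(list)     # vertex -> downstream vertices
        self.indegree = defaultdict(int)   # vertex -> number of incoming edges
        self.vertices = set()

    def replicate(self, stage, n):
        """Create n copies of a stage (like the paper's '^' operator)."""
        names = [f"{stage}[{i}]" for i in range(n)]
        self.vertices.update(names)
        return names

    def connect_pointwise(self, sources, sinks):
        """Connect source i to sink i, round-robin (like '>=')."""
        for i, src in enumerate(sources):
            self._edge(src, sinks[i % len(sinks)])

    def connect_bipartite(self, sources, sinks):
        """Connect every source to every sink (like '>>')."""
        for src in sources:
            for dst in sinks:
                self._edge(src, dst)

    def _edge(self, src, dst):
        self.edges[src].append(dst)
        self.indegree[dst] += 1

    def greedy_schedule(self):
        """Run each vertex once all its inputs are complete, in the
        spirit of Dryad's greedy scheduler; returns execution order."""
        remaining = dict(self.indegree)
        ready = deque(v for v in self.vertices if remaining.get(v, 0) == 0)
        order = []
        while ready:
            v = ready.popleft()
            order.append(v)          # "execute" the vertex
            for w in self.edges[v]:  # finishing v may make successors ready
                remaining[w] -= 1
                if remaining[w] == 0:
                    ready.append(w)
        return order

# Build a small DAG: 3 readers feed 3 mappers pointwise,
# and all mappers feed a single aggregator.
g = Graph()
readers = g.replicate("R", 3)
mappers = g.replicate("M", 3)
agg = g.replicate("A", 1)
g.connect_pointwise(readers, mappers)
g.connect_bipartite(mappers, agg)
order = g.greedy_schedule()
# The readers (no inputs) run first, and "A[0]" runs last,
# only after all three mappers complete.
```

The key point the sketch captures is that the developer writes only the graph structure; the scheduler's readiness rule (in-degree reaching zero) is what turns the static DAG into a correct execution order.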

Dryad provides more flexibility than Google's MapReduce, but statically defining a communication topology is difficult and burdensome.  It is nice to be able to define a DAG of operations instead of just a map and a reduce phase, but a more declarative interface will be necessary as computations get more complicated.  With a more declarative approach, there would be room for more interesting optimizations, similar to query optimization in databases.  Future distributed programming models should keep an easy way to define UDFs while offering a simpler way to define the dataflow DAG, with the runtime system taking care of resource allocation in the cluster and optimizing the DAG with respect to the cluster state and configuration.
