[IPython-dev] DAG Dependencies
Thu Oct 28 12:40:19 CDT 2010
> * optionally offload the dag directly to the underlying scheduler if it has
> dependency support (i.e., SGE, Torque/PBS, LSF)
While we could support this, I actually think it would be a step
backwards. The benefit of using IPython is the extremely high
performance. I don't know the exact performance numbers for the DAG
scheduling, but IPython has a task submission latency of about 1 ms.
This means that you can parallelize DAGs where each task is a small
fraction of a second. The submission overhead for the batch systems,
even with an empty queue is going to be orders of magnitude longer.
The other thing that impacts latency is that with a batch system you
* Serialize data to disk
* Move the data to the compute nodes or have it shared on a network file system.
* Start Python on each compute node *for each task*.
* Import all your Python modules *for each task*
* Setup global variables and data structure *for each task*
* Load data from the file system and deserialize it.
All of this means lots and lots of latency for each task in the DAG.
For tasks that have lots of data or lots of Python modules to import,
that will simply kill the parallel speedup you will get (ala Amdahl's
> * something we currently do in nipype is that we provide a configurable
> option to continue processing if a given node fails. we simply remove the
> dependencies of the node from further execution and generate a report at the
> end saying which nodes crashed.
I guess I don't see how it was a true dependency then. Is this like
an optional dependency? What are the usage cases for this.
> * callback support for node: node_started_cb, node_finished_cb
I am not sure we could support this, because once you create the DAG
and send it to the scheduler, the tasks are out of your local Python
session. IOW, there is really no place to call such callbacks.
> * support for nodes themselves being DAGs
Yes, that shouldn't be too difficult.
> * the concept of stash and pop for DAG nodes. i.e. a node which is a dag can
> stash itself while it's internal nodes execute and should not take up any
I think for the node is a DAG case, we would just flatten that at
submission time. IOW, apply the transformation:
A DAG of nodes, each of which may be a DAG => A DAG of node.
Would this work?
> also i was recently with some folks who have been using DRMAA
> (http://en.wikipedia.org/wiki/DRMAA) as the underlying common layer for
> communicating with PBS, SGE, LSF, Condor. it might be worthwhile taking a
> look (if you already haven't) to see what sort of mechanisms might help you.
> a python binding is available at:
Yes, it does make sense to support DRMAA in ipcluster. Once Min's
stuff has been merged into master, we will begin to get it working
with the batch systems again.
> On Thu, Oct 28, 2010 at 3:57 AM, MinRK <email@example.com> wrote:
>> In order to test/demonstrate arbitrary DAG dependency support in the new
>> ZMQ Python scheduler, I wrote an example using NetworkX, as Fernando
>> It generates a random DAG with a given number of nodes and edges, runs a
>> set of empty jobs (one for each node) using the DAG as a dependency graph,
>> where each edge represents a job depending on another.
>> It then validates the results, ensuring that no job ran before its
>> dependencies, and draws the graph, with nodes arranged in X according to
>> time, which means that all arrows must point to the right if the
>> time-dependencies were met.
>> It happily handles pretty elaborate (hundreds of edges) graphs.
>> Too bad I didn't have this done for today's Py4Science talk.
>> Script can be found here:
>> IPython-dev mailing list
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
More information about the IPython-dev