[IPython-User] ipython parallel task dependencies
Wed Sep 25 12:40:38 CDT 2013
On Tue, Sep 24, 2013 at 10:45 PM, Peter Prettenhofer wrote:
> I'd like to use ipython parallel for task parallelism with task
> dependencies. The tasks look as follows:
> 1: n ``fetch`` tasks are submitted.
> 2: after a fetch task is complete I submit k ``extract-transform`` tasks
> per fetch result.
> 3: Barrier that waits until all ``extract-transform`` tasks are complete
> 4: m ``load`` tasks are submitted that reduce the results of the
> ``extract-transform`` tasks.
> I'm not quite sure how this can be represented in IPython parallel. In
> particular, the dependency between 1 and 2 without a barrier between the
> two. One possibility would be to use async results and add a callback to
> the ``fetch`` task that submits the ``extract-transform`` tasks but that's
> not possible since AsyncResult does not support callbacks yet.
I think these probably can be expressed with the dependencies, but I need
more information about the relationship between tasks. Specifically, what
information needs to be conveyed from one task to the other, and what
restriction is applied to the dependent task - must it run in the same
location, or should it run anywhere, as long as it is after the earlier task?
1. what information do extract-transform tasks need from the fetch result?
2. how do the extract-transform tasks relate to their parent fetch tasks?
Do they want to run on the same engine, or can they run anywhere, as long
as fetch happened first?
3. Barrier is easy - asyncresult.wait and/or client.wait([list, of,
4. what is m? Is it n * k (one load task per extract task)? Do the load
tasks want to happen in the same place as the extract, or can they run
anywhere, again as long as it is later? What information does load need
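
For reference, here is a minimal sketch of how the four steps might map onto
task dependencies. It makes several assumptions that are exactly the open
questions above: k is known when the tasks are submitted, extract-transform
tasks may run on any engine, and each task can reach the data it needs on its
own (for example through shared storage), since a dependency only orders tasks
and does not pass results along. The names run_pipeline, fetch,
extract_transform, load and the placeholder task bodies are hypothetical. The
client is expected to be an ipyparallel Client (the package was
IPython.parallel at the time of this thread).

```python
def fetch(i):
    """Placeholder fetch task (hypothetical body)."""
    return i


def extract_transform(i, j):
    """Placeholder for piece j of the work on fetch result i (hypothetical)."""
    return (i, j)


def load(p):
    """Placeholder for load/reduce task p (hypothetical body)."""
    return p


def run_pipeline(client, n, k, m):
    """Submit the fetch / extract-transform / load steps with dependencies.

    Assumes `client` behaves like an ipyparallel Client connected to a
    running cluster.
    """
    view = client.load_balanced_view()

    # Step 1: submit the n fetch tasks.
    fetch_ars = [view.apply_async(fetch, i) for i in range(n)]

    # Step 2: submit k extract-transform tasks per fetch task right away.
    # The `after` flag keeps each of them from running until its own fetch
    # task has finished, so no barrier is needed between steps 1 and 2.
    # Using `follow` instead of `after` would request the same engine.
    et_ars = []
    for i, far in enumerate(fetch_ars):
        with view.temp_flags(after=[far]):
            for j in range(k):
                et_ars.append(view.apply_async(extract_transform, i, j))

    # Step 3: barrier, block until every extract-transform task is done.
    client.wait(et_ars)

    # Step 4: submit the m load tasks and collect their results.
    load_ars = [view.apply_async(load, p) for p in range(m)]
    return [ar.get() for ar in load_ars]
```

With a cluster running, this would be called as something like
run_pipeline(ipyparallel.Client(), n, k, m). If k is only known once a fetch
result is in hand, step 2 cannot be submitted up front this way and the
submitting side would have to wait on each fetch result first.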
> Peter Prettenhofer