[IPython-User] Iterating async results?

MinRK benjaminrk@gmail....
Sun Jun 3 18:02:15 CDT 2012


On Sun, Jun 3, 2012 at 3:50 PM, Darren Govoni <darren@ontrenet.com> wrote:

> Thanks Jon. That's really helpful.
>
> The page I was looking at is:
>
> http://ipython.org/ipython-doc/rel-0.12.1/parallel/parallel_task.html#creating-a-loadbalancedview-instance
>
> and the section was "Map results are iterable!"
> That code example confused me a bit, but I understand what you have
> provided.
>

ha, sorry about that.  Apparently comments were added to the file, but the
literalinclude lines were not updated to match (probably by me).

The whole file where that example is defined:
https://github.com/ipython/ipython/blob/master/docs/examples/parallel/itermapresult.py

What 'Map results are iterable!' means is that iterating through a
MapResult, even when the results are not yet ready, is the same as
iterating through the list of results themselves, and you do not have to
wait for all of the results to be ready before you start.

That is,

for result in mapresult:
    do_something(result)

is exactly the same as:

for result in mapresult.get():
    do_something(result)

But instead of waiting for all of the results and then iterating through
them, each result is waited for only when the iteration reaches it, so you
can start processing early results while later tasks are still running.
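As a rough standard-library analogy (just an illustration of the same
lazy-iteration idea, not IPython.parallel itself), iterating a list of
futures in order blocks only for the next unfinished result:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def work(x):
    time.sleep(x / 100.0)  # stand-in for a slow remote task
    return x * 10

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(work, x) for x in (3, 1, 2)]

    # like `for result in mapresult:` -- each .result() blocks only
    # until that particular task is done, so early results can be
    # processed while later tasks are still running
    streamed = [f.result() for f in futures]

    # like `for result in mapresult.get():` -- everything is awaited
    # first, then iterated; the values are identical
    gathered = [f.result() for f in futures]

print(streamed)  # [30, 10, 20]
print(gathered)  # [30, 10, 20]
```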



>
> Darren
>
> On Sun, 2012-06-03 at 07:32 +0000, Jon Olav Vik wrote:
> > Darren Govoni <darren <at> ontrenet.com> writes:
> >
> > > Hi, I was looking at the docs for iterating over an async results map.
> > > I couldn't understand the code provided in the example, because it
> > > didn't seem to iterate over async results (maybe a copy-and-paste
> > > error?).
> >
> > It would help if you could refer to the code example (url?) and explain
> > in more detail what you expected and what happens instead.
> >
> > > So my question is how to use a load balanced view to apply an async
> > > map and then asynchronously iterate over results (as they arrive, or
> > > discover results that are complete) and get those results.
> >
> > Hope this helps:
> >
> > First, make sure an ipcluster is running (e.g. run "ipcluster start"
> > from a separate command window).
> >
> > Then, paste the following into IPython. Example output is given below,
> > with comments.
> >
> > import time
> > import numpy as np
> > from IPython.parallel import Client
> >
> > c = Client()
> > dv = c[:]  # direct view
> > lv = c.load_balanced_view()  # load balanced view
> >
> >
> > @lv.parallel(ordered=False)
> > def func(i, x):
> >     import os
> >     import time
> >     time.sleep(x)
> >     return i, os.getpid()
> >
> > @lv.parallel()
> > def func_ordered(i, x):
> >     import os
> >     import time
> >     time.sleep(x)
> >     return i, os.getpid()
> >
> > ii = np.arange(5)
> > xx = np.r_[0.3, 0.1, 0.5, 0.4, 0.2]
> >
> > t0 = time.time()
> > print "time i  pid"
> > for i, pid in func.map(ii, xx):
> >     print "%.2f" % (time.time() - t0), i, pid
> >
> > print
> >
> > t0 = time.time()
> > print "time i  pid"
> > for i, pid in func_ordered.map(ii, xx):
> >     print "%.2f" % (time.time() - t0), i, pid
> >
> > ## -- End pasted text --
> >
> > I think the func example above does what you want. Note how process 5400
> > completes task 1 (wait for 0.1 s), then proceeds with task 4 (wait for
> > 0.2 s), and finishes at about the same time as process 5676 is done with
> > task 0 (wait for 0.3 s).
> >
> > time i  pid
> > 0.11 1 5400
> > 0.31 0 5676
> > 0.32 4 5400
> > 0.42 3 7868
> > 0.52 2 5212
> >
> > However, by default load balanced views have ordered=True, meaning they
> > won't return a result until all the previous ones are available. Here,
> > tasks 0 and 1 both arrive after about 0.3 s. The total time is the same,
> > though, showing that execution is load balanced. However, if you want to
> > watch progress, or have post-processing that can be done in parallel,
> > ordered=False is useful.
> >
> > time i  pid
> > 0.31 0 5676
> > 0.31 1 7868
> > 0.51 2 5400
> > 0.52 3 5212
> > 0.52 4 7868
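The ordered/unordered distinction above can be sketched with the standard
library alone (an analogy, not IPython.parallel): as_completed yields
results in completion order, like ordered=False, while iterating the
futures in submission order mimics the default ordered=True.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(i, delay):
    time.sleep(delay)  # stand-in for real computation
    return i

delays = [0.3, 0.1, 0.5, 0.4, 0.2]

with ThreadPoolExecutor(max_workers=5) as pool:
    futures = [pool.submit(work, i, d) for i, d in enumerate(delays)]

    # like ordered=False: results arrive as each task finishes
    unordered = [f.result() for f in as_completed(futures)]

    # like ordered=True (the default): submission order is preserved,
    # even though the tasks finished in a different order
    ordered = [f.result() for f in futures]

print(unordered)  # completion order, shortest delays first
print(ordered)    # always [0, 1, 2, 3, 4]
```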
> >
> > Note also how I used func.map(ii, xx) above, not func(ii, xx) directly.
> > The former will pass single items of ii and xx to the original function,
> > whereas the latter will pass sub-sequences of ii and xx.
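In plain Python terms (a local sketch of the calling convention, with no
cluster involved; `body` is a hypothetical stand-in for the decorated
function's body), the difference looks like this:

```python
import numpy as np

def body(i, x):
    # with func.map(ii, xx), i and x arrive as single items;
    # called directly as func(ii, xx), they would be sub-sequences
    return i + x

ii = np.arange(5)
xx = np.r_[0.3, 0.1, 0.5, 0.4, 0.2]

# item-wise, like func.map(ii, xx): one (i, x) pair per task
itemwise = [body(i, x) for i, x in zip(ii, xx)]

# sequence-wise, like func(ii, xx) on a single engine: the body
# receives whole (sub-)arrays at once
chunkwise = body(ii, xx)
```

Both produce the same numbers here only because addition happens to
vectorize; a body written for scalars (like the time.sleep example above)
would break if handed whole sub-sequences.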
> >
> > > I also want metadata (e.g. timing) for those results.
> >
> > Me too! Above I passed a task identifier (i) to keep track of what's
> > happening, but what I'd really want is some way to return
> > (metadata, result) from the iterator. Something like enumerate(iterator)
> > might be a good syntax:
> >
> > for metadata, result in func.annotate(func.map(...)):
> >     ...
> >
> > Lots of metadata are collected:
> > http://ipython.org/ipython-doc/dev/parallel/parallel_db.html
> > though I think this is per-chunk, so if several workpieces are passed
> > to an engine at once (to reduce communication overhead), you won't have
> > separate statistics for each workpiece. I guess that's the way it must be.
> >
> > Such an iterator would also be a nice occasion to purge metadata and
> > results from the hub and client. Currently, it seems that needs to be
> > done manually, as discussed here:
> >
> > http://article.gmane.org/gmane.comp.python.ipython.user/8326
> >
> >
> > Hope this helps,
> > Jon Olav
> >
> > _______________________________________________
> > IPython-User mailing list
> > IPython-User@scipy.org
> > http://mail.scipy.org/mailman/listinfo/ipython-user
>
>
>