[IPython-User] Iterating async results?

Darren Govoni darren@ontrenet....
Sun Jun 3 17:50:22 CDT 2012


Thanks Jon. That's really helpful.

The page I was looking at is:
http://ipython.org/ipython-doc/rel-0.12.1/parallel/parallel_task.html#creating-a-loadbalancedview-instance

and the section was "Map results are iterable!"
That code example confused me a bit, but I understand what you have
provided.

Darren

On Sun, 2012-06-03 at 07:32 +0000, Jon Olav Vik wrote:
> Darren Govoni <darren <at> ontrenet.com> writes:
> 
> > Hi, I was looking at the docs for iterating over async results map. I
> > couldn't understand the code provided in the example, because it didn't
> > seem to iterate over async results (maybe a c&p error?).
> 
> It would help if you could refer to the code example (url?) and explain in more 
> detail what you expected and what happens instead.
> 
> > So my question is: how do I use a load-balanced view to apply an async
> > map, then asynchronously iterate over the results (as they arrive, or by
> > discovering which results are complete) and get those results?
> 
> Hope this helps:
> 
> First, make sure an ipcluster is running (e.g. run "ipcluster start" from a 
> separate command window).
> 
> Then, paste the following into IPython. Example output is given below, with 
> comments.
> 
> import time
> import numpy as np
> from IPython.parallel import Client
> 
> c = Client()
> dv = c[:]  # direct view
> lv = c.load_balanced_view()  # load balanced view
> 
> 
> @lv.parallel(ordered=False)
> def func(i, x):
>     import os
>     import time
>     time.sleep(x)
>     return i, os.getpid()
> 
> @lv.parallel()
> def func_ordered(i, x):
>     import os
>     import time
>     time.sleep(x)
>     return i, os.getpid()
> 
> ii = np.arange(5)
> xx = np.r_[0.3, 0.1, 0.5, 0.4, 0.2]
> 
> t0 = time.time()
> print "time i  pid"
> for i, pid in func.map(ii, xx):
>     print "%.2f" % (time.time() - t0), i, pid
> 
> print
> 
> t0 = time.time()
> print "time i  pid"
> for i, pid in func_ordered.map(ii, xx):
>     print "%.2f" % (time.time() - t0), i, pid
> 
> ## -- End pasted text --
> 
> I think the func example above does what you want. Note how process 5400 
> completes task 1 (wait for 0.1 s), then proceeds with task 4 (wait for 0.2 s), 
> and finishes at about the same time as process 5676 is done with task 0 (wait 
> for 0.3 s).
> 
> time i  pid
> 0.11 1 5400
> 0.31 0 5676
> 0.32 4 5400
> 0.42 3 7868
> 0.52 2 5212
> 
> However, by default load-balanced views have ordered=True, meaning they won't 
> return a result until all the previous ones are available. Here, tasks 0 and 1 
> both arrive after about 0.3 s. The total time is the same, though, showing that 
> execution is load-balanced. However, if you want to watch progress, or have 
> post-processing that can be done in parallel, ordered=False is useful.
> 
> time i  pid
> 0.31 0 5676
> 0.31 1 7868
> 0.51 2 5400
> 0.52 3 5212
> 0.52 4 7868
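> The same ordered-vs-unordered distinction can be seen without a cluster
> using the standard library's concurrent.futures (just an analogy here, not
> IPython.parallel itself): executor.map returns results in input order,
> whereas as_completed yields each future the moment it finishes.

```python
import concurrent.futures
import time

def work(i, x):
    time.sleep(x)
    return i

ii = [0, 1, 2]
xx = [0.3, 0.1, 0.2]

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as ex:
    # Ordered: results come back in input order, like ordered=True
    ordered = list(ex.map(work, ii, xx))

    # Unordered: results arrive as tasks finish, like ordered=False
    futures = [ex.submit(work, i, x) for i, x in zip(ii, xx)]
    unordered = [f.result() for f in concurrent.futures.as_completed(futures)]

print(ordered)    # [0, 1, 2]
print(unordered)  # finish order by sleep time: [1, 2, 0]
```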
> 
> Note also how I used func.map(ii, xx) above, not func(ii, xx) directly. The 
> former will pass single items of ii and xx to the original function, whereas 
> the latter will pass sub-sequences of ii and xx.
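> A plain-Python sketch of that difference (no cluster needed; work is just
> a stand-in function, not the parallel func above):

```python
def work(i, x):
    return (i, x)

ii = [0, 1, 2]
xx = [3, 4, 5]

# func.map(ii, xx) behaves like the builtin map: one item of each sequence
# per call -- work(0, 3), work(1, 4), work(2, 5)
elementwise = list(map(work, ii, xx))

# func(ii, xx) hands whole sub-sequences to the function; with a single
# chunk, that is one call: work([0, 1, 2], [3, 4, 5])
chunked = work(ii, xx)

print(elementwise)  # [(0, 3), (1, 4), (2, 5)]
print(chunked)      # ([0, 1, 2], [3, 4, 5])
```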
> 
> > I also want metadata (e.g. timing) for those results.
> 
> Me too! Above I passed a task identifier (i) to keep track of what's happening, 
> but what I'd really want is some way to return (metadata, result) from the 
> iterator. Something like enumerate(iterator) might be a good syntax:
> 
> for metadata, result in func.annotate(func.map(...)):
>     ...
> 
> Lots of metadata are collected:
> http://ipython.org/ipython-doc/dev/parallel/parallel_db.html
> though I think this is per-chunk, so if several workpieces are passed to an 
> engine at once (to reduce communication overhead), you won't have separate 
> statistics for each workpiece. I guess that's the way it must be.
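> In the meantime, per-chunk timing can be computed from the metadata dict
> itself. A sketch with made-up timestamp values (the keys 'submitted',
> 'started' and 'completed' are the ones the task database records, per the
> parallel_db docs; the datetimes below are invented for illustration):

```python
from datetime import datetime

# Hypothetical metadata entry, shaped like a Hub task record
md = {'submitted': datetime(2012, 6, 3, 7, 0, 0),
      'started':   datetime(2012, 6, 3, 7, 0, 1),
      'completed': datetime(2012, 6, 3, 7, 0, 1, 300000)}

# Time the task spent waiting in the scheduler's queue
queue_delay = md['started'] - md['submitted']
# Wall-clock time the engine spent executing the task
wall_time = md['completed'] - md['started']

print(queue_delay.total_seconds(), wall_time.total_seconds())  # 1.0 0.3
```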
> 
> Such an iterator would also be a nice occasion to purge metadata and results 
> from the hub and client. Currently, it seems that needs to be done manually, as 
> discussed here:
> 
> http://article.gmane.org/gmane.comp.python.ipython.user/8326
> 
> 
> Hope this helps,
> Jon Olav
> 
> _______________________________________________
> IPython-User mailing list
> IPython-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-user



