[IPython-User] Parallel question: Sending data directly between engines

MinRK benjaminrk@gmail....
Sun Jan 8 00:59:56 CST 2012


On Sat, Jan 7, 2012 at 22:11, Fernando Perez <fperez.net@gmail.com> wrote:

> On Sat, Jan 7, 2012 at 9:54 PM, MinRK <benjaminrk@gmail.com> wrote:
> > The problem is that gathering everything to the client prior to calling
> > reduce won't work for the cases where the whole parallel problem doesn't
> fit
> > in memory on one node, so you have to do a *parallel* reduce, which can
> be
> > implemented without requiring more than two parts of the problem on a
> node
> > at any given time.
>
> Ah, but I'd understood that though the whole problem would never fit
> in one node at the same time, this was about reducing the parameter
> vector, which does fit in all nodes (in fact, Olivier says that it
> gets broadcast at init time).  It's just that over time, it diverges
> as each node computes parameter updates for the data it has, so the
> whole vector needs to be recomputed as a weighted average of all the
> vectors in all individual nodes.  Now, if the number of nodes is so
> large that having n_nod copies of the parameter vector exceeds the
> client's memory, then the weighted average can be done in batches, by
> requesting only data from one group of nodes at a time.
>

Ah, maybe I was the one who misunderstood.  I should note that you can
iterate through an AsyncResult immediately, without waiting for the last
result to arrive, and you can pass an AsyncResult to anything that expects
an iterable, such as the builtin reduce, map, or sum.
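
This assumes the usual setup (the names rc and view here are just the
conventional ones, rc a Client and view a DirectView on all engines),
roughly:

from IPython.parallel import Client

rc = Client()    # connect to the controller of a running cluster
view = rc[:]     # a DirectView spanning all engines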

view.scatter('x', rc.ids)
ar = view.gather('x', block=False) # an AsyncResult object
x_sum = reduce(lambda x,y: x+y, ar) # works with builtin reduce function
view['x_sum'] = x_sum # push

You can even do a full map/reduce in one line with view.map_async and the
builtin reduce function.  The map will be evaluated remotely and in
parallel, but the reduce will be evaluated locally, albeit asynchronously,
consuming each result as it arrives:

def reduce_func(a,b):
    """simplest reduce func: sum"""
    return a+b

def map_func(n):
    """take about a millisecond to return a random integer"""
    import random
    import time
    time.sleep(.001*random.random())
    return random.randint(0, n)

reduce(reduce_func, view.map_async(map_func, range(10000)))

is equivalent to:

reduce(reduce_func, map(map_func, range(10000)))

And for load-balanced views, this can even be done out of order in case
early results may take more time to arrive than later ones:

reduce(reduce_func, lbview.map_async(map_func, range(10000,1,-1), ordered=False))
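
Here lbview is a load-balanced view, which (assuming the same Client as
above) you would get with something like:

lbview = rc.load_balanced_view()    # tasks go to whichever engine is free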

-MinRK


> At least that's how I read the original description...
>
> Cheers,
>
> f
>