[IPython-User] Parallel question: Sending data directly between engines
Sun Jan 8 13:02:27 CST 2012
On Sun, Jan 8, 2012 at 4:26 AM, Olivier Grisel <firstname.lastname@example.org> wrote:
> AFAIK the traditional way to implement the AllReduce is the to first a
> spanning tree over the nodes / engines. For instance if you have 10
> nodes, define a fixed arbitrary binary tree that spans all the nodes
> involved in the computation:
Thanks for the explanation. Indeed, we don't have communication
patterns with other topologies implemented yet out of the box, so it's
good to have this use case described well for us. It seems like we
should be able to use the star-topology for now, which works out of
the box (but has the limitation you point out), and implementing other
communication patterns such as a spanning tree one should be easy.
Do you know how they handle node failure? What happens when a node
disappears to its children?
More information about the IPython-User