[IPython-User] map/reduce examples?

Brian Granger ellisonbg@gmail....
Thu Apr 26 00:59:36 CDT 2012


On Wed, Apr 25, 2012 at 10:45 PM, Fernando Perez <fperez.net@gmail.com> wrote:
> Hi Darren,
>
> On Wed, Apr 25, 2012 at 7:41 AM, Darren Govoni <darren@ontrenet.com> wrote:
>> Hi,
>>  Is there an example out there of doing map/reduce with ipython?
>> I know its probably not that complicated to develop, but wanted to see
>> what others had done first.
>
> No, a full-blown example with all the mapreduce semantics hasn't been
> implemented, but the key binary tree reduction step was recently
> written up by Min, it's in a PR that's almost ready to go in:
>
> https://github.com/ipython/ipython/pull/1295
>
> If you want to work on providing the remainder of the MapReduce api
> along with the canonical wordcount implementation (that's the "hello
> world" of the MapReduce universe), it would be great!  Let us know and
> we can try to finish up that PR so you have that layer ready to work
> from.

One important point about the Google style map/reduce approach and the
parallelization of the reduce step.  The binary tree reduction
algorithm used in the above PR and my MPI is useful for parallelizing
the reduction of a single key/value pair.  In those cases you often
haven't even explicitly identified a "key" for the set of values.
Things like summing up the values in an array fits this model of
reduction.

In the typically applications of map/reduce though, you want to
perform the reduction on many keys.  The binary tree reduction then
looses out because you have to do the full binary tree reduction for
each key.  The situation is even worse if the data set is sparse - it
is possible that not each node has data for a given key.  Then you end
up walking a binary tree where many of the nodes are empty.

In the large key limit, there is a simple and efficient method of
reduction that will be even easier to implement.  Each key is hashed
to an integer mod the number of nodes - that is the node that will
perform the final reduction for that key.  After performing a local
reduction step, each node sends its reduction data for each key to
that keys reduction node, which then performs the final reduction
step.  This should not be difficult to implement in IPython.

Cheers,

Brian

> Cheers,
>
> f
> _______________________________________________
> IPython-User mailing list
> IPython-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-user



-- 
Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger@calpoly.edu and ellisonbg@gmail.com


More information about the IPython-User mailing list