[SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster

Rohit Garg rpg.314@gmail....
Mon Nov 9 12:11:29 CST 2009

Hi all,

I have an embarrassingly parallel problem, very nicely suited to
parallelization, and I am looking for community feedback on how best
to approach it. Basically, I just set up a bunch of tasks, and the
various CPUs pull data, process it, and send it back. Out-of-order
arrival of results is no problem. The processing times involved are
so large that the communication is effectively free, so I don't care
how fast or slow the communication is. I thought I'd ask in case
somebody has done this before, to avoid reinventing the wheel. Any
other suggestions are welcome too.
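For what it's worth, the pull/process/return pattern described above can be sketched on a single machine with multiprocessing.Pool; here expensive_kernel is a hypothetical stand-in for the C++ extension call, and imap_unordered matches the "out-of-order arrival is fine" requirement:

```python
import multiprocessing

def expensive_kernel(task):
    # Hypothetical stand-in for the real C++ extension call;
    # each invocation runs in its own worker process.
    return task * task

def run(n_tasks=8):
    with multiprocessing.Pool() as pool:
        # imap_unordered yields results as workers finish, so they
        # may arrive out of order -- which is fine here; sort only
        # for a deterministic return value.
        return sorted(pool.imap_unordered(expensive_kernel, range(n_tasks)))

if __name__ == "__main__":
    print(run())
```

Scaling this beyond one machine is where the manager/queue machinery (or mpi4py) comes in.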

My only constraint is that it should be able to run a Python
extension (written in C++) with a minimum of fuss. I want to minimize
the headaches involved in setting up and writing boilerplate code.
Which framework/approach/library would you recommend?

There is one method mentioned at [1], and of course, one could resort
to something like mpi4py.

[1] http://docs.python.org/library/multiprocessing.html (see the last example)
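In the style of that docs example, a minimal sketch of the manager-based approach, assuming shared task/result queues served over the network (names like QueueManager and the authkey are illustrative); the demo below does a local round trip in one process, whereas real workers on other machines would connect to the manager's address instead:

```python
import queue
from multiprocessing.managers import BaseManager

# Plain queues that will live in the manager's server process.
task_q = queue.Queue()
result_q = queue.Queue()

class QueueManager(BaseManager):
    pass

# Expose the queues; remote workers access them via these names.
QueueManager.register("get_task_queue", callable=lambda: task_q)
QueueManager.register("get_result_queue", callable=lambda: result_q)

def demo_round_trip():
    # Start the manager server, push a task, stand in for a worker
    # processing it, and read the result back.
    manager = QueueManager(address=("127.0.0.1", 0), authkey=b"secret")
    manager.start()
    try:
        tq = manager.get_task_queue()
        rq = manager.get_result_queue()
        tq.put(21)
        # A real worker (possibly on another machine) would construct a
        # QueueManager with the server's address and authkey, call
        # .connect(), then loop: get a task, run the C++ extension on
        # it, put the result back. Here we fake one step inline.
        rq.put(tq.get() * 2)
        return rq.get()
    finally:
        manager.shutdown()

if __name__ == "__main__":
    print(demo_round_trip())
```

Since the tasks dwarf the communication cost, the pickling overhead of the proxied queues shouldn't matter for this workload.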

Rohit Garg


Senior Undergraduate
Department of Physics
Indian Institute of Technology
