[SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster
Luis Pedro Coelho
Thu Nov 12 08:37:36 CST 2009
Rohit Garg wrote:
> I have an embarrassingly parallel problem, very nicely suited to
I have lots of those :)
> My only constraint is that it should be able to run a python extension
> (c++) with minimum of fuss. I want to minimize the headaches involved
> with setting up/writing the boilerplate code. Which
> framework/approach/library would you recommend?
My own: It's called jug. See
Or download the code from github:
It works with any set of processors that can either share a filesystem (it plays
well with NFS, but that can be slow) or share a connection to a redis database
(which is very easy to set up and is probably as fast as any other approach if
everyone is on the same network).
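The shared-filesystem backend works because workers can claim tasks atomically through the filesystem itself. Here is a conceptual sketch of that idea (not jug's actual code; the function and path names are made up): each worker tries to create a lock file for a task, and only the worker that wins the race runs it.

```python
import os
import tempfile

def try_claim(lockdir, task_id):
    """Atomically claim a task by creating its lock file.

    O_CREAT | O_EXCL is an atomic check-and-create on local
    filesystems and on modern NFS, so at most one worker can
    claim any given task.
    """
    path = os.path.join(lockdir, task_id + '.lock')
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True   # we claimed the task; run it
    except FileExistsError:
        return False  # someone else got there first; skip it

lockdir = tempfile.mkdtemp()
print(try_claim(lockdir, 'task0'))  # → True  (first worker wins)
print(try_claim(lockdir, 'task0'))  # → False (second worker skips)
```

The same claim/skip logic maps onto redis by replacing the lock file with an atomic SETNX-style operation.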
A major advantage is that you write mostly Python (and not something funny
looking). For example, here's what a programme with that framework would look
like:

    from glob import glob
    from jug import TaskGenerator

    @TaskGenerator
    def compute(input, param):
        ...  # preprocess and collect are decorated the same way

    results = []
    for input in glob('*.in'):
        intermediate = preprocess(input)
        results.append(compute(intermediate, param))
    final = collect(results)
The only step that's different w.r.t. the linear version is adding the
TaskGenerator decorator, which changes a call of preprocess(input) into the
creation of a Task object that gets run later by the workers. Jug handles
everything else.
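The idea behind such a decorator can be sketched in a few lines: instead of running the function immediately, the wrapped call records a Task (function plus arguments), and workers execute the recorded tasks later, feeding each task's result into the tasks that depend on it. This is a simplified illustration of the concept, not jug's actual implementation:

```python
TASKS = []  # pending tasks, recorded when the script is loaded

class Task:
    def __init__(self, f, *args):
        self.f, self.args, self.result = f, args, None

    def run(self):
        # arguments that are themselves Tasks are replaced by their results
        args = [a.result if isinstance(a, Task) else a for a in self.args]
        self.result = self.f(*args)
        return self.result

def TaskGenerator(f):
    def wrapper(*args):
        t = Task(f, *args)
        TASKS.append(t)
        return t  # the caller gets a Task, not a value
    return wrapper

@TaskGenerator
def double(x):
    return 2 * x

t1 = double(3)
t2 = double(t1)  # depends on t1
for t in TASKS:  # a worker would execute these in dependency order
    t.run()
print(t2.result)  # → 12
```

Because the script only records tasks, many workers can load the same jugfile and split the recorded tasks among themselves.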
I have been using this now for almost a year for all my research work and it
works very well for me.