[SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster

Luis Pedro Coelho lpc@cmu....
Thu Nov 12 08:37:36 CST 2009

Rohit Garg wrote:
> I have an embarrassingly parallel problem, very nicely suited to
> parallelization. 

I have lots of those :)

> My only constraint is that it should be able to run a python extension
> (c++) with minimum of fuss. I want to minimize the headaches involved
> with setting up/writing the boilerplate code. Which
> framework/approach/library would you recommend?

My own: It's called jug. See


Or download the code from github:


It works with any set of processors that can either share a filesystem (it plays 
well with NFS, though that can be slow) or connect to a redis database (which is 
very easy to set up and is probably as fast as any other approach if all the 
machines are on the same network).

A major advantage is that you write mostly Python (and not something funny 
looking). For example, here's what a programme with that framework would look 
like:
from glob import glob
from jug import TaskGenerator

@TaskGenerator
def preprocess(input):
    ...

@TaskGenerator
def compute(input, param):
    ...

@TaskGenerator
def collect(inputs):
    ...

results = []
for input in glob('*.in'):
    intermediate = preprocess(input)
    results.append(compute(intermediate, param))
final = collect(results)

The only step that's different with respect to the linear version is adding the 
TaskGenerator decorator, which changes a call to preprocess(input) into 
Task(preprocess, input).

Jug handles everything else.
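To make the idea concrete, here is a hypothetical sketch (not jug's actual 
implementation) of how a TaskGenerator-style decorator can work: instead of 
running the function immediately, each call returns a Task object recording 
the function and its arguments, so a runner can execute the resulting task 
graph later. The Task and TaskGenerator names here are illustrative only.

class Task:
    """Records a deferred function call instead of executing it."""
    def __init__(self, fn, *args, **kwargs):
        self.fn = fn
        self.args = args
        self.kwargs = kwargs

    def run(self):
        # A runner would call this on some worker, first resolving any
        # arguments that are themselves Tasks to their computed results.
        args = [a.run() if isinstance(a, Task) else a for a in self.args]
        return self.fn(*args, **self.kwargs)

def TaskGenerator(fn):
    # Wrap fn so that calling it builds a Task rather than running fn.
    def wrapper(*args, **kwargs):
        return Task(fn, *args, **kwargs)
    return wrapper

@TaskGenerator
def double(x):
    return 2 * x

t = double(double(3))   # builds a small task graph; nothing runs yet
print(t.run())          # a runner executes it: prints 12

In the real library, the point of this indirection is that the runner can 
hash each task, store its result in the shared backend, and hand tasks out 
to whichever worker grabs them first.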

I have been using this for almost a year now for all my research work and it 
works very well for me.

