[SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster
Luis Pedro Coelho
lpc@cmu....
Thu Nov 12 08:37:36 CST 2009
Rohit Garg wrote:
> I have an embarrassingly parallel problem, very nicely suited to
> parallelization.
I have lots of those :)
> My only constraint is that it should be able to run a python extension
> (c++) with minimum of fuss. I want to minimize the headaches involved
> with setting up/writing the boilerplate code. Which
> framework/approach/library would you recommend?
My own: it's called jug. See
http://luispedro.org/software/jug
(or download the code from GitHub:
http://github.com/luispedro/jug).
It works with any set of processors that share either a filesystem (it plays
well with NFS, but that can be slow) or a connection to a redis database (which
is very easy to set up and is probably as fast as any other approach if all the
workers are on the same machine).
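For intuition about the shared-filesystem case, here is a minimal sketch of how independent worker processes can coordinate through nothing more than a common directory: each worker tries to create a lock file atomically, and only the one that succeeds runs the task. This illustrates the general technique and is not taken from jug's code; try_claim and the ".lock" naming are hypothetical, and whether exclusive creation is reliable on a given NFS setup is an assumption you would need to check.

```python
import os
import tempfile


def try_claim(lock_dir, task_name):
    """Return True if this worker claimed task_name, False if it was already claimed.

    Hypothetical helper: relies on O_CREAT | O_EXCL failing when the file exists.
    """
    path = os.path.join(lock_dir, task_name + ".lock")
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Another worker created the lock file first.
        return False
    os.close(fd)
    return True


# A temporary directory stands in for the directory all workers can see.
lock_dir = tempfile.mkdtemp()
first = try_claim(lock_dir, "compute-a")   # this worker gets the task
second = try_claim(lock_dir, "compute-a")  # a later attempt is refused
```

Every worker can run the same script; a worker that fails to claim a task simply moves on to the next one.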
A major advantage is that you write mostly Python (and not something funny
looking). For example, here's what a programme with that framework would look
like:
from glob import glob

from jug import TaskGenerator

@TaskGenerator
def preprocess(input):
    ...

@TaskGenerator
def compute(input, param):
    ...

@TaskGenerator
def collect(inputs):
    ...

# param is whatever extra argument compute() needs; define it before this loop.
results = []
for input in glob('*.in'):
    intermediate = preprocess(input)
    results.append(compute(intermediate, param))
final = collect(results)
The only difference from the linear (sequential) version is the added
TaskGenerator decorator, which changes a call of preprocess(input) into
Task(preprocess, input): the call now builds a description of the work
instead of running it immediately. Jug handles everything else.
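To make the decorator's effect concrete, here is a minimal pure-Python sketch of the idea of turning a call into a deferred task. It is not jug's implementation: the Task class, its run method, resolve and task_generator below are hypothetical stand-ins, and there is no locking, caching or storing of results.

```python
class Task:
    """Hypothetical stand-in for a deferred call; not jug's real Task class."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def run(self):
        # Resolve arguments that are themselves Tasks, then call the function.
        return self.func(*[resolve(a) for a in self.args])


def resolve(value):
    """Replace Tasks (also inside lists) by their computed results."""
    if isinstance(value, Task):
        return value.run()
    if isinstance(value, list):
        return [resolve(v) for v in value]
    return value


def task_generator(func):
    """Decorator: calling the function builds a Task instead of running it."""
    def make_task(*args):
        return Task(func, *args)
    return make_task


@task_generator
def double(x):
    return 2 * x


@task_generator
def total(values):
    return sum(values)


# Nothing is computed here; final only describes the work to be done.
final = total([double(1), double(2), double(3)])
result = final.run()
```

Because the script only builds a graph of Task objects, a framework is free to decide later which process runs which task and where results are stored.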
I have been using this for almost a year now for all my research work, and it
works very well for me.
HTH,
Luis