[SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster

Gael Varoquaux gael.varoquaux@normalesup....
Mon Nov 9 12:35:15 CST 2009


On Mon, Nov 09, 2009 at 11:58:55PM +0530, Rohit Garg wrote:
> On Mon, Nov 9, 2009 at 11:47 PM, Gael Varoquaux
> <gael.varoquaux@normalesup.org> wrote:
> > On Mon, Nov 09, 2009 at 11:41:29PM +0530, Rohit Garg wrote:
> >> Hi all,

> >> I have an embarrassingly parallel problem, very nicely suited to
> >> parallelization.

> > A non-optimal solution that I like:
> > http://gael-varoquaux.info/blog/?p=119

> Thanks, for the pointer, but after a quick read, it doesn't look like
> it supports distributed memory parallelism. Or does it?

If by distributed memory you mean shared memory, you won't get that out
of the box, though the copy-on-write semantics of Unix's fork give you
part of it, but not all. One hack is to memmap a file and share it
between processes (it won't cost you I/O, because your OS will be
smart enough to cache everything). The right way to do it is to use a
shared-memory array, which Sturla and I started working on ages ago,
but never found time to integrate into numpy.
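A rough sketch of the memmapping hack (the filename and chunk sizes
below are arbitrary placeholders): the parent fills a file-backed
array once, each worker re-opens the same file read-only, and the OS
page cache does the actual sharing:

    import numpy as np
    from multiprocessing import Pool

    # Placeholder filename and shape for this sketch.
    FILENAME = 'shared_array.dat'
    SHAPE = (1000, 1000)

    def worker(bounds):
        start, stop = bounds
        # Each process re-opens the same file read-only; the OS page
        # cache means no extra I/O and no per-process copy of the data.
        a = np.memmap(FILENAME, dtype=np.float64, mode='r', shape=SHAPE)
        return a[start:stop].sum()

    if __name__ == '__main__':
        # Create and fill the memmapped array once, in the parent.
        a = np.memmap(FILENAME, dtype=np.float64, mode='w+', shape=SHAPE)
        a[:] = np.random.rand(*SHAPE)
        a.flush()

        pool = Pool(4)
        chunks = [(i, i + 250) for i in range(0, SHAPE[0], 250)]
        print(sum(pool.map(worker, chunks)))
        pool.close()
        pool.join()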

If you mean parallelism on architectures where 'fork' won't distribute
the processes (like a cluster), then multiprocessing won't do the
trick, and you will need to look at IPython or Parallel Python.
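For the cluster case, here is a minimal sketch of the Parallel Python
route, assuming the 'pp' package is installed and a ppserver.py daemon
is already running on each node; the host names are placeholders:

    import math
    import pp

    # Placeholder host names; a ppserver.py daemon must be listening
    # on each of these nodes.
    ppservers = ("node1", "node2")

    # Autodetect local cores and connect to the remote servers.
    job_server = pp.Server(ppservers=ppservers)

    def work(x):
        # Each call is independent, hence embarrassingly parallel.
        return math.sqrt(x)

    # submit() ships the function, its args, and the listed module
    # dependencies to whichever worker is free.
    jobs = [job_server.submit(work, (x,), (), ("math",)) for x in range(16)]

    # Calling a job object blocks until its result is ready.
    print([job() for job in jobs])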

Gaël
