[IPython-User] parallel IPython analysis with large dataset

Michael Kuhlen kuhlen@gmail....
Mon Jul 9 22:42:04 CDT 2012


OK, thanks for the update.

I've figured out how to do it with Python's multiprocessing module,
but it would have been nice to do it with IPython.
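
For reference, here is a minimal sketch of the fork-based pattern. It
is not the actual analysis code. DATA, chunk_sum and parallel_sum are
hypothetical names, and the array of integers stands in for the real
dataset. It assumes a Unix-like system where the "fork" start method is
available, so the workers inherit the parent's memory copy-on-write and
the data is never pickled or sent to them.

```python
import multiprocessing as mp
from array import array

# Hypothetical stand-in for the large dataset: loaded once, in the parent,
# before any worker process is started.
DATA = array("q", range(1_000_000))


def chunk_sum(start, stop, out):
    """Sum DATA[start:stop] as inherited from the parent and report it."""
    # A memoryview slice reads the buffer in place instead of copying it.
    out.put(sum(memoryview(DATA)[start:stop]))


def parallel_sum(n_workers=4):
    """Split DATA into n_workers index ranges and sum them in parallel."""
    ctx = mp.get_context("fork")  # fork exists on Unix-like systems only
    out = ctx.SimpleQueue()
    step = len(DATA) // n_workers
    procs = []
    for i in range(n_workers):
        # The last worker also takes whatever remainder is left over.
        stop = len(DATA) if i == n_workers - 1 else (i + 1) * step
        p = ctx.Process(target=chunk_sum, args=(i * step, stop, out))
        p.start()
        procs.append(p)
    total = sum(out.get() for _ in procs)  # collect results, then join
    for p in procs:
        p.join()
    return total
```

Only the index ranges and the partial sums cross process boundaries.
The pages stay shared as long as the workers only read them; anything a
worker writes to gets copied for that worker.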

Darren, the problem is indeed embarrassingly parallel, but it operates
on a large dataset that fits into memory only once, not once per core.
I was hoping that the IPython engines could operate on the data through
shared memory, like Fernando was describing.
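
A rough sketch of what "explicit" shared memory could look like, using
only the standard library. It assumes Python 3.8 or later, since the
multiprocessing.shared_memory module is newer than this thread. The
helper names publish and attached_sum are hypothetical, and nothing
here is an IPython API. The idea is that one process copies the data
into a named block once, and every other process on the same machine
attaches to it by name.

```python
from array import array
from multiprocessing import shared_memory

ITEM = "d"      # float64 type code
ITEM_SIZE = 8   # bytes per float64


def publish(values):
    """Copy `values` into a new shared-memory block and return its handle."""
    nbytes = ITEM_SIZE * len(values)
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    # One-time copy into the block; the block may be rounded up in size,
    # so only the first nbytes are written.
    shm.buf[:nbytes] = array(ITEM, values).tobytes()
    return shm


def attached_sum(name, n_items):
    """Attach to an existing block by name and sum it without copying it."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # Reinterpret the raw bytes as float64 values, in place.
        with shm.buf[:ITEM_SIZE * n_items].cast(ITEM) as view:
            return sum(view)
    finally:
        shm.close()  # detach only; the creator is responsible for unlink()
```

Under these assumptions each worker would be sent only the block name
and the item count, and would call something like attached_sum itself.
The creating process should call close() and then unlink() on its
handle once all workers are done. With numpy, the same buffer can be
wrapped without a copy via np.ndarray(shape, dtype=..., buffer=shm.buf).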

Cheers,

Mike

On Mon, Jul 9, 2012 at 8:37 PM, Fernando Perez <fperez.net@gmail.com> wrote:
> On Mon, Jul 9, 2012 at 7:33 PM, Michael Kuhlen <kuhlen@gmail.com> wrote:
>> Specifically, is it now possible to analyze a large dataset using
>> IPython parallel tools *without* replicating it in memory Ncore times?
>> If yes, great! How would I do it?
>
> Because the model in IPython does not use fork(), the same answer
> as in 2009 applies.  It's the fact that multiprocessing uses fork(),
> which on *nix shares the memory of the parent process with
> copy-on-write semantics, that allows that to happen transparently.
>
> In IPython, assuming you are restricted to a multicore/shared mem
> situation, you'd need to manually set up your large array(s) to be in
> a shared memory area explicitly.
>
> I have seen over time notes about numpy and shared memory, but I'm
> afraid I have no direct experience with it.
>
> Cheers,
>
> f