[IPython-user] Multi-processor access to a large data set

Robert Ferrell ferrell@diablotech....
Thu Dec 17 15:07:35 CST 2009


On Dec 17, 2009, at 12:13 PM, Gael Varoquaux wrote:

> On Thu, Dec 17, 2009 at 11:19:00AM -0700, Robert Ferrell wrote:
>> The tasks all need read-only access to a large-ish data set (~ 1GB).
>> I don't want to replicate this data set 8 times.  How do I give each
>> engine read-only access?
>
> My approach is to use multiprocessing
> (http://docs.python.org/library/multiprocessing.html, new in 2.6 but
> exists as a separate module for 2.5). If you are on a Unix box, the
> processes are spawned using fork. The memory pages are 'copy on write'
> after the fork, which means that if you don't write to the arrays that
> were created before the fork, they won't be copied.

I'm on OS X, with python 2.6, so I've got multiprocessing already.  If  
I understand, you are suggesting the python multiprocessing module,  
rather than the IPython parallel stuff.  Is that right?
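
For concreteness, here is a minimal sketch of the fork-based approach as I
understand it. It is not taken from anyone's actual code. The names DATA,
load_data, partial_sum and parallel_sum are hypothetical, and a small
integer array stands in for the real ~1GB data set. It is written against
the current multiprocessing API (get_context, Python 3.4 and later) so that
the fork start method is requested explicitly; on Python 2.6 under Unix,
fork is what multiprocessing uses anyway.

```python
import multiprocessing as mp
from array import array

# Hypothetical stand-in for the large read-only data set.
# It is filled in by the parent before the workers are forked.
DATA = None


def load_data(n):
    """Build the data once, in the parent process."""
    return array("q", range(n))


def partial_sum(bounds):
    """Worker: only reads the inherited DATA, never writes to it."""
    start, stop = bounds
    # A memoryview slice avoids copying the slice in the child.
    return sum(memoryview(DATA)[start:stop])


def parallel_sum(n=1000000, workers=4):
    global DATA
    DATA = load_data(n)
    # "fork" is only available on Unix-like systems. The children
    # inherit DATA through copy-on-write pages instead of pickling it.
    ctx = mp.get_context("fork")
    step = n // workers
    chunks = [
        (i * step, n if i == workers - 1 else (i + 1) * step)
        for i in range(workers)
    ]
    with ctx.Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))


if __name__ == "__main__":
    print(parallel_sum())
```

Only the (start, stop) tuples and the partial results are sent between
processes. The array's buffer is read in place, so its pages should stay
shared as long as no process writes to them.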

thanks,

-robert


