[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Thu Jun 16 12:23:06 CDT 2011
On Thu, Jun 16, 2011 at 6:44 PM, Christopher Barker wrote:
>> 2. There is also the question of when the process pool is spawned. Though
>> I haven't checked, I suspect it happens prior to calling pool.map. But if it
>> does not, this is a factor as well, particularly on Windows (less so on
>> Linux and Apple).
> It didn't work well on my Mac, so it's either not an issue, or not
> Windows-specific, anyway.
I am pretty sure that the process pool is spawned when you create the
pool object instance.
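A quick way to convince yourself of this (a Python 3 sketch; the `worker_pid` helper is my own illustration, not code from the thread): if the workers already exist when `Pool()` returns, then however many tasks you map, you should only ever see as many distinct worker PIDs as there are pool processes.

```python
import multiprocessing
import os

def worker_pid(_):
    return os.getpid()

# The worker processes are created here, when the Pool object is
# instantiated, not later when pool.map() is called.
pool = multiprocessing.Pool(2)
pids = set(pool.map(worker_pid, range(8)))
pool.close()
pool.join()
```

With a 2-process pool, `pids` should contain at most two distinct PIDs, regardless of how many tasks were mapped.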
>> 3. "arrayList" is serialised by pickling, which has a significant
>> overhead. It's not shared memory either, as the OP's code implies, but the
>> main thing is the slowness of cPickle.
> I'll bet this is a big issue, and one I'm curious about how to address, I
> have another problem where I need to multi-process, and I'd love to know a
> way to pass data to the other process and back *without* going through
> pickle. maybe memmapped files?
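The memmapped-file idea floated here can be sketched roughly as follows (my own hypothetical example; `row_sum` and the file layout are assumptions, not code from the thread). The point is that only the *filename* travels through pickle to the workers; the array data itself never does.

```python
import os
import tempfile
import multiprocessing
import numpy as np

# Write the data to a memory-mapped file once, in the parent process.
path = os.path.join(tempfile.mkdtemp(), "shared.dat")
data = np.memmap(path, dtype="float64", mode="w+", shape=(4, 100))
data[:] = np.arange(400, dtype="float64").reshape(4, 100)
data.flush()

def row_sum(i):
    # Each worker re-opens the file read-only; no array pickling occurs.
    m = np.memmap(path, dtype="float64", mode="r", shape=(4, 100))
    return float(m[i].sum())

pool = multiprocessing.Pool(2)
sums = pool.map(row_sum, range(4))
pool.close()
pool.join()
```

Whether this wins depends on array size and the filesystem; for small arrays the per-task dispatch overhead discussed below still dominates.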
If you are on Linux or Mac, then fork works nicely, so you get read-only
shared memory for free: you just have to put the data in a module before
the fork (i.e. before pool = Pool()), and then all the subprocesses can
access it without any pickling required, e.g.
myutil.data = listofdata
p = multiprocessing.Pool(8)
Actually, that won't work quite as written, because mymapfunc needs to
live in a module so it can be pickled, but hopefully you get the idea.
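Filling in the gaps, a complete version of this trick might look like the following (a sketch under fork-start-method assumptions; `shared` and `chunk_sum` are my own names). The key is that the array is bound at module level *before* the Pool is created, so each forked worker inherits it and only the small index tuples go through pickle.

```python
import multiprocessing
import numpy as np

# Bound before Pool() is created, so forked workers inherit it
# read-only -- the array itself is never pickled.
shared = np.arange(1_000_000, dtype="float64")

def chunk_sum(bounds):
    lo, hi = bounds
    return shared[lo:hi].sum()

pool = multiprocessing.Pool(2)
parts = pool.map(chunk_sum, [(0, 500_000), (500_000, 1_000_000)])
pool.close()
pool.join()
total = sum(parts)
```

Note this relies on fork semantics, which is why it works on Linux and Mac but not with Windows-style process spawning, where each worker re-imports the module from scratch.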
>> "IPs = N.array(innerProductList)"
>> 4. numpy.array is a very slow function. The benchmark should preferably
>> not include this overhead.
> I re-ran, moving that out of the timing loop, and, indeed, it helped a lot,
> but it still takes longer with the multi-processing.
> I suspect that the overhead of pickling, etc. is overwhelming the operation
> itself. That and the load balancing issue that I don't understand!
> To test this, I did a little experiment -- creating a "fake" operation, one
> that simply returns an element from the input array -- so it should take
> next to no time, and we can time the overhead of the pickling, etc:
> $ python shared_mem.py
> Using 2 processes
> No shared memory, numpy array multiplication took 0.124427080154 seconds
> Shared memory, numpy array multiplication took 0.586215019226 seconds
> No shared memory, fake array multiplication took 0.000391006469727 seconds
> Shared memory, fake array multiplication took 0.54935503006 seconds
> No shared memory, my array multiplication took 23.5055780411 seconds
> Shared memory, my array multiplication took 13.0932741165 seconds
> The overhead of the multi-processing takes about 0.54 seconds, which explains
> the slowdown for the numpy method.
> Not so mysterious after all.
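The "fake operation" experiment described above can be reconstructed roughly like this (my own sketch; `fake_op` and the array sizes are assumptions, not the enclosed code). Because the fake operation does essentially no work, whatever time the pooled version takes is pure dispatch-and-pickle overhead.

```python
import time
import multiprocessing
import numpy as np

def fake_op(arr):
    # "Fake" operation: return a single element, so per-task work is
    # essentially zero and any measured time is dispatch/pickle overhead.
    return float(arr[0])

arrays = [np.random.rand(1000) for _ in range(100)]

t0 = time.time()
serial = [fake_op(a) for a in arrays]
t_serial = time.time() - t0

pool = multiprocessing.Pool(2)
t0 = time.time()
parallel = pool.map(fake_op, arrays)
pool.close()
pool.join()
t_pool = time.time() - t0
```

Comparing `t_serial` and `t_pool` gives a per-run estimate of the fixed multiprocessing cost, which is the ~0.54 s figure in the numbers above.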
> Bruce Southey wrote:
>> But if everything is *single-threaded* and thread-safe, then you just
>> create a function and use Anne's very useful handythread.py
> This may be worth a try -- though the GIL could well get in the way.
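As a hedged illustration of the thread-based alternative (using the standard library's ThreadPoolExecutor rather than handythread.py itself; `inner` and the matrix sizes are my own choices): threads share the process's memory, so nothing is pickled, and although the GIL serializes pure-Python code, numpy releases the GIL inside large compiled operations such as dot(), so threads can still overlap useful work there.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

# Threads share memory: no pickling, no data copies. The GIL only
# serializes the Python-level bookkeeping; numpy's dot() releases it.
mats = [np.ones((100, 100)) for _ in range(8)]

def inner(m):
    return m.dot(m.T)

with ThreadPoolExecutor(max_workers=2) as ex:
    results = list(ex.map(inner, mats))
```

For element-wise Python loops the GIL would indeed get in the way, as noted above; this approach mainly pays off when most time is spent inside numpy's compiled kernels.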
>> By the way, if the arrays are sufficiently small, there is a lot of
>> overhead involved such that there is more time in communication than
>> computation.
> yup -- clearly the case here. I wonder if it's just array size though --
> won't cPickle time scale with array size? So it may not be size per se, but
> rather how much computation you need for a given size array.
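The scaling question is easy to probe directly (a rough illustration, not a rigorous benchmark; the sizes are arbitrary, and Python 3's pickle stands in for the cPickle of the era): both the serialized byte count and the time to produce it grow with the array.

```python
import pickle
import time
import numpy as np

small = np.zeros(1_000)
large = np.zeros(10_000_000)

# Time and measure pickling of a small vs. a large array.
t0 = time.time()
small_bytes = pickle.dumps(small)
t_small = time.time() - t0

t0 = time.time()
large_bytes = pickle.dumps(large)
t_large = time.time() - t0
```

So pickling cost is linear-ish in array size, while the useful computation per element varies by problem -- which is exactly why the communication/computation ratio, not size alone, decides whether multiprocessing pays off.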
> [I've enclosed the OP's slightly altered code]
> Christopher Barker, Ph.D.
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception