[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Thu Jun 16 16:31:16 CDT 2011
On 06/16/2011 02:05 PM, Brandt Belson wrote:
> Hi all,
> Thanks for the replies. As mentioned, I'm parallelizing so that I can
> take many inner products simultaneously (which I agree is
> embarrassingly parallel). The library I'm writing asks the user to
> supply a function that takes two objects and returns their inner
> product. After all the discussion, though, it seems this is too
> simplistic an approach. Instead, I plan to write this part of the
> library as if the inner product function supplied by the user uses all
> available cores (with numpy and/or numexpr built with MKL or LAPACK).
> As for using Fortran or C with OpenMP, this probably isn't worth the
> time it would take, both for me and the user.
> I've tried increasing the array sizes and found the same trends, so
> the slowdown isn't only because the arrays are too small to see the
> benefit of multiprocessing. I wrote the code to be easy for anyone to
> experiment with, so feel free to play around with what is included in
> the profiling, the sizes of arrays, functions used, etc.
> I also tried using handythread.foreach with arraySize = (3000,1000),
> and found the following:
> No shared memory, numpy array multiplication took 1.57585811615 seconds
> Shared memory, numpy array multiplication took 1.25499510765 seconds
> This is definitely an improvement over multiprocessing, but without
> knowing any better, I was hoping to see a roughly 8x speedup on my
> 8-core workstation.
> Based on what Chris sent, it seems there is some large overhead caused
> by multiprocessing pickling numpy arrays. To test what Robin mentioned:
> > If you are on Linux or Mac then fork works nicely, so you have
> > read-only shared memory. You just have to put the data in a module
> > before the fork (so before pool = Pool()), and then all the
> > subprocesses can access it without any pickling required, i.e.:
> > myutil.data = listofdata
> > # define the function before Pool() so the forked workers can see it
> > def mymapfunc(i):
> >     return mydatafunc(myutil.data[i])
> > p = multiprocessing.Pool(8)
> > p.map(mymapfunc, range(len(myutil.data)))
> I tried creating the arrayList in the myutil module and using
> multiprocessing to find the inner products of myutil.arrayList.
> However, this was still slower than not using multiprocessing, so I
> believe there is still some large overhead. Here are the results:
> No shared memory, numpy array multiplication took 1.55906510353 seconds
> Shared memory, numpy array multiplication took 9.82426381111 seconds
> Shared memory, myutil.arrayList numpy array multiplication took 8.77094507217 seconds
> I'm attaching this code.
> I'm going to work around this numpy/multiprocessing behavior with
> numpy/numexpr built with MKL or LAPACK. It would be good to know
> exactly what's causing this though. It would be nice if there was a
> way to get the ideal speedup via multiprocessing, regardless of the
> internal workings of the single-threaded inner product function, as
> this was the behavior I expected. I imagine other people might come
> across similar situations, but again I'm going to try to get around
> this by letting MKL or LAPACK make use of all available cores.
> Thanks again,
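For reference, Robin's fork suggestion can be written out as a minimal sketch. The names ARRAYS, inner_by_index and all_inner_products are hypothetical (ARRAYS stands in for myutil.arrayList), and the elementwise multiply-and-sum is only an assumed stand-in for your inner product function. The point is that each task sends two integers to the worker rather than two pickled arrays; the arrays themselves are inherited at fork time.

```python
import multiprocessing as mp

import numpy as np

# Hypothetical module-level store, standing in for myutil.arrayList.
# It must be filled BEFORE the pool is created, so that the forked
# workers inherit it instead of receiving pickled copies.
ARRAYS = []


def inner_by_index(pair):
    """Worker: look up both arrays by index and return their inner product."""
    i, j = pair
    # Assumed inner product: elementwise multiply, then sum.
    return float((ARRAYS[i] * ARRAYS[j]).sum())


def all_inner_products(num_procs=2):
    """Compute the full matrix of inner products using forked workers."""
    n = len(ARRAYS)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    # The "fork" start method exists on Linux and Mac only; on Windows
    # this call raises ValueError.
    ctx = mp.get_context("fork")
    with ctx.Pool(num_procs) as pool:
        flat = pool.map(inner_by_index, pairs)
    return np.array(flat).reshape(n, n)
```

To try it, append some arrays to ARRAYS and then call all_inner_products(). Anything added to ARRAYS after the pool has been created is not visible to the workers.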
I think this is not being benchmarked correctly, because there should be
a noticeable difference when a different number of threads is selected.
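One way to check this is to time the same workload while varying only the number of threads. The following is a minimal sketch, not the code you attached: it uses ThreadPoolExecutor from the standard library as a stand-in for handythread.foreach, the function names are hypothetical, and the elementwise multiply-and-sum is an assumed inner product.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def inner_product(pair):
    """Assumed inner product: elementwise multiply, then sum."""
    a, b = pair
    return float((a * b).sum())


def time_with_threads(pairs, num_threads, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            # list() forces every task to finish before the clock stops.
            list(pool.map(inner_product, pairs))
        best = min(best, time.perf_counter() - start)
    return best


def sweep(pairs, thread_counts=(1, 2, 4, 8)):
    """Map each thread count to its best time on the same list of pairs."""
    return {n: time_with_threads(pairs, n) for n in thread_counts}
```

Build the list of (a, b) array pairs once, pass it to sweep(), and compare the times. If they barely change between 1 and 8 threads, the inner product itself is probably not where the time is going.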
But really you should read these sources:
Also, numpy does extra work, such as checks and copies, that probably
makes np.inner() slower. Thus, your 'numpy_inner_product' is probably as
efficient as you can get without extreme measures like