[Numpy-discussion] multiprocessing (shared memory) with numpy array multiplication
Brandt Belson
bbelson@princeton....
Mon Jun 13 22:18:27 CDT 2011
Hi all,
Thanks for your replies.
> Brandt Belson wrote:
> > Unfortunately I can't flatten the arrays. I'm writing a library where
> > the user supplies an inner product function for two generic objects, and
> > almost always the inner product function does large array
> > multiplications at some point. The library doesn't get to know about the
> > underlying arrays.
>
> Now I'm confused -- if the user is providing the inner product
> implementation, how can you optimize that? Or are you trying to provide
> said user with an optimized "large array multiplication" that he/she can
> use?
I'm sorry if I wasn't clear. I'm not providing a new array
multiplication function. I'm taking the inner product function (which
usually contains numpy array multiplication) from the user as a given.
I am parallelizing the process of performing *many* inner products so
that each core can do them independently. The parallelization is in
performing many individual inner products, not within each inner
product/array multiplication.
> If so, then I'd post your implementation here, and folks can suggest
> improvements.
I did attach some code showing what I'm doing, but that was a few days
ago, so I'll attach it again.
> If it's regular old element-wise multiplication:
>
> a*b
>
> (where a and b are numpy arrays)
>
> then you are right, numpy isn't using any fancy multi-core aware
> optimized package, so you should be able to make a faster version.
>
> You might try numexpr also -- it's pretty cool, though may not help for
> a single operation. It might give you some ideas, though.
>
> http://www.scipy.org/SciPyPackages/NumExpr
>
>
> -Chris
NumExpr looks helpful and I'll definitely look into it, but the main
issue is parallelizing many element-wise array multiplications, not
speeding up the array multiplication operation itself. It might be that
parallelizing the individual inner products among cores isn't the
right approach, but I'm not sure it's wrong yet.
> > Message: 2
> > Date: Fri, 10 Jun 2011 09:23:10 -0400
> > From: Olivier Delalleau <shish@keba.be>
> > Subject: Re: [Numpy-discussion] Using multiprocessing (shared memory)
> > with numpy array multiplication
> > To: Discussion of Numerical Python <numpy-discussion@scipy.org>
> > Message-ID: <BANLkTikjppC90yE56T1mr+byAxXAw32YJA@mail.gmail.com>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > It may not work for you depending on your specific problem
> > constraints, but if you could flatten the arrays, then it would be a
> > dot, and you could maybe compute multiple such dot products by
> > storing those flattened arrays into a matrix.
> >
> > -=- Olivier
> >
> > 2011/6/10 Brandt Belson <bbelson@princeton.edu>
> >
> > > Hi,
> > > Thanks for getting back to me.
> > > I'm doing element-wise multiplication, basically innerProduct =
> > > numpy.sum(array1*array2) where array1 and array2 are, in general,
> > > multidimensional. I need to do many of these operations, and I'd
> > > like to split up the tasks between the different cores. I'm not
> > > using numpy.dot; if I'm not mistaken, I don't think that would do
> > > what I need.
> > > Thanks again,
> > > Brandt
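Olivier's suggestion above can be made concrete: since numpy.sum(a*b) equals the dot product of the flattened arrays, stacking the flattened arrays as rows of a matrix computes many inner products in a single BLAS call. A sketch (the shapes and the common reference array are illustrative assumptions, not the poster's actual data):

```python
import numpy as np

rng = np.random.RandomState(0)
arrays = [rng.rand(300, 200) for _ in range(50)]
ref = rng.rand(300, 200)

# Element-wise multiply-and-sum is a dot product of the flattened arrays...
loop_result = np.array([np.sum(a * ref) for a in arrays])

# ...so stacking the flattened arrays as matrix rows computes all 50
# inner products with one BLAS-backed matrix-vector product.
stacked = np.array([a.ravel() for a in arrays])   # shape (50, 60000)
batched_result = stacked.dot(ref.ravel())         # shape (50,)
```

This keeps the whole computation inside one threaded BLAS call rather than 50 Python-level loop iterations.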
> > >
> > >
> > > Message: 1
> > >> Date: Thu, 09 Jun 2011 13:11:40 -0700
> > >> From: Christopher Barker <Chris.Barker@noaa.gov>
> > >> Subject: Re: [Numpy-discussion] Using multiprocessing (shared memory)
> > >> with numpy array multiplication
> > >> To: Discussion of Numerical Python <numpy-discussion@scipy.org>
> > >> Message-ID: <4DF128FC.8000807@noaa.gov>
> > >> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> > >>
> > >> Not much time here, but since you got no replies earlier:
> > >>
> > >>
> > >> > > I'm parallelizing some code I've written using the built-in
> > >> > > multiprocessing module. In my application, I need to multiply
> > >> > > many large arrays together
> > >>
> > >> Is this matrix multiplication, or element-wise? If matrix, then numpy
> > >> should be using LAPACK, which, depending on how it's built, could be
> > >> using all your cores already. This is heavily dependent on how your
> > >> numpy (really, the LAPACK it uses) is built.
> > >>
> > >> > > and sum the resulting product arrays (inner products).
> > >>
> > >> are you using numpy.dot() for that? If so, then the above applies to
> > >> that as well.
> > >>
> > >> I know I could look at your code to answer these questions, but I
> > >> thought this might help.
> > >>
> > >> -Chris
> > >>
> > >>
> > >> --
> > >> Christopher Barker, Ph.D.
> > >> Oceanographer
> > >>
> > >> Emergency Response Division
> > >> NOAA/NOS/OR&R            (206) 526-6959 voice
> > >> 7600 Sand Point Way NE   (206) 526-6329 fax
> > >> Seattle, WA 98115        (206) 526-6317 main reception
> > >>
> > >> Chris.Barker@noaa.gov
> Message: 2
> Date: Mon, 13 Jun 2011 12:51:08 -0500
> From: srean <srean.list@gmail.com>
> Subject: Re: [Numpy-discussion] Using multiprocessing (shared memory)
> with numpy array multiplication
> To: Discussion of Numerical Python <numpy-discussion@scipy.org>
> Message-ID: <BANLkTimkSYsD142D5e99bb7xKRVwEHgnzg@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Looking at the code, the arrays that you are multiplying seem fairly
> small (300, 200), and you have 50 of them. So it might be the case
> that there is not enough computational work to compensate for the cost
> of forking new processes and communicating the results. Have you tried
> larger arrays and more of them?
I've tried varying the sizes, and the trends are consistent: using
multiprocessing for numpy array multiplication is slower than not using
it. For reference, I'm on a Mac with the following numpy
configuration:
>>> print numpy.show_config()
lapack_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-faltivec']
    define_macros = [('NO_ATLAS_INFO', 3)]
blas_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-faltivec',
                          '-I/System/Library/Frameworks/vecLib.framework/Headers']
    define_macros = [('NO_ATLAS_INFO', 3)]
None
> If you are on an Intel machine and you have the MKL libraries around, I
> would strongly recommend that you use the matrix multiplication
> routine if possible. MKL will do the parallelization for you. Well,
> any good BLAS implementation would do the same; you don't really need
> MKL. ATLAS and ACML would work too, it's just that MKL has been set up
> for us and it works well.
>
> To give an idea, given the amount of tuning and optimization that
> these libraries have undergone, a numpy.sum would be slower than a
> multiplication with a vector of all ones. So in the interest of speed,
> the longer you stay in the BLAS context the better.
>
> --srean
That seems like a good option. While I'd like the user to have minimal
restrictions and dependencies to consider when writing the inner
product function, maybe I should put the burden on them to parallelize
the inner products, which could be done simply by building numpy
against MKL, I guess (I haven't tried this yet).
I'm still a bit curious what is causing my script to be slower when
the multiple inner products are parallelized.
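One likely culprit, sketched roughly below (not a rigorous benchmark; the sizes match the (300, 200) arrays discussed above): each Pool task pickles its input arrays out to a worker and the result back, and that serialization can cost as much as the multiply-and-sum itself.

```python
import pickle
import time
import numpy as np

a = np.random.rand(300, 200)
b = np.random.rand(300, 200)

t0 = time.perf_counter()
for _ in range(100):
    np.sum(a * b)                       # the actual work per task
compute = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(100):
    # Roughly what multiprocessing does per task: serialize the
    # arguments to the worker (and the result back).
    pickle.loads(pickle.dumps((a, b)))
transfer = time.perf_counter() - t0

print("compute: %.4fs, pickle round-trip: %.4fs" % (compute, transfer))
```

Each (300, 200) float64 array is about 480 KB, so every task copies roughly a megabyte through pickle, which is why shared memory (or batching into one BLAS call) is usually needed before multiprocessing pays off at this size.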
Thanks,
Brandt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: myutil.py
Type: application/octet-stream
Size: 383 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/numpy-discussion/attachments/20110613/367e75e5/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: shared_mem.py
Type: application/octet-stream
Size: 1521 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/numpy-discussion/attachments/20110613/367e75e5/attachment-0001.obj