[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Bruce Southey
bsouthey@gmail....
Thu Jun 16 09:16:16 CDT 2011
On 06/15/2011 07:25 PM, Sturla Molden wrote:
> Den 15.06.2011 23:22, skrev Christopher Barker:
>> I think the issue got confused -- the OP was not looking to speed up a
>> matrix multiply, but rather to speed up a whole bunch of independent
>> matrix multiplies.
> I would do it like this:
>
> 1. Write a Fortran function that makes multiple calls to DGEMM in a do
> loop. (Or Fortran intrinsics dot_product or matmul.)
> (Or Fortran intrinsics dot_product or matmul.)
>
> 2. Put an OpenMP pragma around the loop (!$omp parallel do). Invoke the
> OpenMP compiler on compilation. Use static or guided thread scheduling.
>
> 3. Call Fortran from Python using f2py, ctypes or Cython.
>
> Build with a thread-safe and single-threaded BLAS library.
>
> That should run as fast as it gets.
>
> Sturla
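
Sturla's Fortran/OpenMP recipe aside, the same idea (batching many
independent multiplies so the loop runs in compiled code rather than in
Python) can be sketched at the pure-NumPy level with np.einsum; the
shapes and random data here are made up for illustration:

```python
import numpy as np

# Sketch only: 100 independent matrix products done in one einsum call,
# so the batch loop happens in C instead of a Python for loop.
rng = np.random.RandomState(0)
A = rng.rand(100, 4, 6)   # 100 independent 4x6 matrices
B = rng.rand(100, 6, 3)   # 100 independent 6x3 matrices

C = np.einsum('bij,bjk->bik', A, B)   # C[b] = A[b] dot B[b]

# Same result as an explicit Python loop over np.dot:
C_loop = np.array([np.dot(A[b], B[b]) for b in range(100)])
assert np.allclose(C, C_loop)
```

Whether this beats the OpenMP route depends on the sizes involved, but
it avoids compiling anything.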
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
The idea was to calculate:
innerProduct = numpy.sum(array1 * array2), where array1 and array2 are,
in general, multidimensional.
Numpy has an inner product function called np.inner
(http://www.scipy.org/Numpy_Example_List_With_Doc#inner), which is a
special case of tensordot. See the documentation for what it does and
other examples. Also note that np.inner provides the cross-products
as well. For example, you can just do:
>>> import numpy as np
>>> a = np.arange(10).reshape(2,5)
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> b = a*10
>>> b
array([[ 0, 10, 20, 30, 40],
       [50, 60, 70, 80, 90]])
>>> c = a*b
>>> c
array([[  0,  10,  40,  90, 160],
       [250, 360, 490, 640, 810]])
>>> c.sum()
2850
>>> d = np.inner(a,b)
>>> d
array([[ 300,  800],
       [ 800, 2550]])
>>> np.diag(d).sum()
2850
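
As a side note (not part of the original question), the same total can
be obtained in a single call, without forming the 2x2 inner-product
matrix and taking its diagonal; both np.tensordot and np.einsum can
contract over all axes at once:

```python
import numpy as np

a = np.arange(10).reshape(2, 5)
b = a * 10

# Contract both axes in one call: sum of a[i,j] * b[i,j] over all i, j.
total = np.tensordot(a, b, axes=2)
assert total == 2850

# Equivalent einsum spelling of the same full contraction.
assert np.einsum('ij,ij->', a, b) == 2850
```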
I do not know if numpy's multiplication, np.inner() and np.sum() are
single-threaded and thread-safe when you link to multi-threaded BLAS,
LAPACK or ATLAS libraries. But if everything is *single-threaded* and
thread-safe, then you can just create a function and use Anne's very
useful handythread.py (http://www.scipy.org/Cookbook/Multithreading).
Otherwise you would have to determine the best approach, such as doing
nothing (especially if the arrays are huge) or making the functions
single-threaded.
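
A minimal sketch of that handythread-style approach, using only the
standard library (concurrent.futures stands in for handythread.py here,
and the function name and array sizes are made up for illustration):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Sketch only: apply the elementwise-product sum to many independent
# array pairs in a thread pool. This only pays off if the underlying
# NumPy calls release the GIL and the linked BLAS is single-threaded,
# per the caveats above.
def product_sum(pair):
    array1, array2 = pair
    return np.sum(array1 * array2)

rng = np.random.RandomState(0)
pairs = [(rng.rand(200, 200), rng.rand(200, 200)) for _ in range(8)]

with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(product_sum, pairs))

# Threaded results match a plain sequential loop.
sequential = [product_sum(p) for p in pairs]
assert np.allclose(threaded, sequential)
```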
By the way, if the arrays are sufficiently small, there is a lot of
overhead involved, so that more time is spent on communication than on
computation. So you have to determine the best algorithm for your case.
Bruce