[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Thu Jun 16 09:16:16 CDT 2011
On 06/15/2011 07:25 PM, Sturla Molden wrote:
> Den 15.06.2011 23:22, skrev Christopher Barker:
>> I think the issue got confused -- the OP was not looking to speed up a
>> matrix multiply, but rather to speed up a whole bunch of independent
>> matrix multiplies.
> I would do it like this:
> 1. Write a Fortran function that make multiple calls DGEMM in a do loop.
> (Or Fortran intrinsics dot_product or matmul.)
> 2. Put an OpenMP pragma around the loop (!$omp parallel do). Invoke the
> OpenMP compiler on compilation. Use static or guided thread scheduling.
> 3. Call Fortran from Python using f2py, ctypes or Cython.
> Build with a thread-safe and single-threaded BLAS library.
> That should run as fast as it gets.
> NumPy-Discussion mailing list
The idea was to calculate:
innerProduct =numpy.sum(array1*array2) where array1 and array2 are, in
Numpy has an inner product function called np.inner
(http://www.scipy.org/Numpy_Example_List_With_Doc#inner) which is a
special case of tensordot. See the documentation for what is does and
other examples. Also note that np.inner provides of the cross-products
as well. For example, you can just do:
>>> import numpy as np
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
array([[ 0, 10, 20, 30, 40],
[50, 60, 70, 80, 90]])
array([[ 0, 10, 40, 90, 160],
[250, 360, 490, 640, 810]])
array([[ 300, 800],
[ 800, 2550]])
I do not know if numpy's multiplication, np.inner() and np.sum() are
single-threaded and thread-safe when you link to multi-threaded blas,
lapack or altas libraries. But if everything is *single-threaded* and
thread-safe, then you just create a function and use Anne's very useful
handythread.py (http://www.scipy.org/Cookbook/Multithreading). Otherwise
you would have to determine the best approach such doing nothing
(especially if the arrays are huge), or making the functions single-thread.
By the way, if the arrays are sufficiently small, there is a lot of
overhead involved such that there is more time in communication than
computation. So you have to determine the best algorithm for you case.
More information about the NumPy-Discussion