# [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

Bruce Southey bsouthey@gmail....
Thu Jun 16 09:16:16 CDT 2011

```
On 06/15/2011 07:25 PM, Sturla Molden wrote:
> Den 15.06.2011 23:22, skrev Christopher Barker:
>> I think the issue got confused -- the OP was not looking to speed up a
>> matrix multiply, but rather to speed up a whole bunch of independent
>> matrix multiplies.
> I would do it like this:
>
> 1. Write a Fortran function that makes multiple calls to DGEMM in a do loop.
> (Or Fortran intrinsics dot_product or matmul.)
>
> 2. Put an OpenMP pragma around the loop (!$omp parallel do). Invoke the
> OpenMP compiler on compilation. Use static or guided thread scheduling.
>
> 3. Call Fortran from Python using f2py, ctypes or Cython.
>
>
> That should run as fast as it gets.
>
> Sturla
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
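# A rough sketch (mine, not Sturla's code) of the same idea in Python: call
# BLAS DGEMM once per pair of matrices in a plain loop, using SciPy's
# low-level BLAS wrapper (assumes SciPy is installed). A real Fortran
# version would wrap the loop in !$omp parallel do for thread parallelism.

import numpy as np
from scipy.linalg.blas import dgemm

rng = np.random.default_rng(42)
As = [rng.random((8, 8)) for _ in range(100)]
Bs = [rng.random((8, 8)) for _ in range(100)]

# One DGEMM call per independent multiply: result = 1.0 * A @ B
results = [dgemm(1.0, A, B) for A, B in zip(As, Bs)]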

The idea was to calculate:
innerProduct = numpy.sum(array1*array2), where array1 and array2 are, in
general, multidimensional.
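# A quick illustration (mine, not from the original post): for
# multidimensional arrays this sum of elementwise products equals a dot
# product over the flattened arrays, which avoids the temporary
# array1*array2 product array.

import numpy as np

rng = np.random.default_rng(0)
array1 = rng.random((3, 4, 5))
array2 = rng.random((3, 4, 5))

innerProduct = np.sum(array1 * array2)                 # the original formulation
flattened = np.dot(array1.ravel(), array2.ravel())     # no temporary array

assert np.allclose(innerProduct, flattened)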

Numpy has an inner product function called np.inner
(http://www.scipy.org/Numpy_Example_List_With_Doc#inner), which is a
special case of tensordot. See the documentation for what it does and
other examples. Also note that np.inner provides the cross-products
as well. For example, you can just do:

>>> import numpy as np
>>> a=np.arange(10).reshape(2,5)
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> b=a*10
>>> b
array([[ 0, 10, 20, 30, 40],
       [50, 60, 70, 80, 90]])
>>> c=a*b
>>> c
array([[  0,  10,  40,  90, 160],
       [250, 360, 490, 640, 810]])
>>> c.sum()
2850
>>> d=np.inner(a,b)
>>> d
array([[ 300,  800],
       [ 800, 2550]])
>>> np.diag(d).sum()
2850
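# A side note (mine, not from the post): np.inner builds the full 2x2
# cross-product matrix and the diagonal is then summed. np.einsum can
# produce the same scalar directly, without forming the off-diagonal
# cross-products at all.

import numpy as np

a = np.arange(10).reshape(2, 5)
b = a * 10
total = np.einsum('ij,ij->', a, b)   # sums elementwise products directly
assert total == 2850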

I do not know whether numpy's multiplication, np.inner() and np.sum() use
the LAPACK or ATLAS libraries. But if everything is *single-threaded* and
thread-safe, then you just create a function and use Anne's very useful