[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

Sturla Molden sturla@molden...
Wed Jun 15 19:25:13 CDT 2011

On 15.06.2011 23:22, Christopher Barker wrote:
> I think the issue got confused -- the OP was not looking to speed up a
> matrix multiply, but rather to speed up a whole bunch of independent
> matrix multiplies.

I would do it like this:

1. Write a Fortran function that makes multiple calls to DGEMM in a do loop. 
(Or use the Fortran intrinsics dot_product or matmul.)

2. Put an OpenMP directive around the loop (!$omp parallel do). Enable 
OpenMP when compiling (e.g. -fopenmp with gfortran). Use static or guided 
thread scheduling.

3. Call Fortran from Python using f2py, ctypes or Cython.
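The same loop structure can be sketched in pure Python as an analogue, not as the Fortran/OpenMP route above: the stack of independent multiplies is split into contiguous chunks (similar to OpenMP static scheduling) and each worker thread loops over its own chunk. This assumes NumPy releases the GIL inside np.matmul for float64 data, which it generally does for numeric dtypes; the helper name batched_matmul_parallel and the chunking scheme are hypothetical, for illustration only.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def batched_matmul_parallel(A, B, nthreads=4):
    """Compute C[k] = A[k] @ B[k] for every k using a thread pool.

    Hypothetical helper: a Python analogue of an OpenMP "parallel do"
    with static scheduling over independent matrix multiplies.
    Works best when the underlying BLAS is single-threaded; otherwise
    BLAS threads and pool threads compete for the same cores.
    """
    n = A.shape[0]
    C = np.empty((n, A.shape[1], B.shape[2]), dtype=np.result_type(A, B))

    # Static schedule: split indices 0..n-1 into nthreads contiguous ranges.
    bounds = np.linspace(0, n, nthreads + 1).astype(int)

    def work(lo, hi):
        # Each thread owns its slice of C, so no locking is needed.
        # matmul releases the GIL while it computes, letting slices overlap.
        for k in range(lo, hi):
            np.matmul(A[k], B[k], out=C[k])

    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        futures = [pool.submit(work, lo, hi)
                   for lo, hi in zip(bounds[:-1], bounds[1:])]
        for f in futures:
            f.result()  # re-raise any exception from a worker thread
    return C
```

For comparison, `A @ B` on the stacked 3-D arrays computes the same result in a single call. The chunked version only pays off when the individual multiplies are large enough that thread overhead is small relative to each BLAS call.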

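Build with a BLAS library that is both thread-safe and single-threaded: 
it is called concurrently from the OpenMP threads, and a multithreaded 
BLAS would oversubscribe the cores.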
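If the BLAS in use is a threaded build such as OpenBLAS or MKL, its internal threading can usually be switched off at run time through environment variables, as long as they are set before the library is loaded, i.e. before the first import of NumPy or of the compiled extension. A minimal sketch, assuming one of those two libraries; which variable is honoured depends on how the BLAS was built. OMP_NUM_THREADS is deliberately left alone because it would also cap the number of threads in the !$omp parallel do loop itself.

```python
import os

# Force one thread per BLAS call. These must be set before the BLAS
# library is loaded; changing them afterwards typically has no effect.
for var in ("OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

# OMP_NUM_THREADS is not touched here: it controls the outer OpenMP loop.

import numpy as np  # noqa: E402  (imported only after configuring BLAS)
```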

That should be about as fast as it gets.

