[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Wed Jun 15 19:25:13 CDT 2011
On 15.06.2011 23:22, Christopher Barker wrote:
> I think the issue got confused -- the OP was not looking to speed up a
> matrix multiply, but rather to speed up a whole bunch of independent
> matrix multiplies.
I would do it like this:
1. Write a Fortran function that makes multiple calls to DGEMM in a do loop.
(Or use the Fortran intrinsics dot_product or matmul.)
2. Put an OpenMP directive around the loop (!$omp parallel do) and enable
OpenMP when compiling. Use static or guided thread scheduling.
3. Call Fortran from Python using f2py, ctypes or Cython.
Link against a BLAS library that is thread-safe but single-threaded, so the
OpenMP threads do not compete with threads spawned inside BLAS.
That should run as fast as it gets.
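
For illustration, here is a minimal sketch of the loop structure in C rather than Fortran, since the OpenMP idea carries over directly. It is not the Fortran/DGEMM version described above: the naive matmul() is only a stand-in for a DGEMM call into a single-threaded BLAS, and the names matmul and batched_matmul, the square row-major layout, and the back-to-back storage of the matrices are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Naive C = A * B for n x n row-major matrices.
   Hypothetical stand-in for a DGEMM call into a single-threaded BLAS. */
static void matmul(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Multiply `count` independent pairs of n x n matrices stored back to back
   in a, b and c. Each iteration writes only its own slice of c, so the
   iterations are independent and can be divided among threads. */
void batched_matmul(int count, size_t n,
                    const double *a, const double *b, double *c)
{
    assert(count >= 0 && n > 0);
    const size_t stride = n * n;

    /* Without OpenMP enabled at compile time the pragma is ignored and
       the loop simply runs serially. */
    #pragma omp parallel for schedule(static)
    for (int m = 0; m < count; ++m)
        matmul(n, a + m * stride, b + m * stride, c + m * stride);
}
```

With GCC this would be compiled with -fopenmp to enable the threads. Static scheduling suits the case where every multiply costs about the same; guided scheduling may be the better choice if the matrix sizes vary.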