[SciPy-User] Parallel operations on the columns of a numpy array

Sturla Molden sturla@molden...
Fri Aug 13 15:33:09 CDT 2010

> So I'm just looking for any hints... Would FFTW help? Would ATLAS
> offer better performances in that regard than Intel's MKL? Is there a
> better way to parallelize column operations?

The FFTs in FFTW, MKL and ACML use multithreading out of the box. NumPy's
fftpack_lite does not. NumPy's FFTs could be multithreaded, e.g. with
threading.Thread, if the GIL were released. I once opened a ticket for this
and supplied the required code, but it's not in NumPy trunk, nor in the
"unified diff". Basically we must put the Py_BEGIN_ALLOW_THREADS and
Py_END_ALLOW_THREADS macros inside fftpack_litemodule.c. The library is
thread safe. It is also easy to recompile fftpack_lite to use
multithreading automatically (OpenMP).
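
A minimal sketch of the threading.Thread approach, assuming an FFT routine
that releases the GIL while it works. The helper name threaded_column_fft
and the n_threads parameter are made up for illustration and are not part
of NumPy. The result is correct either way, but the threads only run in
parallel if the underlying FFT actually releases the GIL:

```python
import threading

import numpy as np


def threaded_column_fft(x, n_threads=4):
    """FFT every column of a 2-D array, one block of columns per thread."""
    x = np.asarray(x)
    out = np.empty(x.shape, dtype=np.complex128)

    # Split the column range into n_threads contiguous blocks.
    bounds = np.linspace(0, x.shape[1], n_threads + 1).astype(int)

    def worker(lo, hi):
        # Each thread writes to its own disjoint slice of `out`,
        # so no locking is needed.
        if hi > lo:
            out[:, lo:hi] = np.fft.fft(x[:, lo:hi], axis=0)

    threads = [
        threading.Thread(target=worker, args=(lo, hi))
        for lo, hi in zip(bounds[:-1], bounds[1:])
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Calling threaded_column_fft(x) should give the same result as
np.fft.fft(x, axis=0). Whether it is any faster depends entirely on the GIL
being released inside the FFT.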

Also note that FFTW, MKL or ACML will give you faster FFTs than FFTPACK,
multithreaded or not. So you will in any case be better off with one of
these. FFTW is GPL. MKL and ACML are not free (as in speech), but ACML can
be used for free (as in beer). ATLAS will not help as there is no FFT in
ATLAS. Nor will compiling NumPy against MKL or ACML help, as NumPy does not
use their FFTs.


Note that when you use FFT libraries like FFTW, arrays must be aligned on
16-byte boundaries (that's what fftw_malloc does). We can do that from
NumPy, e.g. see here:
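
As an illustration that is independent of the link above, here is a minimal
sketch of one common way to get a 16-byte aligned array from NumPy:
over-allocate a byte buffer and take a view starting at the first aligned
address. The helper name aligned_empty is made up for this example and is
not a NumPy function:

```python
import numpy as np


def aligned_empty(shape, dtype=np.complex128, alignment=16):
    """Return an uninitialized array whose data pointer is a multiple of
    `alignment` bytes."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize

    # Over-allocate so that an aligned start address must exist in the buffer.
    buf = np.empty(nbytes + alignment, dtype=np.uint8)

    # Number of bytes to skip to reach the next aligned address.
    offset = (-buf.ctypes.data) % alignment

    # The view keeps `buf` alive through its .base attribute.
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)
```

For example, a = aligned_empty((1024, 8)) gives an array for which
a.ctypes.data % 16 == 0, which can then be filled and handed to an FFT
wrapper that expects aligned input.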


P.S. SciPy's fftpack is not thread safe due to the way info arrays are
cached, and cannot be freely threaded in its current state. But free
threading is OK for NumPy's fftpack_lite.

