[SciPy-dev] FFTW performances in scipy and numpy

David Cournapeau david@ar.media.kyoto-u.ac...
Thu Aug 2 23:38:26 CDT 2007

Steven G. Johnson wrote:
> On Thu, 2 Aug 2007, David Cournapeau wrote:
>> This is the way matlab works, right ? If I understand correctly, 
>> wisdoms are a way to compute plans "offline". So for example, if you 
>> compute plans with FFTW_MEASURE | FFTW_UNALIGNED, for inplace 
>> transforms and a set of sizes, you can record it in a wisdom, and 
>> reload it such as later calls with FFTW_MEASURE | FFTW_UNALIGNED will 
>> be fast ?
> Yes, although a wisdom file can contain as many saved plans as you want.
>> Anyway, all this sounds like it should be solved by adding a better 
>> infrastructure to the current wrappers (a la matlab).
> I know a little about how the Matlab usage of FFTW works, and they are 
> definitely not getting the full performance you would get by calling 
> FFTW yourself from C etc.  So they are not necessarily the gold standard.
I certainly do not consider matlab the gold standard. That's all the 
more reason why the current situation, where fft in scipy with fftw3 
performs worse than matlab, is not good. But I will work on a better 
cache mechanism with a user interface in python (by the way, the 
complex fft implementation with fftw3 has not used a copy for a few 
days now).
> If you malloc and then immediately free, most of the time the malloc 
> implementation is just going to re-use the same memory and so you will 
> get the same pointer over and over.  So it's not a good test of malloc 
> alignment.
Ah, this was stupid indeed. I should have checked the addresses returned 
by malloc. But then, playing a bit with the test program, I found that 
sizes above ~17000 doubles make the ratio of aligned pointers start to 
drop. The drop does not happen if I force malloc to use sbrk rather than 
mmap for big sizes: this is consistent with the fact that the glibc 
threshold for mmapping areas is 128 kb on 32 bits.

Basically, areas allocated through mmap seem never to be 16-byte 
aligned! This is starting to go way beyond my knowledge... I thought 
mmap was page aligned, which would imply 16-byte alignment. Maybe 
malloc does not return the pointer it gets from mmap directly, but a 
shifted version for some reason?

Maybe my test is flawed again? I pasted it just below. For example, with 
N = 65384, the ratio is closer to 10 % than to 50 %; if you force mmap 
off (M_MMAP_MAX = 0), it goes back to ~50 %.



#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#include <malloc.h>

#define NARRAY 10000
#define N (1 << 16)

int main(void)
{
     void *a[NARRAY];
     uintptr_t p;
     int i, nalign = 0;
     int st;

     /* default value, at least on my 32 bits ubuntu with glibc: 128 kb */
     st = mallopt(M_MMAP_THRESHOLD, 128 * 1024);
     if (st == 0) {
         fprintf(stderr, "changing malloc option failed\n");
     }
     st = mallopt(M_MMAP_MAX, NARRAY);
     if (st == 0) {
         fprintf(stderr, "changing malloc option failed\n");
     }

     for (i = 0; i < NARRAY; ++i) {
          a[i] = malloc((rand() % N + 2) * sizeof(double));
          p = (uintptr_t) a[i];
          if (p % 16 == 0) ++nalign;
     }

     printf("%d/%d = %g%% are 16-byte aligned\n", nalign, NARRAY,
            nalign * 100.0 / NARRAY);

     for (i = 0; i < NARRAY; ++i)
          free(a[i]);

     return 0;
}