[SciPy-dev] scipy.fft module slow for complex inputs when linked to fftw3
David Cournapeau
david at ar.media.kyoto-u.ac.jp
Fri Aug 18 05:01:17 CDT 2006
David M. Cooke wrote:
> On Fri, Aug 18, 2006 at 05:32:14PM +0900, David Cournapeau wrote:
>
>> Hi there,
>>
>> I noticed recently that when using the fft module of scipy, it is
>> much slower (5-10 folds) than numpy for complex inputs (only in the 1d
>> case) when linking to fftw3. This problem is reported on the ticket #1
>> for scipy : http://projects.scipy.org/scipy/scipy/ticket/1
>>
>> I am not sure, because the code is a bit difficult to read, but it looks
>> like in the case of complex input + fftw3, the plan is always recomputed
>> for each call to zfft (file:zfft.c), whereas in the real case or in the
>> complexe case + fftw2, the function drfft(file:drfft.c), called from
>> zrfft (file:zrfft.c) is calling a plan which is cached. I am trying to
>> see how the caching is done, but I am not sure I will have the time to
>> make it work for fftw3.
>>
>
> Well, for fftw3 it uses FFTW_ESTIMATE for the plan. So it does a cheap
> estimate of what it needs.
Well, it depends what you mean by cheap. If compared to FFTW_MEASURE,
yes. But compared to pre-computing the plan, and then doing multiple
fftw, then it is not cheap. Computing the fftw is negligeable compared
to computing the plan !
I paste a c file which shows the difference, and which emulates (the
function test_noncached) the scipy.fft module (emulates as I understand
it): it gives the computation for a a complex array of 1024 elements,
iterated 1000 times. First, the number of cycles for all iteration, then
per fft on average, and min of all iteration.
The only difference between cached and non cached is that the plan is
computed again for each iteration in the non cached case (as done by
scipy.fft now in the case of begin linked with fftw3):
output:
testing cached
cycle is 69973016, 69973 per execution, min is 66416
testing non cached
cycle is 946186540, 946186 per execution, min is 918036
Code: (cycle.h is available here: http://www.fftw.org/cycle.h, and is
the code used for the estimation of best code in plans by fftw3)
#include <stddef.h>
#include <stdlib.h>
#include <fftw3.h>
#include "cycle.h"
#define MALLOC(size) fftw_malloc((size))
#define FREE(ptr) fftw_free((ptr))
typedef struct {
double total;
double min;
} result;
result test_cached(size_t size, size_t niter);
result test_noncached(size_t size, size_t niter);
int main(void)
{
size_t niter = 1e3;
size_t size = 1024;
result res;
printf("testing cached\n");
res = test_cached(size, niter);
printf("\t cycle is %d, %d per execution, min is %d\n",
(size_t)res.total, (size_t)res.total/niter, (size_t)res.min);
printf("testing non cached\n");
res = test_noncached(size, niter);
printf("\t cycle is %d, %d per execution, min is %d\n",
(size_t)res.total, (size_t)res.total/niter, (size_t)res.min);
fftw_cleanup();
return 0;
}
result test_cached(size_t size, size_t niter)
{
fftw_complex *in, *out;
fftw_plan plan;
ticks t0, t1;
double res, min, acc;
size_t i, j;
result r;
in = MALLOC(sizeof(*in)*size);
out = MALLOC(sizeof(*out)*size);
plan = fftw_plan_dft_1d(size, in, out, 1, FFTW_ESTIMATE);
acc = 0;
min = 1e100;
for(i = 0; i < niter; ++i) {
for(j = 0; j < size; ++j) {
in[j][0] = (0.5 - (double)rand() / RAND_MAX);
in[j][1] = 0.1*j;
}
t0 = getticks();
fftw_execute(plan);
t1 = getticks();
res = elapsed(t1, t0);
if (res < min) {
min = res;
}
acc += res;
}
r.total = acc;
r.min = min;
fftw_destroy_plan(plan);
FREE(in);
FREE(out);
return r;
}
result test_noncached(size_t size, size_t niter)
{
fftw_complex *in, *out;
fftw_plan plan;
ticks t0, t1;
double res, min, acc;
size_t i, j;
result r;
in = MALLOC(sizeof(*in)*size);
out = MALLOC(sizeof(*out)*size);
acc = 0;
min = 1e100;
for(i = 0; i < niter; ++i) {
for(j = 0; j < size; ++j) {
in[j][0] = (0.5 - (double)rand() / RAND_MAX);
in[j][1] = 0.1*j;
}
t0 = getticks();
plan = fftw_plan_dft_1d(size, in, out, 1, FFTW_ESTIMATE);
fftw_execute(plan);
fftw_destroy_plan(plan);
t1 = getticks();
res = elapsed(t1, t0);
if (res < min) {
min = res;
}
acc += res;
}
r.total = acc;
r.min = min;
FREE(in);
FREE(out);
return r;
}
Compile by:
gcc -c test.c -o test.o -W -Wall
gcc -lfftw3 -lm -o test test.o
More information about the Scipy-dev
mailing list