[SciPy-user] test_fft, test_ifft results
arnd.baecker at web.de
Mon Dec 12 02:28:07 CST 2005
Hi Darren and all fft-enthusiasts,
On Sun, 11 Dec 2005, Darren Dale wrote:
> On Saturday 10 December 2005 5:57 pm, Darren Dale wrote:
> > I have been wondering about the results of
> > fftpack.basic.test_basic.test_fft, fftpack.basic.test_basic.test_fftn and
> > fftpack.basic.test_basic.test_ifft. On my system, with scipy built against
> > fftw2 or 3, ffts of complex input takes over 8 times as long as real input.
> I would like to clarify this report. I thought that editing site.cfg to find
> fftw-2 would make scipy build against it, but this is not the case. One can
> build scipy with support for only fftw-2 by commenting out the fftw3
> dictionary in the fftw_info class in scipy/distutils/systeminfo.py. The
> performance of fft's for complex and real input are comparable if scipy is
> built with fftw-2 in this way.
On Itanium2 we also see that fftw3 performs badly for the complex case.
The same holds for the 64-bit Opteron machine.
Also note that fftw3 support was only added recently
to new scipy.
I also think that it might not have received much testing
in old scipy, as it was only added in July.
Over the weekend I did some checks comparing the fftw
performance for "old" scipy (fftw2 only) and new scipy (fftw2 and fftw3)
on my PIII laptop, see test_AB.png.
Darren has also sent me his results off-list, see test_DD.png.
Both plots (and the script + input file) are at
and display the ratio of the scipy time vs. Numeric time for the fftw,
so anything below 1 is not ok.
E.g. for data like:
              Fast Fourier Transform
       |    real input    |   complex input
  size |  scipy | Numeric |  scipy | Numeric
   100 |   0.23 |    0.23 |   1.56 |    0.23  (secs for 7000 calls)
  1000 |   0.17 |    0.31 |   1.62 |    0.30  (secs for 2000 calls)
   256 |   0.40 |    0.46 |   3.21 |    0.47  (secs for 10000 calls)
   512 |   0.55 |    0.84 |   4.19 |    0.81  (secs for 10000 calls)
  1024 |   0.09 |    0.16 |   0.67 |    0.15  (secs for 1000 calls)
  2048 |   0.16 |    0.28 |   1.16 |    0.30  (secs for 1000 calls)
  4096 |   0.17 |    0.30 |   1.08 |    0.29  (secs for 500 calls)
  8192 |   0.46 |    1.04 |   2.38 |    1.01  (secs for 500 calls)
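To put a number on the complex vs. real slowdown, here is a small sketch that divides the complex-input time by the real-input time for each size, using only the figures from the table. The helper name complex_over_real is made up for this illustration. For these figures the scipy ratio lies between about 5 and 10 for every size, while the Numeric ratio stays close to 1.

```python
# Timings copied from the table: seconds for the stated number of calls.
# size: (scipy_real, numeric_real, scipy_complex, numeric_complex)
timings = {
    100:  (0.23, 0.23, 1.56, 0.23),
    1000: (0.17, 0.31, 1.62, 0.30),
    256:  (0.40, 0.46, 3.21, 0.47),
    512:  (0.55, 0.84, 4.19, 0.81),
    1024: (0.09, 0.16, 0.67, 0.15),
    2048: (0.16, 0.28, 1.16, 0.30),
    4096: (0.17, 0.30, 1.08, 0.29),
    8192: (0.46, 1.04, 2.38, 1.01),
}


def complex_over_real(timings):
    """Return {size: (scipy_ratio, numeric_ratio)}, complex time / real time.

    Hypothetical helper for this illustration. The number of calls is the
    same for all four entries of a row, so it cancels out of each ratio.
    """
    return {
        size: (sc / sr, nc / nr)
        for size, (sr, nr, sc, nc) in timings.items()
    }


if __name__ == "__main__":
    for size, (s, n) in sorted(complex_over_real(timings).items()):
        print("%5d  scipy: %5.2f  Numeric: %5.2f" % (size, s, n))
```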
I looked at the profiling output (on the scipy side)
for the fftw2 and fftw3 cases, but could not see
any difference which could cause the above effect.
> According to some benchmarks posted at
> http://www.fftw.org/speed/p4-2.2GHz-gcc/ , version 3 should be faster than
> version 2.
And for 1D complex, size 8192, fftw3 would be almost a
factor of 2 faster than fftw2!!
For 1D real, size 8192, the factor is still about 1.5.
> However, I haven't been able to build benchfft and test my own
> installation independent of scipy.
There is another way, if you built fftw3 from source:
./bench -opatient -s icf8192
./bench -opatient -s irf8192
and for 2D:
./bench -opatient -s icf256x256
- ./bench does not exist for fftw2
- the letters in the problem string after -s mean:
  - i: in-place (o: out-of-place)
  - c: complex (r: real)
  - f: forward (b: backward fft)
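If you want to run several of these cases in a row, the problem strings can be put together programmatically. The sketch below only encodes the letters listed above and the "x" separator seen in the 2D example. The function names bench_problem and bench_command are made up for this illustration, and only the combinations shown in this message (icf8192, irf8192, icf256x256) have actually been run. Nothing is executed here; the functions just build the strings.

```python
def bench_problem(dims, inplace=True, complex_data=True, forward=True):
    """Build a problem string such as 'icf8192' or 'icf256x256'.

    Hypothetical helper: it encodes only the letters from the list above
    (i/o, c/r, f/b) followed by the sizes joined with 'x'.
    """
    if not dims or any(int(n) <= 0 for n in dims):
        raise ValueError("dims must be a non-empty sequence of positive sizes")
    placement = "i" if inplace else "o"
    kind = "c" if complex_data else "r"
    direction = "f" if forward else "b"
    size = "x".join(str(int(n)) for n in dims)
    return placement + kind + direction + size


def bench_command(problem, bench="./bench"):
    """Return the argument list for one run, in the form used above."""
    return [bench, "-opatient", "-s", problem]


if __name__ == "__main__":
    cases = [((8192,), True), ((8192,), False), ((256, 256), True)]
    for dims, is_complex in cases:
        problem = bench_problem(dims, complex_data=is_complex)
        print(" ".join(bench_command(problem)))
```

The argument list returned by bench_command could be handed to subprocess.run from within the directory that contains ./bench.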
Hope that this somehow helps to get all this sorted!
Could some scipy/fftpack expert maybe explain why no caching
is needed for FFTW3 (see scipy/Lib/fftpack/src/zfft.c),
whereas for FFTW2 it is done?
((Presumably all this is explained in
but, at a quick glance I could not extract the relevant information...))