[SciPy-user] fftw, scipy ("old") install

Arnd Baecker arnd.baecker at web.de
Tue Oct 18 07:38:08 CDT 2005


Hi,

I spent the whole morning with fftw (hope none of my collaborators
reads this).

Status is as follows:

a) fftw2 works:

    Multi-dimensional Fast Fourier Transform
===================================================
          |    real input     |   complex input
---------------------------------------------------
   size   |  scipy  | Numeric |  scipy  |  Numeric
---------------------------------------------------
  100x100 |    0.04 |    0.07 |    0.04 |    0.09  (secs for 100 calls)
 1000x100 |    0.05 |    0.08 |    0.05 |    0.08  (secs for 7 calls)
  256x256 |    0.05 |    0.08 |    0.06 |    0.09  (secs for 10 calls)
  512x512 |    0.18 |    0.14 |    0.17 |    0.15  (secs for 3 calls)
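For readers unfamiliar with the "secs for N calls" notation: each cell is the total wall-clock time for N repeated calls. The sketch below shows one way such a row could be produced. It is not the scipy benchmark code itself; `secs_for_calls`, `format_row` and `dummy_transform` are hypothetical names, and the workload is a stand-in for a real FFT call.

```python
import time


def secs_for_calls(func, calls):
    """Return total wall-clock seconds spent in `calls` invocations of func()."""
    start = time.perf_counter()
    for _ in range(calls):
        func()
    return time.perf_counter() - start


def format_row(size, timings, calls):
    """Format one table row in the same layout as the tables in this post."""
    cells = " | ".join(f"{t:7.2f}" for t in timings)
    return f"{size:>9} | {cells}  (secs for {calls} calls)"


def dummy_transform():
    # Stand-in workload (assumption): replace with the FFT call to be measured.
    sum(i * i for i in range(1000))


row = format_row("100x100", [secs_for_calls(dummy_transform, 100)], 100)
print(row)
```

To time a real transform, pass a zero-argument callable that wraps the FFT routine and its input array instead of `dummy_transform`.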



Installation on the Opteron with:
./configure CFLAGS="-fPIC -O3 -fomit-frame-pointer -fno-schedule-insns -fschedule-insns2 -fstrict-aliasing" --prefix=$TSTHOME



b) fftw3 works, but is slow:


    Multi-dimensional Fast Fourier Transform
===================================================
          |    real input     |   complex input
---------------------------------------------------
   size   |  scipy  | Numeric |  scipy  |  Numeric
---------------------------------------------------
  100x100 |    0.05 |    0.06 |    0.05 |    0.07  (secs for 100 calls)
 1000x100 |    0.05 |    0.09 |    0.06 |    0.08  (secs for 7 calls)
  256x256 |    0.10 |    0.09 |    0.10 |    0.08  (secs for 10 calls)
  512x512 |    0.24 |    0.13 |    0.24 |    0.13  (secs for 3 calls)

Already at 256x256, scipy.fft is slower than the Numeric one,
and the gap widens at 512x512 (and gets much worse for larger NxN).
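To put numbers on that gap, here is a small sketch that divides the scipy time by the Numeric time for the real-input columns of the fftw3 table above. The timings are copied from that table; the names `fftw3_real` and `slowdown` are made up for illustration.

```python
# Multi-dimensional FFT, real input, fftw3 build.
# size -> (scipy seconds, Numeric seconds, number of calls), copied from the table.
fftw3_real = {
    "100x100": (0.05, 0.06, 100),
    "1000x100": (0.05, 0.09, 7),
    "256x256": (0.10, 0.09, 10),
    "512x512": (0.24, 0.13, 3),
}


def slowdown(table):
    """Return size -> scipy time / Numeric time (a value > 1 means scipy is slower)."""
    return {size: s / n for size, (s, n, _calls) in table.items()}


ratios = slowdown(fftw3_real)
for size, ratio in ratios.items():
    print(f"{size:>9}: scipy/Numeric = {ratio:.2f}")
```

With these inputs the ratio is below 1 for 100x100 and 1000x100, about 1.11 for 256x256, and about 1.85 for 512x512. The timings are only given to two decimals, so the ratios for the small sizes are rough.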


Installation on the Opteron with:

./configure CFLAGS="-fPIC -O3 -fomit-frame-pointer -fno-schedule-insns -fstrict-aliasing -mpreferred-stack-boundary=4" --prefix=$PHOME

(Adding any of "--enable-sse --enable-sse2 --enable-k7",
"--enable-sse --enable-sse2" or "--enable-sse" led to compile errors.)


So, fftw3 *does* work for scipy, and compiled with
the right flags the performance is much better than
the one reported before.

Still, it does not perform as well as it should.
Does anyone have better compile flags for an Opteron?

Any ideas are very welcome!

Best,

Arnd



Full results for fftw2
----------------------

                 Fast Fourier Transform
=================================================
      |    real input     |   complex input
-------------------------------------------------
 size |  scipy  | Numeric |  scipy  | Numeric
-------------------------------------------------
  100 |    0.12 |    0.08 |    0.05 |    0.05  (secs for 7000 calls)
 1000 |    0.05 |    0.07 |    0.06 |    0.08  (secs for 2000 calls)
  256 |    0.09 |    0.11 |    0.26 |    0.15  (secs for 10000 calls)
  512 |    0.13 |    0.19 |    0.14 |    0.20  (secs for 10000 calls)
 1024 |    0.02 |    0.04 |    0.02 |    0.04  (secs for 1000 calls)
 2048 |    0.04 |    0.07 |    0.06 |    0.07  (secs for 1000 calls)
 4096 |    0.04 |    0.10 |    0.07 |    0.11  (secs for 500 calls)
 8192 |    0.10 |    0.48 |    0.24 |    0.50  (secs for 500 calls)
.....
    Multi-dimensional Fast Fourier Transform
===================================================
          |    real input     |   complex input
---------------------------------------------------
   size   |  scipy  | Numeric |  scipy  |  Numeric
---------------------------------------------------
  100x100 |    0.04 |    0.07 |    0.04 |    0.09  (secs for 100 calls)
 1000x100 |    0.05 |    0.08 |    0.05 |    0.08  (secs for 7 calls)
  256x256 |    0.05 |    0.08 |    0.06 |    0.09  (secs for 10 calls)
  512x512 |    0.18 |    0.14 |    0.17 |    0.15  (secs for 3 calls)
.....
       Inverse Fast Fourier Transform
===============================================
      |     real input    |    complex input
-----------------------------------------------
 size |  scipy  | Numeric |  scipy  | Numeric
-----------------------------------------------
  100 |    0.05 |    0.14 |    0.06 |    0.14  (secs for 7000 calls)
 1000 |    0.05 |    0.17 |    0.09 |    0.18  (secs for 2000 calls)
  256 |    0.10 |    0.28 |    0.12 |    0.29  (secs for 10000 calls)
  512 |    0.13 |    0.48 |    0.18 |    0.46  (secs for 10000 calls)
 1024 |    0.02 |    0.08 |    0.04 |    0.08  (secs for 1000 calls)
 2048 |    0.04 |    0.15 |    0.07 |    0.16  (secs for 1000 calls)
 4096 |    0.04 |    0.19 |    0.08 |    0.19  (secs for 500 calls)
 8192 |    0.12 |    0.68 |    0.26 |    0.69  (secs for 500 calls)
.......
Inverse Fast Fourier Transform (real data)
==================================
 size |  scipy  | Numeric
----------------------------------
  100 |    0.05 |    0.15  (secs for 7000 calls)
 1000 |    0.05 |    0.10  (secs for 2000 calls)
  256 |    0.09 |    0.24  (secs for 10000 calls)
  512 |    0.13 |    0.32  (secs for 10000 calls)
 1024 |    0.02 |    0.05  (secs for 1000 calls)
 2048 |    0.04 |    0.07  (secs for 1000 calls)
 4096 |    0.04 |    0.07  (secs for 500 calls)
 8192 |    0.10 |    0.20  (secs for 500 calls)
....
Fast Fourier Transform (real data)
==================================
 size |  scipy  | Numeric
----------------------------------
  100 |    0.05 |    0.07  (secs for 7000 calls)
 1000 |    0.04 |    0.05  (secs for 2000 calls)
  256 |    0.09 |    0.12  (secs for 10000 calls)
  512 |    0.12 |    0.16  (secs for 10000 calls)
 1024 |    0.01 |    0.03  (secs for 1000 calls)
 2048 |    0.04 |    0.04  (secs for 1000 calls)
 4096 |    0.03 |    0.05  (secs for 500 calls)
 8192 |    0.09 |    0.15  (secs for 500 calls)
...


Full results for fftw3
----------------------

                 Fast Fourier Transform
=================================================
      |    real input     |   complex input
-------------------------------------------------
 size |  scipy  | Numeric |  scipy  | Numeric
-------------------------------------------------
  100 |    0.12 |    0.14 |    0.32 |    0.06  (secs for 7000 calls)
 1000 |    0.04 |    0.07 |    0.36 |    0.08  (secs for 2000 calls)
  256 |    0.09 |    0.11 |    0.66 |    0.11  (secs for 10000 calls)
  512 |    0.15 |    0.19 |    0.94 |    0.19  (secs for 10000 calls)
 1024 |    0.03 |    0.03 |    0.14 |    0.04  (secs for 1000 calls)
 2048 |    0.04 |    0.07 |    0.27 |    0.07  (secs for 1000 calls)
 4096 |    0.05 |    0.11 |    0.26 |    0.11  (secs for 500 calls)
 8192 |    0.11 |    0.48 |    0.62 |    0.51  (secs for 500 calls)
.....
    Multi-dimensional Fast Fourier Transform
===================================================
          |    real input     |   complex input
---------------------------------------------------
   size   |  scipy  | Numeric |  scipy  |  Numeric
---------------------------------------------------
  100x100 |    0.05 |    0.06 |    0.05 |    0.07  (secs for 100 calls)
 1000x100 |    0.05 |    0.09 |    0.06 |    0.08  (secs for 7 calls)
  256x256 |    0.10 |    0.09 |    0.10 |    0.08  (secs for 10 calls)
  512x512 |    0.24 |    0.13 |    0.24 |    0.13  (secs for 3 calls)
.....
       Inverse Fast Fourier Transform
===============================================
      |     real input    |    complex input
-----------------------------------------------
 size |  scipy  | Numeric |  scipy  | Numeric
-----------------------------------------------
  100 |    0.05 |    0.15 |    0.51 |    0.15  (secs for 7000 calls)
 1000 |    0.04 |    0.18 |    0.44 |    0.17  (secs for 2000 calls)
  256 |    0.09 |    0.28 |    0.92 |    0.28  (secs for 10000 calls)
  512 |    0.16 |    0.46 |    1.20 |    0.46  (secs for 10000 calls)
 1024 |    0.02 |    0.08 |    0.18 |    0.08  (secs for 1000 calls)
 2048 |    0.04 |    0.14 |    0.30 |    0.15  (secs for 1000 calls)
 4096 |    0.04 |    0.18 |    0.28 |    0.19  (secs for 500 calls)
 8192 |    0.11 |    0.66 |    0.67 |    0.68  (secs for 500 calls)
.......
Inverse Fast Fourier Transform (real data)
==================================
 size |  scipy  | Numeric
----------------------------------
  100 |    0.05 |    0.16  (secs for 7000 calls)
 1000 |    0.06 |    0.09  (secs for 2000 calls)
  256 |    0.10 |    0.25  (secs for 10000 calls)
  512 |    0.13 |    0.33  (secs for 10000 calls)
 1024 |    0.02 |    0.05  (secs for 1000 calls)
 2048 |    0.05 |    0.06  (secs for 1000 calls)
 4096 |    0.04 |    0.07  (secs for 500 calls)
 8192 |    0.11 |    0.19  (secs for 500 calls)
....
Fast Fourier Transform (real data)
==================================
 size |  scipy  | Numeric
----------------------------------
  100 |    0.05 |    0.07  (secs for 7000 calls)
 1000 |    0.04 |    0.06  (secs for 2000 calls)
  256 |    0.10 |    0.12  (secs for 10000 calls)
  512 |    0.14 |    0.16  (secs for 10000 calls)
 1024 |    0.03 |    0.02  (secs for 1000 calls)
 2048 |    0.04 |    0.04  (secs for 1000 calls)
 4096 |    0.04 |    0.04  (secs for 500 calls)
 8192 |    0.09 |    0.14  (secs for 500 calls)
...


