[SciPy-user] FFT speed ?

David Cournapeau david@ar.media.kyoto-u.ac...
Wed Feb 7 18:12:14 CST 2007


Andrew Jaffe wrote:
> Scott Ransom wrote:
>> On Wed, Feb 07, 2007 at 12:13:12PM -0500, Anne Archibald wrote:
>>> On 07/02/07, Samuel GARCIA <sgarcia@olfac.univ-lyon1.fr> wrote:
>>>>  Yes I know this.
>>>>  But 186888 186889 and 186890 are not power of 2 and the computation time is
>>>> very very different just for a difference of size of only one point.
>>>>  What is the reason ?
>>>>  And how to deal with that ? (I realy need to compute fft with a random
>>>> nomber of point)
>
>
> It's probably worth pointing out that 186889 is, in fact, prime, which 
> is certainly the worst case for any algorithm.
Indeed, this is an important point. There are several things to check, 
and there are some problems with FFT as implemented now in numpy/scipy 
when used with fftw (it is suboptimal by several factors).

    - First, is the numpy/scipy installed fft really using fftw ?
    - Also, if fftw is used, it is important to remember that the first 
time you are using a certain size, fftw has to compute a plan, which may 
take time.

    I tested on my machine (Pentium 4, 3.2 Ghz) fftw (in C) for the 
given size, each result being the best shot on 100 iterations (source 
attached):

testing cached for size 186888
computing plan...done !
         cycle is 1.357218e+10, 1.357218e+08 per execution on average, 
min is 1.323569e+08
testing cached for size 186889
computing plan...done !
         cycle is 6.188211e+10, 6.188211e+08 per execution on average, 
min is 6.016043e+08
testing cached for size 186890
computing plan...done !
         cycle is 1.623674e+10, 1.623674e+08 per execution on average, 
min is 1.595604e+08
testing cached for size 131072
computing plan...done !
         cycle is 1.730136e+10, 1.730136e+08 per execution on average, 
min is 1.682170e+08
testing cached for size 262144
computing plan...done !
         cycle is 3.018974e+10, 3.018974e+08 per execution on average, 
min is 2.978997e+08

cycle being the number of CPU cycles taken by the 100 iterations. The 
fact that 2 ** 17 is slower than 186888 is weird, though. May be due to 
some weird cache effects, I don't know.

For reference, on my laptop (pentium M, thus much better memory 
performances and FPU performances overall):

testing cached for size 186888
computing plan...done !
         cycle is 1.152835e+10, 1.152835e+08 per execution on average, 
min is 1.070749e+08
testing cached for size 186889
computing plan...done !
         cycle is 3.845418e+10, 3.845418e+08 per execution on average, 
min is 3.654266e+08
testing cached for size 186890
computing plan...done !
         cycle is 1.389736e+10, 1.389736e+08 per execution on average, 
min is 1.314722e+08
testing cached for size 131072
computing plan...done !
         cycle is 5.142997e+09, 5.142997e+07 per execution on average, 
min is 4.860954e+07
testing cached for size 262144
computing plan...done !
         cycle is 1.347407e+10, 1.347407e+08 per execution on average, 
min is 1.287470e+08

Which is more logical... This once again shows how bad a Pentium 4 is 
for FPU performances on a per cycle point of view :)
   
For speed issues, 0 padding is the easiest solution I can think of,

David
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.c
Type: text/x-csrc
Size: 2684 bytes
Desc: not available
Url : http://projects.scipy.org/pipermail/scipy-user/attachments/20070208/100ef389/attachment.bin 


More information about the SciPy-user mailing list