[SciPy-dev] fft code for testing

Chuck Harris Chuck.Harris at sdl.usu.edu
Sat May 31 10:26:26 CDT 2003

Hi Pearu,

-----Original Message-----
From:	Pearu Peterson [mailto:pearu at scipy.org]
Sent:	Sat 5/31/2003 3:22 AM
To:	scipy-dev at scipy.net
Subject:	Re: [SciPy-dev] fft code for testing


> Notes:
> ------
> 1) When running benchmarks multiple times, then the variance
>    of results seem to vary from 0.1 to 0.5 for increasing size.
> 2) In all cases scipy.fftpack uses FFTW.
> 3) fftwork appears to be faster than scipy.fftpack on faster machines.
>    Considering the variance of test results, fftwork and scipy (using
>    FFTW) have comparable performance.

> In conclusion, I think it makes sense working further with fftwork
> in order to speed up power-of-two fft for those scipy users 
> who don't/won't have fftw libraries installed.
> Integration of fftwork with scipy.fftpack should be very
> similar to how djbfft is wrapped by scipy.fftpack. I'll
> try making the corresponding patch later on.

Thanks for all the results, they are quite interesting. The performance
of the chunked routines will depend on the size/speed of cache and
the front side bus and the size of the prefetch. They could probably 
be tuned for the PII by changing the chunk size, but I'm not sure it 
would be worth the effort. 

I have found some of the oddities in the benchmarks to be quite 
repeatable; for instance, on my home machine FFTW does notably better 
on the 1 MiB transform than it does on the 512 Kib or 2 MiB transforms. 

> Btw, does anybody know an efficient way how to test if
> an integer is a power-of-two in C? Currently djbfft wrapper
> uses
>   switch (n) {
>     case 2:;case 4:;... case 8192: ..
>   }
> but there must be a better way.

Breaks in the case statements might produce nicer assembler output.
Then again, maybe not. You've made me curious!

IIRC, case statements are compiled into jump tables or long if ... elif
strings. The latter is really not so bad -- just an unrolled table 
search. Bit ugly though. A simple loop

for(i = 1 ; i < n ; i *= 2);
if (i != n) error;

might do as well. But ... the loop will not terminate if n is too
big (i overflows and becomes negative). Maybe the case statement is
about the best, the assembler output could be quite clean and fast.
In python, a hash table set up at module load time is what I was 
thinking of. 

Other thoughts :

For real transforms an old approach (not thru the fast Hartley)
would put the two real parts of the spectrum together at the 
beginning of the returned data, the rest being complex. This is easily
turned into all complex and is probably the nicest from the Numeric
point of view. It becomes a bit ugly for 2D transforms, but the
Hartley might be preferred for this anyway.

A major chunk of time is spent in the bit reversal of the data. I
can't think of any cache friendly approach to this; a straight
forward radix sort performed even worse. Ideas welcome.

Other transforms that might be of interest : Hadamard, number theory.

> Chuck, do you have write access to scipy CVS? 
> If not, can this be arranged?

I don't have access, but it could probably be arranged with a word
from you ;)

> Thanks,
> 	Pearu

Thanks for taking a look at this....Chuck

Scipy-dev mailing list
Scipy-dev at scipy.net

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/ms-tnef
Size: 4237 bytes
Desc: not available
Url : http://projects.scipy.org/pipermail/scipy-dev/attachments/20030531/4c545a37/attachment.bin 

More information about the Scipy-dev mailing list