[SciPy-dev] FFTW performances in scipy and numpy
Wed Aug 1 12:22:59 CDT 2007
On 01/08/07, David Cournapeau <firstname.lastname@example.org> wrote:
> John Travers wrote:
> > On 01/08/07, David Cournapeau <email@example.com> wrote:
> >> Anne Archibald wrote:
> >>> On 01/08/07, David Cournapeau <firstname.lastname@example.org> wrote:
> >>>> I am one of the contributors to numpy/scipy. Let me first say that I am
> >>>> *not* the main author of the fftw wrapping for scipy, that I am a
> >>>> relative newcomer to scipy, and that I do not claim a deep understanding of
> >>>> numpy arrays. But I have been thinking a bit about the problem, since I am a
> >>>> heavy user of fft and have since debugged some problems in the scipy code.
> >> Ok, I prepared a small package to test several strategies:
> >> http://www.ar.media.kyoto-u.ac.jp/members/david/archives/fftdev.tbz2
> >> Running make test should build it out of the box and run the tests (if
> >> you are on Linux and have gcc and fftw3, of course :) ). I have not even
> >> checked whether the results are correct (I only checked for memory
> >> problems under valgrind).
> >> Three strategies are available:
> >> - Keep a flag recording whether the given array is 16-byte aligned,
> >> and build plans conditionally on it
> >> - Use FFTW_UNALIGNED, and ignore alignment altogether
> >> - Current strategy: copy the input.
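The first and third strategies in the list can be sketched in plain C. This is a minimal sketch, assuming the requirement is 16-byte alignment (what SSE/SSE2 loads expect) and that POSIX `posix_memalign` is available; `is_aligned16` and `copy_to_aligned` are hypothetical names, not taken from the test package or from scipy. FFTW's own `fftw_malloc` also returns memory aligned for its SIMD code. The second strategy needs essentially no code of its own: it amounts to passing the `FFTW_UNALIGNED` planner flag.

```c
#define _POSIX_C_SOURCE 200112L /* for posix_memalign under strict C modes */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 16 bytes is the alignment the FFTW build cares about. */
#define SIMD_ALIGNMENT 16

/* Strategy 1 (hypothetical helper): report whether a buffer starts on a
   16-byte boundary, so the caller can choose a plan made for aligned
   input or one made for unaligned input. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % SIMD_ALIGNMENT) == 0;
}

/* Strategy 3 (hypothetical helper): copy n doubles into a freshly
   allocated 16-byte-aligned buffer. The caller releases it with free().
   Returns NULL if the allocation fails. */
static double *copy_to_aligned(const double *src, size_t n)
{
    void *dst = NULL;
    assert(src != NULL || n == 0);
    if (posix_memalign(&dst, SIMD_ALIGNMENT, n * sizeof(double)) != 0)
        return NULL;
    memcpy(dst, src, n * sizeof(double));
    return dst;
}
```

The copy always yields an aligned buffer at the cost of an extra pass over the data; the flag-based approach avoids that copy but has to keep plans for both cases.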
> >> The three strategies use FFTW_MEASURE, which I didn't do before, and may
> > Another strategy worth trying is using FFTW_MEASURE once and then
> > using FFTW_ESTIMATE for additional arrays. FFTW accumulates wisdom,
> > so the initial call with MEASURE means that plans created later with
> > ESTIMATE also benefit. In my simple tests this comes very close to
> > measuring each individual array.
> Is this true for different array sizes?
Yes, it is. In fact, if you use the fftw_flops function, you find that
the number of operations required is identical whether you plan with
FFTW_MEASURE, plan with FFTW_ESTIMATE after measuring at the same size,
or plan with FFTW_ESTIMATE after measuring at a different size. Of course,
this is only on my machine (AMD Athlon 64 3200+). The only extra
overhead is the planning for each FFT (and I haven't tried a
comparison with unaligned data). This overhead appears to be about 10%
for small arrays (2**15 elements).