[SciPy-User] [ANN] pyfftw-0.2 released
Mon Feb 15 16:23:52 CST 2010
On 02/15/10 20:17, David Cournapeau wrote:
> Sebastian Haase wrote:
> > On Mon, Feb 15, 2010 at 11:46 AM, David Cournapeau
> > <firstname.lastname@example.org> wrote:
> >> Sebastian Haase wrote:
> >>> Has this changed from FFTW2 to FFTW3?
> >>> It would really limit the use of plans, and make FFTs much slower
> >>> overall. In my specific case I very often have 512x512 single-precision
> >>> real arrays (images) on which I do FFTs over and over again, but
> >>> the pointers would change....
> >> You can, but you need to use the advanced plan API, or use the recently
> >> added new-array execute function:
> >> http://www.fftw.org/fftw3_doc/New_002darray-Execute-Functions.html#New_002darray-Execute-Functions
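As we read the FFTW documentation linked above, the new-array execute functions only work if the new arrays match the ones the plan was created for in size, strides and alignment. A wrapper therefore has to check those properties before reusing a plan. The sketch below illustrates only those checks: `ReusablePlan` and `alignment_offset` are hypothetical names (not pyfftw's or FFTW's API), and `numpy.fft.fftn` stands in for the actual FFTW call.

```python
import numpy as np

ALIGNMENT = 16  # bytes: the SSE alignment discussed in this thread


def alignment_offset(arr, alignment=ALIGNMENT):
    """Bytes by which arr's data pointer misses an `alignment` boundary."""
    return arr.ctypes.data % alignment


class ReusablePlan:
    """Hypothetical stand-in for an FFTW plan that is created once for a pair
    of arrays and then executed on new arrays with the same layout."""

    def __init__(self, inp, out):
        # Record everything the plan was created for.
        self.layout = [(a.shape, a.dtype, a.strides) for a in (inp, out)]
        self.offsets = (alignment_offset(inp), alignment_offset(out))

    def execute(self, new_in, new_out):
        if [(a.shape, a.dtype, a.strides) for a in (new_in, new_out)] != self.layout:
            raise ValueError("new arrays differ in shape, dtype or strides")
        if (alignment_offset(new_in), alignment_offset(new_out)) != self.offsets:
            raise ValueError("new arrays differ in alignment from the planned ones")
        # A real wrapper would call FFTW's new-array execute here;
        # numpy.fft stands in for the transform itself.
        new_out[...] = np.fft.fftn(new_in)
        return new_out
```

With this design a plan is reused only when the new buffers are laid out exactly like the planned ones; anything else raises instead of silently producing wrong results.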
> > So it sounds like alignment is the "killer" argument for the whole idea:
> > quote
> Well, yes, you need aligned pointers; there is no way around it if you
> want to benefit (significantly) from SSE. That's why I proposed, some
> time ago now, an aligned allocator for use inside NumPy, so that many
> numpy arrays would be aligned by default.
I still think this is a very good idea. What were the main objections to
it at the time?
> Note that you can align them yourself if you want to (there are
> several recipes for doing that, one from Travis on the Enthought blog
> IIRC, and one from Anne on the NumPy mailing list). Or you can explicitly
> create plans for unaligned arrays (this is significantly slower, but
> should be at least as fast as FFTW2).
There is a function in pyfftw to create aligned arrays, and using aligned
arrays does give a significant performance benefit.
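For readers who want to align arrays themselves, a common approach is to over-allocate a byte buffer and slice it at the first aligned address. This is a generic sketch written for illustration, not a reproduction of the recipes by Travis or Anne mentioned above, nor of pyfftw's own function; `aligned_empty` is a hypothetical name.

```python
import numpy as np


def aligned_empty(shape, dtype=np.float64, alignment=16):
    """Return an uninitialised C-contiguous array whose data pointer is a
    multiple of `alignment` bytes (over-allocate, then slice at the offset)."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # Allocate `alignment` spare bytes so an aligned start always fits.
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    # Number of bytes to skip to reach the next aligned address.
    offset = (-buf.ctypes.data) % alignment
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)


# Example: a 512x512 single-precision image buffer aligned to 16 bytes.
img = aligned_empty((512, 512), np.float32, 16)
```

The returned array keeps the larger byte buffer alive through its `base`, so the alignment holds for as long as the array exists.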
> Also, most arrays allocated by malloc are *not* 16-byte aligned on
> Linux, because for allocations above a certain size, glibc's
> malloc uses mmap and always "disaligns" the returned buffer. The
> threshold is easily reached when working with big data.
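Whether a particular NumPy buffer happens to be aligned is easy to check from Python. A small sketch, assuming a NumPy array; `alignment_offset` is a hypothetical helper name. The result for a freshly allocated array depends on the platform and allocator (the point made above), while slicing off one float64 always moves the pointer by 8 bytes.

```python
import numpy as np


def alignment_offset(arr, alignment=16):
    """Bytes by which arr's data pointer misses an `alignment`-byte boundary
    (0 means aligned)."""
    return arr.ctypes.data % alignment


a = np.zeros(1 << 20, dtype=np.float64)  # a large (8 MiB) buffer
# Platform- and allocator-dependent: may or may not be 0.
print("a offset:", alignment_offset(a))
# Dropping the first float64 moves the pointer by 8 bytes, so this view's
# 16-byte alignment always differs from that of `a`.
print("a[1:] offset:", alignment_offset(a[1:]))
```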
Just to clarify: this is 32-bit Linux; on 64-bit, malloc automatically aligns to
> > I guess this is really all new with version 3 of FFTW. I hope that
> > "creating a new plan is quick once one exists for a given size" means
> > "negligible" for 512x512 arrays!?
> You would have to test, but IIRC, the cost is not negligible. Creating
> an API around those plans should not be very difficult; at worst, you
> can look at how scipy used to do it when scipy supported an
> FFTW backend. The problem is designing a fast API: especially for small
> arrays (~2**10), the FFT is so fast that you cannot afford much overhead
> while looking up cached plans :)
Well, I looked at creating a more "traditional" API around FFTW (something like
y=fft(x)), but for relatively small arrays (~2**12 in my experience) the
performance benefit was mostly eaten up by the creation of the output array.
Because most of my work uses arrays of around that size and does a lot of FFTs
back and forth between two arrays (pulse propagation simulations, if anyone is
interested), I went with the current approach. It's probably possible to create
some sort of memory pool to avoid the cost of allocating arrays; is there
something like this in numpy already?
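One way to get the memory pool suggested above is a free list keyed by shape and dtype, so that a y=fft(x)-style call can reuse a previously released output array instead of allocating a new one. `ArrayPool` is a hypothetical class written for illustration, not an existing NumPy facility; shapes are passed as tuples.

```python
from collections import defaultdict

import numpy as np


class ArrayPool:
    """Minimal free-list pool: reuse released arrays of the same shape/dtype
    instead of allocating a new output array for every FFT."""

    def __init__(self):
        self._free = defaultdict(list)

    def acquire(self, shape, dtype):
        key = (tuple(shape), np.dtype(dtype).str)
        free = self._free[key]
        # Hand back a released array if one fits, otherwise allocate.
        return free.pop() if free else np.empty(key[0], dtype=key[1])

    def release(self, arr):
        # The caller must not use `arr` after releasing it.
        self._free[(arr.shape, arr.dtype.str)].append(arr)
```

The design trades memory for allocation time: released buffers stay alive in the pool, and correctness depends on callers never touching an array after handing it back.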