[Scipy-tickets] [SciPy] #189: Add support for p4fftwgel - extremely fast fft for Intel P4

SciPy scipy-tickets@scipy....
Thu Jun 28 05:07:23 CDT 2007


#189: Add support for p4fftwgel - extremely fast fft for Intel P4
---------------------------+------------------------------------------------
 Reporter:  pearu          |        Owner:  pearu
     Type:  enhancement    |       Status:  new  
 Priority:  normal         |    Milestone:       
Component:  scipy.fftpack  |      Version:       
 Severity:  normal         |   Resolution:       
 Keywords:                 |  
---------------------------+------------------------------------------------
Comment (by cdavid):

 There are two different problems: aligning the data, and how to deal with
 non-aligned data in the current fft system. Numpy arrays are not
 guaranteed to be aligned on 16-byte boundaries. I don't know how difficult
 it would be to force alignment: for arrays with a newly created data
 buffer, this is really easy to support (replacing PyDataMem_NEW by a
 16-byte aligned allocator instead of malloc, e.g. posix_memalign on unix
 and whatever on windows); but what do we do for arrays created from
 existing data?
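
 As a rough illustration of the alignment question, here is a minimal
 Python-level sketch, not the C-level allocator change discussed above. It
 checks whether an array's data pointer falls on a 16-byte boundary, and
 builds an aligned array by over-allocating a byte buffer and slicing into
 it. The helper names (is_aligned, aligned_empty, as_aligned) are
 hypothetical and not part of numpy. For arrays wrapping existing data, the
 only option shown is to check and fall back to an aligned copy.

```python
import numpy as np

ALIGNMENT = 16  # bytes; the SIMD requirement discussed in this ticket


def is_aligned(arr, alignment=ALIGNMENT):
    """True if the first byte of arr's data sits on an alignment boundary."""
    return arr.ctypes.data % alignment == 0


def aligned_empty(n, dtype=np.complex128, alignment=ALIGNMENT):
    """Hypothetical helper: uninitialized 1-D array with an aligned data pointer."""
    dtype = np.dtype(dtype)
    nbytes = n * dtype.itemsize
    # Over-allocate raw bytes, then skip ahead to the next aligned address.
    raw = np.empty(nbytes + alignment, dtype=np.uint8)
    offset = (-raw.ctypes.data) % alignment
    return raw[offset:offset + nbytes].view(dtype)


def as_aligned(arr, alignment=ALIGNMENT):
    """Return arr unchanged if it is contiguous and aligned, else an aligned copy."""
    if arr.flags.c_contiguous and is_aligned(arr, alignment):
        return arr
    out = aligned_empty(arr.size, arr.dtype, alignment).reshape(arr.shape)
    out[...] = arr
    return out
```

 The copy in as_aligned is exactly the cost that matters for arrays created
 from existing data: the alignment of memory we did not allocate cannot be
 changed after the fact.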

 Now, assuming we may have both aligned and unaligned arrays, things get a
 bit hairy. As you know, by default fftw3 defines plans on given arrays
 (one reason is directly linked to alignment issues). So you have two basic
 choices:
   * caching plans with aligned buffers and copying data back and forth
 between the arrays and those working buffers (this is the approach I
 followed to solve ticket #1). Copying data to the buffers takes a large
 amount of time (about half the time for sizes around 2^9 - 2^12 on my
 P4), but this uses SIMD.
   * caching plans using the advanced and guru planning interfaces. Guru
 planning can be used to apply a given plan to a different array, provided
 several conditions are met, including alignment. But then the problem is
 that for in-place transforms, if you use FFTW_MEASURE, the input is
 destroyed... So you have to use FFTW_ESTIMATE for the plans, which leads
 to worse results than using FFTW_MEASURE with copies...

 Basically, if we want optimal performance with fftw3, I don't think we
 can use the current cache system; we need something a bit more
 sophisticated (that's exactly why I started to rewrite the code of
 fftpack).

-- 
Ticket URL: <http://projects.scipy.org/scipy/scipy/ticket/189#comment:3>
SciPy <http://www.scipy.org/>
SciPy is open-source software for mathematics, science, and engineering.

