[Numpy-discussion] numpy arrays, data allocation and SIMD alignment

Andrew Straw strawman@astraw....
Fri Aug 3 10:12:44 CDT 2007


Dear David,

Both ideas, particularly the 2nd, would be excellent additions to numpy. 
I often use the Intel IPP (Integrated Performance Primitives) Library 
together with numpy, but I have to do all my memory allocation with the 
IPP to ensure fastest operation. I then create numpy views of the data. 
All this works brilliantly, but it would be really nice if I could 
allocate the memory directly in numpy.

IPP allocates, and says it wants, 32-byte-aligned memory (see, e.g., 
http://www.intel.com/support/performancetools/sb/CS-021418.htm ). Given 
that fftw3 apparently wants 16-byte-aligned memory, my feeling is that, 
if the effort is made, the alignment width should be specified at 
run-time rather than hard-coded.
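
A run-time alignment width can be approximated from Python without touching the allocator. This is only a sketch, not an existing numpy API: aligned_empty is a hypothetical helper name, and the approach (over-allocate a byte buffer, then take a view starting at the first suitably aligned address) is an assumption about how one might do it, not something numpy or IPP prescribes.

```python
import numpy as np

def aligned_empty(shape, dtype=np.float64, align=16):
    """Hypothetical helper: uninitialized array whose data pointer is a
    multiple of `align` bytes, with `align` chosen at run time."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # Over-allocate by `align` bytes so an aligned start always fits.
    buf = np.empty(nbytes + align, dtype=np.uint8)
    # Number of bytes to skip to reach the next multiple of `align`.
    offset = (-buf.ctypes.data) % align
    # The returned view keeps `buf` alive through its .base chain.
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)

a = aligned_empty((3, 4), np.float64, align=32)   # IPP-style width
b = aligned_empty(100, np.float32, align=16)      # fftw3-style width
```

The cost is at most `align` wasted bytes per array, and the result is a view rather than an array that owns its data.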

In terms of implementation of your 1st point, I'm not aware of how much 
effort your idea would take (and it does sound nice), but some benefit 
would be had just from a simple function numpy.is_mem_aligned( ndarray, 
width=16 ) which returns a bool.
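
A minimal sketch of such a check, written in pure Python. The name is_mem_aligned and its width argument follow the proposal above and are not an existing numpy function; the sketch assumes that testing the address reported by ndarray.ctypes.data is sufficient, and it says nothing about strides or contiguity.

```python
import numpy as np

def is_mem_aligned(arr, width=16):
    """Hypothetical check: True if arr's data buffer starts at an
    address that is a multiple of `width` bytes."""
    return arr.ctypes.data % width == 0

# Build one aligned and one deliberately misaligned view of the same buffer.
buf = np.zeros(64 + 16, dtype=np.uint8)
off = (-buf.ctypes.data) % 16          # skip to the next 16-byte boundary
aligned = buf[off:off + 32]
misaligned = buf[off + 1:off + 33]     # one byte past the boundary

print(is_mem_aligned(aligned, 16))     # True
print(is_mem_aligned(misaligned, 16))  # False
```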

Cheers!
Andrew

David Cournapeau wrote:
> Hi,
> 
>    Following an ongoing discussion with S. Johnson, one of the developers 
> of fftw3, I would be interested in what people think about adding 
> infrastructure in numpy related to SIMD alignment (that is, 16-byte 
> alignment for SSE/ALTIVEC; I don't know anything about other archs). 
> The problem is that right now, it is difficult to get information about 
> alignment in numpy (by alignment here, I mean something different from 
> what is normally meant in the numpy context: whereas, in my understanding, 
> NPY_ALIGNED refers to a pointer which is aligned with respect to its 
> type, here I am talking about arbitrary alignment).
>   For example, for fftw3, we need to know whether a given data buffer is 
> 16-byte aligned to get optimal performance; generally, SSE needs 16-byte 
> alignment for optimal performance, as does AltiVec. I think it 
> would be nice to have some infrastructure to help developers get that 
> kind of information, and maybe to be able to request 16-byte-aligned 
> buffers.
>    Here is what I can think of:
>       - adding an API to know whether a given PyArrayObject has its data 
> buffer 16-byte aligned, and to request a 16-byte-aligned 
> PyArrayObject. Something like NPY_ALIGNED, basically.
>       - forcing data allocation to be 16-byte aligned in numpy (e.g. 
> define PyDataMem_Mem to be a 16-byte-aligned allocator instead of malloc). 
> This would mean that many arrays would be "naturally" 16-byte aligned 
> without effort.
> 
> Point 2 is really easy to implement, I think: actually, on some platforms 
> (Mac OS X and FreeBSD), malloc returns 16-byte-aligned buffers 
> anyway, so I don't think the wasted space is a real problem. Linux with 
> glibc is 8-byte aligned; I don't know about Windows. Implementing our 
> own 16-byte-aligned memory allocator for cross-platform compatibility 
> should be relatively easy. I don't see any drawback, but I guess other 
> people will.
> 
> Point 1 is trickier, as it requires many more changes in the code.
> 
> Do the main developers of numpy have an opinion on this?
> 
>    cheers,
> 
>    David


