[Numpy-discussion] numpy arrays, data allocation and SIMD alignement

Charles R Harris charlesr.harris@gmail....
Sat Aug 4 01:06:15 CDT 2007


On 8/3/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
>
>
>
> On 8/3/07, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
> >
> > Andrew Straw wrote:
> > > Dear David,
> > >
> > > Both ideas, particularly the 2nd, would be excellent additions to numpy.
> > > I often use the Intel IPP (Integrated Performance Primitives) Library
> > > together with numpy, but I have to do all my memory allocation with the
> > > IPP to ensure fastest operation. I then create numpy views of the data.
> > > All this works brilliantly, but it would be really nice if I could
> > > allocate the memory directly in numpy.
> > >
> > > IPP allocates, and says it wants, 32 byte aligned memory (see, e.g.
> > > http://www.intel.com/support/performancetools/sb/CS-021418.htm ). Given
> > > that fftw3 apparently wants 16 byte aligned memory, my feeling is that,
> > > if the effort is made, the alignment width should be specified at
> > > run-time, rather than hard-coded.
> > I think that doing it at runtime would be overkill, no? I was thinking
> > about making it a compile option. Generally, at the ASM level, you need
> > 16-byte alignment (for instructions like movaps, which takes 16 bytes
> > from memory and puts them in an SSE register); this is not just fftw.
> > Maybe the 32-byte alignment is useful for cache reasons, I don't know.
> >
> > I don't think it would be difficult to implement and validate; what I
> > don't know at all is the implication of this at the binary level, if
> > any.
>
>
>
> Here's a hack that google turned up:
>
> (1) Use static variables instead of dynamic (stack) variables
> (2) Use in-line assembly code that explicitly aligns data
> (3) In C code, use "malloc" to explicitly allocate variables
>
> Here is Intel's example of (2):
>
> ; procedure prologue
> push ebp
> mov ebp, esp
> and ebp, -8
> sub esp, 12
>
> ; procedure epilogue
> add esp, 12
> pop ebp
> ret
>
> Intel's example of (3), slightly modified:
>
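> #include <stdint.h>
> #include <stdlib.h>
> double *p, *newp;
> p = (double *)malloc((sizeof(double) * NPTS) + 4);
> /* Round the integer address, not the double pointer itself. */
> newp = (double *)(((uintptr_t)p + 4) & ~(uintptr_t)7);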
>
> This assures that newp is 8-byte aligned even if p is not. However,
> malloc() may already follow Intel's recommendation that data structures
> of 32 bytes or more be aligned on a 32-byte boundary. In that case,
> increasing the requested memory by 4 bytes and computing newp are
> superfluous.
>
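
To see why the rounding in example (3) works, and where it stops working,
here is a small sketch of the same arithmetic on plain integers in Python.
The helper names align_up and intel_style_round are my own, not from Intel's
note or any library, and the addresses are made-up values for illustration.

```python
def align_up(address, alignment):
    """Round an integer address up to the next multiple of alignment.

    alignment must be a power of two so that the bit mask below is valid.
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (address + alignment - 1) & ~(alignment - 1)


def intel_style_round(address):
    """The (p + 4) & ~7 form from the quoted example, on an integer address."""
    return (address + 4) & ~7


if __name__ == "__main__":
    for addr in (0x1000, 0x1004, 0x1001):
        print(hex(addr), hex(intel_style_round(addr)), hex(align_up(addr, 8)))
```

For an address that is already 4-byte aligned, such as 0x1004, both forms
give 0x1008. For an address like 0x1001 the (p + 4) & ~7 form gives 0x1000,
which lies before the start of the allocation, so that form relies on malloc
returning at least 4-byte aligned memory. Adding alignment - 1 before masking,
as align_up does, never rounds below the starting address.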

I think that for numpy arrays it should be possible to define the offset so
that the result is 32-byte aligned. However, this might break some people's
code if they haven't paid attention to the offset. Another possibility is
to allocate an oversized array, check the pointer, and take a range out of
it. For instance:

In [32]: a = zeros(10)

In [33]: a.ctypes.data % 32
Out[33]: 16

The data pointer sits 16 bytes past a 32-byte boundary, so skipping two
8-byte doubles gives

In [34]: a[2:].ctypes.data % 32
Out[34]: 0

Voila, 32-byte alignment. I think a short Python routine could do this,
which ought to serve well for 1D FFTs. Multidimensional arrays will be
trickier if you want the rows to be aligned. Aligning the columns just isn't
going to work.
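
Something along these lines might do for the 1D case. This is only a sketch:
aligned_zeros is a hypothetical helper name, not an existing numpy function,
and it assumes the alignment is a power of two. It over-allocates a byte
buffer, skips ahead to the first aligned address, and views the rest with the
requested dtype.

```python
import numpy as np


def aligned_zeros(n, dtype=np.float64, alignment=32):
    """Return a zeroed 1-D array of n items whose data pointer is a
    multiple of alignment bytes.

    Hypothetical helper for illustration; not part of numpy.
    """
    dtype = np.dtype(dtype)
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    nbytes = n * dtype.itemsize
    # Over-allocate raw bytes so an aligned start is guaranteed to exist.
    buf = np.zeros(nbytes + alignment, dtype=np.uint8)
    # Number of bytes to skip to reach the next aligned address.
    offset = (-buf.ctypes.data) % alignment
    # Slice the byte buffer and reinterpret it; the view keeps buf alive.
    return buf[offset:offset + nbytes].view(dtype)


if __name__ == "__main__":
    a = aligned_zeros(10)
    print(a.ctypes.data % 32, a.shape, a.dtype)
```

Working in bytes rather than in array elements means the routine does not
depend on the base address being a multiple of the item size, which the
a[2:] trick above does. For row-aligned 2D arrays the same idea would
additionally need each row padded so that the row stride is a multiple of the
alignment; that is not shown here.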

Chuck