[Numpy-discussion] numpy arrays, data allocation and SIMD alignement
David Cournapeau
david@ar.media.kyoto-u.ac...
Sat Aug 4 01:25:38 CDT 2007
>
>
> Here's a hack that google turned up:
>
> (1) Use static variables instead of dynamic (stack) variables
> (2) Use in-line assembly code that explicitly aligns data
> (3) In C code, use "malloc" to explicitly allocate variables
>
> Here is Intel's example of (2):
>
> ; procedure prologue
> push ebp
> mov esp, ebp
> and ebp, -8
> sub esp, 12
>
> ; procedure epilogue
> add esp, 12
> pop ebp
> ret
>
> Intel's example of (3), slightly modified:
>
> double *p, *newp;
> p = (double*)malloc((sizeof(double)*NPTS)+4);
> newp = (p+4) & (~7);
>
> This assures that newp is 8-byte aligned even if p is not. However,
> malloc() may already follow Intel's recommendation that a 32-byte or
> greater data structure be aligned on a 32-byte boundary. In that case,
> increasing the requested memory by 4 bytes and computing newp are
> superfluous.
>
>
> I think that for numpy arrays it should be possible to define the
> offset so that the result is 32 byte aligned. However, this might
> break some people's code if they haven't paid attention to the offset.

Why? I really don't see how it can break anything at the source code
level. You don't have to care about things you didn't care about before:
the best proof is that numpy already runs on platforms whose malloc
gives different alignment guarantees (Mac OS X already aligns to 16
bytes, precisely to make optimizing with SIMD easier, whereas glibc
malloc only aligns to 8 bytes, at least on Linux).

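The alignment a given malloc actually delivers is easy to measure. The following is a minimal sketch, not anything numpy provides: the helper names `pointer_alignment` and `malloc_alignment` are made up for illustration, and a single allocation only shows what this particular call returned, not what the allocator guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Largest power of two (capped at 4096) that divides the address of p.
   Hypothetical helper, for illustration only. */
static size_t pointer_alignment(const void *p)
{
    assert(p != NULL);  /* address 0 would trivially report the cap */
    uintptr_t addr = (uintptr_t)p;
    size_t align = 1;
    while (align < 4096 && (addr % (align * 2)) == 0) {
        align *= 2;
    }
    return align;
}

/* Alignment observed for one malloc(nbytes) call, or 0 if malloc failed. */
static size_t malloc_alignment(size_t nbytes)
{
    void *p = malloc(nbytes);
    size_t align = (p != NULL) ? pointer_alignment(p) : 0;
    free(p);
    return align;
}
```

Calling `malloc_alignment` for a few sizes on each platform gives a rough picture of how far the default allocator is from the 16-byte boundary SIMD code wants.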
> Another possibility is to allocate an oversized array, check the
> pointer, and take a range out of it. For instance:
>
> In [32]: a = zeros(10)
>
> In [33]: a.ctypes.data % 32
> Out[33]: 16
>
> The array alignment is 16 bytes, consequently
>
> In [34]: a[2:].ctypes.data % 32
> Out[34]: 0
>
> Voila, 32 byte alignment. I think a short python routine could do
> this, which ought to serve well for 1D fft's. Multidimensional arrays
> will be trickier if you want the rows to be aligned. Aligning the
> columns just isn't going to work.

I am not suggesting realigning existing arrays. What I would like numpy
to support are the following cases:

- Check whether a given numpy array is SIMD aligned:

    /* Simple case: if aligned, use the optimized function; otherwise
       use the non-optimized one */
    int simd_func(double* in, size_t n);
    int nosimd_func(double* in, size_t n);

    if (PyArray_ISALIGNED_SIMD(a)) {
        simd_func((double *)a->data, PyArray_SIZE(a));
    } else {
        nosimd_func((double *)a->data, PyArray_SIZE(a));
    }

- Explicitly request an aligned array from any PyArray_* function
  that creates an ndarray, e.g.: ar = PyArray_FROM_OF(a, NPY_SIMD_ALIGNED);
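
To make the first case concrete without depending on the numpy headers, here is a rough sketch of what such a check and dispatch could look like on a raw data pointer. Everything in it is an assumption for illustration: `SIMD_ALIGNMENT`, `is_simd_aligned`, and the `sum_*` functions are hypothetical names, 16 bytes is assumed as the SIMD boundary, and the "aligned" path is a scalar placeholder rather than real SIMD code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 16-byte boundary, as needed by aligned SSE loads. */
#define SIMD_ALIGNMENT 16

/* Hypothetical stand-in for the proposed alignment check:
   true when the data pointer sits on a SIMD_ALIGNMENT boundary. */
static int is_simd_aligned(const void *data)
{
    return ((uintptr_t)data % SIMD_ALIGNMENT) == 0;
}

/* Placeholder for the optimized path. A real version would use aligned
   SIMD loads here; this one only enforces the precondition. */
static double sum_aligned(const double *in, size_t n)
{
    assert(is_simd_aligned(in));
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += in[i];
    }
    return total;
}

/* Generic path: makes no assumption about alignment. */
static double sum_generic(const double *in, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += in[i];
    }
    return total;
}

/* Pick the path at run time from the pointer's alignment. */
static double sum_dispatch(const double *in, size_t n)
{
    return is_simd_aligned(in) ? sum_aligned(in, n) : sum_generic(in, n);
}
```

The caller never has to know which path ran, which is why existing code that ignores alignment would keep working unchanged.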

Allocating a buffer with a given alignment is not the problem: there is
a POSIX function to do it, and we can easily implement an equivalent
for the OSes that do not support it. This would be done in C, not in
Python.

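As a sketch of both options, assuming the POSIX function meant here is `posix_memalign`: the wrapper and fallback names below are hypothetical, and the fallback is just one common technique (over-allocate, round the address up, and stash the original pointer in front of the aligned block), not a proposal for the actual numpy code.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* POSIX path: thin wrapper around posix_memalign.
   The result is released with plain free(). */
static void *posix_aligned_malloc(size_t alignment, size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0) {
        return NULL;
    }
    return p;
}

/* Fallback for systems without posix_memalign (hypothetical helper).
   alignment must be a power of two and at least sizeof(void *). */
static void *fallback_aligned_malloc(size_t alignment, size_t size)
{
    assert(alignment >= sizeof(void *));
    assert((alignment & (alignment - 1)) == 0);

    /* Room for the payload, worst-case padding, and the saved pointer. */
    void *raw = malloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL) {
        return NULL;
    }

    /* Leave space for the saved pointer, then round up to the boundary. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    /* Remember what malloc returned so it can be freed later. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Must be used instead of free() for fallback_aligned_malloc blocks. */
static void fallback_aligned_free(void *p)
{
    if (p != NULL) {
        free(((void **)p)[-1]);
    }
}
```

The cost of the fallback is a dedicated free function, so the allocator and deallocator would have to be chosen together at build time.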
cheers,
David