[Numpy-discussion] numpy arrays, data allocation and SIMD alignment
Steven G. Johnson
Sat Aug 4 22:20:31 CDT 2007
On Aug 4, 3:24 am, "Anne Archibald" <peridot.face...@gmail.com> wrote:
> It seems to me two things are needed:
> * A mechanism for requesting numpy arrays with buffers aligned to an
> arbitrary power-of-two size (basically just using posix_memalign or
> some horrible hack on platforms that don't have it).
Right, you might as well allow the alignment (to a power-of-two size)
to be specified at runtime, as there is really no cost to implementing
an arbitrary alignment once you have any alignment.
You should definitely use posix_memalign (or the older memalign) where
it is available, but unfortunately it's not implemented on all
systems. For example, MacOS X and FreeBSD don't have it, last I checked
(although in both cases their malloc is 16-byte aligned). Microsoft
VC++ has an equivalent function called _aligned_malloc.
However, since MinGW (www.mingw.org) didn't have an _aligned_malloc
function, I wrote one for them a few years ago and put it in the
public domain (I use MinGW to cross-compile to Windows from Linux and
need the alignment). You are free to use it as a fallback on systems
that don't have a memalign function if you want. It should work on
any system where sizeof(void*) is a power of two (i.e., every extant
architecture that I know of). You can download it and its test
It just uses malloc with a little extra padding as needed to align the
data, plus a copy of the original pointer so that you can still free
and realloc (using _aligned_free and _aligned_realloc). It could be
made a bit more efficient, but it probably doesn't matter.
> * A macro (in C, and some way to get the same information from python,
> perhaps just "a.ctypes.data % 16") to test for common alignment cases;
> SIMD alignment and arbitrary power-of-two alignment are probably
In C this is easy, just ((uintptr_t) pointer) % 16 == 0.
You might also consider a way to set the default alignment of numpy
arrays at runtime, rather than requesting aligned arrays
individually. That way someone could later come along to a large
program and add just one function call to make all the arrays
16-byte aligned, improving performance with SIMD libraries.
Steven G. Johnson