# [SciPy-user] numpy aligned memory

Sturla Molden sturla@molden...
Thu Mar 12 13:15:32 CDT 2009

```On 3/12/2009 4:05 PM, Andrew Straw wrote:

> So, what's your take on having each row aligned? Is this also useful for
> FFTW, for example? If so, we should perhaps come up with a better
> routine for the cookbook.

Ok, so here is how it could be done. It fails for a reason  I'll
attribute to a bug in NumPy.

import numpy as np

def _nextpow(b,isize):
i = 1
while b**i < isize:
i += 1
return b**i

def aligned_zeros(shape, boundary=16, dtype=float, order='C',
imagealign=True):

if (not imagealign) or (not hasattr(shape,'__len__')):
N = np.prod(shape)
d = np.dtype(dtype)
tmp = np.zeros(N * d.itemsize + boundary, dtype=np.uint8)
offset = (boundary - address % boundary) % boundary
return tmp[offset:offset+N*d.itemsize]\
.view(dtype=d)\
.reshape(shape, order=order)
else:
if order == 'C':
ndim0 = shape[-1]
dim0 = -1
else:
ndim0 = shape[0]
dim0 = 0
d = np.dtype(dtype)
bshape = [i for i in shape]
padding = boundary + _nextpow(boundary, d.itemsize) - d.itemsize
print bshape
tmp = np.zeros(bshape, dtype=np.uint8, order=order)
offset = (boundary - address % boundary) % boundary
aligned_slice = slice(offset, offset + ndim0*d.itemsize)
if tmp.flags['C_CONTIGUOUS']:
tmp = tmp[..., aligned_slice]
print tmp.shape
else:
tmp = tmp[aligned_slice, ...]
print tmp.shape

return tmp.view(dtype=dtype) # this will often fail,
# probably a bug in numpy

So lets reproduce the NumPy issue:

>>> a = zeros((10,52), dtype=uint8)
>>> b = a[:, 3:8*2+3]
>>> b.shape
(10, 16)
>>> b.view(dtype=float)

Traceback (most recent call last):
File "<pyshell#86>", line 1, in <module>
b.view(dtype=float)
ValueError: new type not compatible with array.

However:

>>> a = zeros((10,16), dtype=uint8)
>>> a.view(dtype=float)
array([[ 0.,  0.],
[ 0.,  0.],
[ 0.,  0.],
[ 0.,  0.],
[ 0.,  0.],
[ 0.,  0.],
[ 0.,  0.],
[ 0.,  0.],
[ 0.,  0.],
[ 0.,  0.]])

Until we find a way to overcome this, it will be difficult to align rows
to particular byte boundaries. It fails even if we make sure the padding
is a multiple of the item size:

padding = (boundary + _nextpow(boundary, d.itemsize) \
- d.itemsize) * d.itemsize

Very annoying..

Using allocators in libraries (e.g. FFTW) would not help either, as
NumPy would fail in the same way.

Maybe we can force NumPy to do the right thing by hard-coding an array
descriptor?

We can do this in Cython though, as it supports pointers and double
indirection. But it would be like using C.

Sturla Molden

```