[Numpy-discussion] GSOC 2013

Nicolas Rougier Nicolas.Rougier@inria...
Tue Mar 5 01:01:32 CST 2013


> This made me think of a serious performance limitation of structured dtypes: a
> structured dtype is always "packed", which may lead to terrible byte alignment
> for common types.  For instance,
> `dtype([('a', 'u1'), ('b', 'u8')]).itemsize == 9`,
> meaning that the 8-byte integer is not aligned as an equivalent C-struct's
> would be, leading to all sorts of horrors at the cache and register level.
> Python's ctypes does the right thing here, and can be mined for ideas.   For
> instance, the equivalent ctypes Structure adds pad bytes so the 8-byte integer
> is on the correct boundary:
> 
>    import ctypes
>
>    # ctypes lays the fields out the way a C compiler would, padding included
>    class Aligned(ctypes.Structure):
>        _fields_ = [('a', ctypes.c_uint8),
>                    ('b', ctypes.c_uint64)]
> 
>    print(ctypes.sizeof(Aligned()))  # --> 16
> 
> I'd be surprised if someone hasn't already proposed fixing this, although
> perhaps this would be outside the scope of a GSOC project.  I'm willing to
> wager that the performance improvements would be easily measurable.
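
For reference, the layout difference described above can be inspected from NumPy itself. This is only a sketch: it assumes a NumPy release whose `np.dtype` constructor accepts `align=True` with a dict specification (current releases do), the variable names are illustrative, and the padded size depends on the platform ABI.

```python
import numpy as np

# Same two fields as in the quoted example, described once.
spec = {'names': ['a', 'b'], 'formats': ['u1', 'u8']}

# Default layout: fields are packed back to back, with no padding.
packed = np.dtype(spec)

# align=True asks NumPy to pad the fields the way a C compiler would.
aligned = np.dtype(spec, align=True)

for label, dt in (('packed', packed), ('aligned', aligned)):
    # dt.fields maps each field name to a (dtype, byte offset) pair.
    offsets = dict((name, dt.fields[name][1]) for name in dt.names)
    print(label, 'itemsize =', dt.itemsize, 'offsets =', offsets)
```

In the packed layout 'b' starts at byte offset 1; in the aligned layout it starts on a multiple of the platform's alignment for 'u8'.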


I've run into this very problem and ended up coding a "group" class: a "split" structured array, in which each field is stored as its own array, offering the same interface as a regular structured array.
http://www.loria.fr/~rougier/coding/software/numpy_group.py
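
To give an idea of the approach, here is a minimal sketch of a "split" container. It is not the code from numpy_group.py: the class name `SplitGroup` and its methods are hypothetical, and it only covers access by field name and by integer index.

```python
import numpy as np


class SplitGroup(object):
    """Hypothetical struct-of-arrays container (illustration only)."""

    def __init__(self, size, dtype):
        dtype = np.dtype(dtype)
        self._size = size
        self._names = dtype.names
        # One separate, contiguous array per field, so each field keeps
        # the natural alignment of its own element type.
        self._data = dict((name, np.zeros(size, dtype=dtype[name]))
                          for name in self._names)

    def __len__(self):
        return self._size

    def __getitem__(self, key):
        if isinstance(key, str):
            # Field access returns the underlying per-field array.
            return self._data[key]
        # Integer access gathers one value from every field.
        return tuple(self._data[name][key] for name in self._names)

    def __setitem__(self, key, value):
        if isinstance(key, str):
            # Assign into the existing field array (broadcasts scalars).
            self._data[key][...] = value
        else:
            # Scatter one record across the per-field arrays.
            for name, item in zip(self._names, value):
                self._data[name][key] = item


g = SplitGroup(4, [('a', 'u1'), ('b', 'u8')])
g['b'] = np.arange(4)   # whole-field assignment
g[1] = (7, 99)          # record assignment
print(g['a'], g['b'], g[1])
```

The trade-off in this sketch is that a record is no longer contiguous in memory, so record-wise access touches one array per field, while field-wise operations work on plain contiguous arrays.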


Nicolas


