[Numpy-discussion] GSOC 2013
Tue Mar 5 01:01:32 CST 2013
> This made me think of a serious performance limitation of structured dtypes: a
> structured dtype is always "packed", which may lead to terrible byte alignment
> for common types. For instance,
> `dtype([('a', 'u1'), ('b', 'u8')]).itemsize == 9`,
> meaning that the 8-byte integer is not aligned as an equivalent C-struct's
> would be, leading to all sorts of horrors at the cache and register level.
> Python's ctypes does the right thing here, and can be mined for ideas. For
> instance, the equivalent ctypes Structure adds pad bytes so the 8-byte integer
> is on the correct boundary:
> import ctypes
> class Aligned(ctypes.Structure):
>     _fields_ = [('a', ctypes.c_uint8),
>                 ('b', ctypes.c_uint64)]
> print(ctypes.sizeof(Aligned()))  # --> 16
> I'd be surprised if someone hasn't already proposed fixing this, although
> perhaps this would be outside the scope of a GSOC project. I'm willing to
> wager that the performance improvements would be easily measurable.
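For comparison, here is a small sketch that puts the packed layout next to an aligned one. It assumes the `align=True` argument of `numpy.dtype`, which asks NumPy to pad fields the way a C compiler would. The exact aligned size depends on the platform's alignment rules for 8-byte integers, so the sketch prints the sizes and offsets rather than hard-coding them.

```python
import ctypes
import numpy as np

# Packed layout (NumPy's default): fields are stored back to back.
packed = np.dtype([('a', 'u1'), ('b', 'u8')])

# align=True requests C-struct-like padding between fields.
aligned = np.dtype([('a', 'u1'), ('b', 'u8')], align=True)

# The ctypes equivalent from the quoted message, for reference.
class Aligned(ctypes.Structure):
    _fields_ = [('a', ctypes.c_uint8),
                ('b', ctypes.c_uint64)]

# dtype.fields maps each name to a (dtype, byte offset) pair.
print(packed.itemsize, packed.fields['b'][1])
print(aligned.itemsize, aligned.fields['b'][1])
print(ctypes.sizeof(Aligned), Aligned.b.offset)
```

On a platform where the two agree, the last two lines print the same size and offset.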
I ran into this very problem and ended up writing a "group class": a "split" structured array in which each field is stored as its own array, while offering the same interface as a regular structured array.
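To give an idea of what such a class can look like, here is a minimal sketch. It is not the actual implementation: the name `Group` and its methods are made up for illustration, and it covers only field access and basic indexing.

```python
import numpy as np

class Group(object):
    """Hypothetical 'split' structured array: one contiguous array per field."""

    def __init__(self, shape, dtype):
        self.dtype = np.dtype(dtype)
        # Each field gets its own contiguous, naturally aligned array.
        self._data = {name: np.zeros(shape, dtype=self.dtype.fields[name][0])
                      for name in self.dtype.names}

    def __getitem__(self, key):
        if isinstance(key, str):
            # g['b'] returns the whole column, as a structured array would.
            return self._data[key]
        # g[i] or g[i:j] gathers the matching rows from every column
        # into a regular (packed) structured array.
        first = self._data[self.dtype.names[0]][key]
        out = np.empty(np.shape(first), dtype=self.dtype)
        for name in self.dtype.names:
            out[name] = self._data[name][key]
        return out

    def __setitem__(self, key, value):
        if isinstance(key, str):
            # Assign a whole column in place.
            self._data[key][...] = value
        else:
            # value is expected to be indexable by field name.
            for name in self.dtype.names:
                self._data[name][key] = value[name]

g = Group(4, [('a', 'u1'), ('b', 'u8')])
g['b'] = np.arange(4)
print(g['b'].strides)  # stride of the 'b' column alone
print(g[1:3])          # rows 1 and 2 as a structured array
```

Since each column is an ordinary array, operations on a single field run over contiguous, aligned memory. The price is that fetching a whole record has to gather from every column.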