[Numpy-discussion] object array alignment issues
Thu Oct 15 11:40:01 CDT 2009
I recently committed a regression test and bugfix for object pointers in
record arrays of unaligned size (meaning where each record is not a
multiple of sizeof(PyObject *)).
import numpy as np
# Packed record: an object pointer followed by a single char
a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')])
a2 = np.zeros((10,), 'S10')
# This copying would segfault
a1['o'] = a2
Unfortunately, this unit test has opened up a whole hornet's nest of
alignment issues on Solaris. The various reference counting functions
(PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers,
for instance. Interestingly, there are comments in there saying
"handles misaligned data" (e.g. line 190), but in fact the code doesn't,
and it doesn't look to me like it would. But I won't rule out a mistake
on my part in building it.
So, how to fix this?
One obvious workaround is for users to pass "align=True" to the dtype
constructor. This works if the dtype descriptor is a dictionary or
comma-separated string. Is there a reason I'm missing that it couldn't
be made to work with the list-of-tuples form? It would be marginally
more convenient for my application, but that's just a finesse issue.
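For illustration, here is a minimal sketch of the align=True workaround using the dictionary form of the descriptor. The variable names (fields, packed, aligned, ptr_size) are made up for the example, and it assumes a NumPy version where copying into the aligned record works as intended.

```python
import numpy as np

# Dictionary form of the record descriptor from the example above.
fields = {'names': ['o', 'c'], 'formats': ['O', 'c']}

packed = np.dtype(fields)               # no padding between or after fields
aligned = np.dtype(fields, align=True)  # padded like a C struct

ptr_size = np.dtype('O').itemsize       # size of one object pointer

print(packed.itemsize % ptr_size)       # non-zero: records straddle pointer boundaries
print(aligned.itemsize % ptr_size)      # 0: every 'o' field starts on a pointer boundary

# With the aligned dtype, the copy from the example targets aligned pointers.
a1 = np.zeros((10,), dtype=aligned)
a2 = np.zeros((10,), 'S10')
a1['o'] = a2
```

The cost is padding: each record grows to a multiple of the pointer size, so the array uses more memory than the packed layout.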
However, perhaps we should try to fix the underlying alignment
problems? Unfortunately, it's not clear to me how to resolve them
without at least some performance penalty. You either do an alignment
check of the pointer, and then memcpy if unaligned, or just always use
memcpy. Not sure which is faster, as memcpy may have a fast path
already. These are object arrays anyway, so there's plenty of overhead
already, and I don't think this would affect regular numerical arrays.
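The real change would be in the C code in refcnt.c, which is not sketched here. As a rough Python-level analogue of the "check alignment, then copy if unaligned" option, the following uses the array's flags.aligned attribute. Note that aligned_or_copy is a hypothetical helper name, not a NumPy function, and the result of the alignment check on the packed field may differ by platform and NumPy version.

```python
import numpy as np

def aligned_or_copy(arr):
    """Return arr itself if NumPy reports it as aligned, else an aligned copy."""
    if arr.flags.aligned:
        return arr          # fast path: no data is moved
    return arr.copy()       # slow path: copy into a fresh, aligned buffer

packed = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')])
field = packed['o']         # strided view into the packed records
safe = aligned_or_copy(field)
print(field.flags.aligned, safe.flags.aligned)
```

The "always memcpy" option corresponds to dropping the check and copying unconditionally, which trades the branch for a copy on every access.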
If we choose not to fix it, perhaps we should try to warn when
creating an unaligned recarray on platforms where it matters? I do
worry about having something that works perfectly well on one platform
fail on another.
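As a sketch of what such a warning might check, the following flags record dtypes whose object fields cannot be pointer-aligned in every record. Both function names are hypothetical, the check only looks at top-level fields (nested records are ignored), and it warns unconditionally rather than only on platforms where alignment matters.

```python
import warnings
import numpy as np

def has_misaligned_object_fields(dtype):
    """True if a top-level object field cannot be pointer-aligned in every record."""
    dtype = np.dtype(dtype)
    if dtype.fields is None:
        return False                      # not a record dtype
    ptr_align = np.dtype('O').alignment   # required alignment of an object pointer
    for info in dtype.fields.values():
        field_dtype, offset = info[0], info[1]
        if field_dtype == np.dtype('O') and (
                offset % ptr_align or dtype.itemsize % ptr_align):
            return True
    return False

def warn_if_misaligned(dtype):
    """Hypothetical hook that could run when a record array is created."""
    if has_misaligned_object_fields(dtype):
        warnings.warn("record dtype holds unaligned object pointers; "
                      "consider align=True", RuntimeWarning)

print(has_misaligned_object_fields([('o', 'O'), ('c', 'c')]))
```

Checking the itemsize as well as the field offset matters because a field at offset 0 is only aligned in the first record when the record size is not a multiple of the pointer size.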
In the meantime, I'll just mark the new regression test to "skip on
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA