[Numpy-discussion] Attaching metadata to dtypes: what's the best way?

Andrew Collette andrew.collette@gmail....
Thu Dec 13 23:37:20 CST 2012


Hi all,

I have a question for the list sparked by this discussion of a bug in
NumPy 1.6.2 and 1.7:

http://mail.scipy.org/pipermail/numpy-discussion/2012-December/064682.html

and this open issue in h5py:

https://code.google.com/p/h5py/issues/detail?id=217

In h5py we need to represent variable length strings and HDF5 object
references within the existing NumPy dtype system. The way this is
handled at the moment is with object (type "O") dtypes with a small
amount of metadata attached; in other words, an "O" array could have a
dtype marked as representing variable-length strings, and HDF5 would
convert the Python string objects into the corresponding type in the
HDF5 file.  Likewise, an "O" dtype marked as containing HDF5 object
references (h5py.Reference instances) would be converted to native
HDF5 references when written.

The trouble I'm having is trying to attach metadata to a dtype in such
a way that it is preserved in NumPy.  Right now I create an "O" dtype
with a single field and store the information in that field's title
(the "description" half of a (title, name) field name), e.g.:

dtype(('O', [( ({'type': bytes},'vlen'), 'O' )] ))

This works (it's how special types have worked in h5py for years) but
is quite unwieldy, and leads to interesting side effects.  For
example, because of the single field used, array[index] returns a
1-element NumPy array containing a Python object, instead of the
Python object itself.  Worse, our fix for this behavior (removing the
field when returning data from h5py) triggered the NumPy bug linked above.
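
As a rough illustration of the field-title mechanism (this is a sketch, not
h5py's actual code): it assumes a plain string title instead of the dict
used above, and a stand-alone structured dtype rather than the ('O', [...])
form, so the indexing result is a one-field record rather than the
1-element array described above.  The names TITLE, dt and arr are
hypothetical.

```python
import numpy as np

# Hypothetical stand-in for the metadata. The post stores a dict here;
# a plain string is an assumption made to keep the sketch simple.
TITLE = "vlen-bytes"

# A field name given as a (title, name) tuple attaches the title to the field.
dt = np.dtype([((TITLE, "vlen"), "O")])

# For a titled field, dtype.fields maps the name to (dtype, offset, title).
field_dtype, offset, title = dt.fields["vlen"]

arr = np.empty(2, dtype=dt)
arr["vlen"] = [b"spam", b"eggs"]

# Indexing the structured array yields a one-field record (numpy.void),
# not the stored Python object.
record = arr[0]

# Selecting the field first yields an ordinary object array, whose
# elements are the stored Python objects; the title is not part of
# this plain "O" dtype.
plain = arr["vlen"]
value = plain[0]
```

Reading the title back requires going through dt.fields, which is part of
what makes this approach unwieldy.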

Is there a better way to add metadata to dtypes that I'm not aware of?
Note I'm *not* interested in creating a custom type; one of the
advantages of the current system is that people deal with the
resulting "O" object arrays like any other object array in NumPy.

Andrew Collette

