[Numpy-discussion] Re: Bytes Object and Metadata
oliphant at ee.byu.edu
Wed Mar 30 11:39:02 CST 2005
>On Tuesday 29 March 2005 01:59, Travis Oliphant wrote:
>>__array_data__ (optional object that exposes the PyBuffer protocol or a
>>sequence object, if not present, the object itself is used).
>>__array_shape__ (required tuple of int/longs that gives the shape of the array)
>>__array_strides__ (optional provides how to step through the memory in
>>bytes (or bits if a bit-array), default is C-contiguous)
>>__array_typestr__ (optional struct-like string showing the type ---
>>optional endianness indicator + Numeric3 typechars, default is 'V')
>>__array_itemsize__ (required if above is 'S', 'U', or 'V')
>>__array_offset__ (optional offset to start of buffer, defaults to 0)
>Considering that heterogeneous data is to be supported as well, and
>there is some tradition of assigning names to the different fields, I
>wonder if it would not be good to add something like:
>__array_names__ (optional comma-separated names for record fields)
I'm O.K. with that.
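
As an illustration of the attributes quoted above, here is a minimal sketch of an object that exposes them. The class name RecordBlock and the helper c_contiguous_strides are hypothetical and not part of the proposal; the sketch only assumes that the default strides are the C-contiguous (row-major) byte steps and that the typestr is left at its default 'V', so an itemsize is supplied.

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides for a C-contiguous (row-major) layout."""
    strides = []
    step = itemsize
    # The last axis varies fastest, so walk the shape from the end.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


class RecordBlock(object):
    """Hypothetical data holder exposing the proposed attributes."""

    def __init__(self, data, shape, itemsize, names=None):
        self.__array_data__ = data            # buffer-like object
        self.__array_shape__ = tuple(shape)   # required
        self.__array_typestr__ = "V"          # default: opaque bytes
        self.__array_itemsize__ = itemsize    # required for 'S', 'U', 'V'
        self.__array_offset__ = 0             # default offset
        # Spelling out what a consumer would assume if strides were omitted.
        self.__array_strides__ = c_contiguous_strides(shape, itemsize)
        if names is not None:
            self.__array_names__ = names      # comma-separated field names


block = RecordBlock(bytes(48), (2, 3), 8, names="x,y")
print(block.__array_strides__)
```

A consumer would read these attributes with getattr and fall back to the documented defaults for any optional attribute that is missing.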
After more thought, I think using the struct-like type characters is not
a good idea for the array protocol. I think that the character codes
used by the numarray record array (kind_character + byte_width) are
better. Commas can separate heterogeneous data. The problem is that
if the data buffer originally came from a different machine or was saved
with a different compiler (e.g. a mmap'ed file), then the struct-like
typecodes only tell you the C type that machine thought the data was.
They do not tell you how to interpret the data on this machine.
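
A small sketch of the portability problem, using Python's struct module as a stand-in for struct-like typecodes (this is an illustration, not part of the proposed protocol). In native mode the size behind the code 'l' depends on the platform and compiler, so the code alone does not pin down the byte width of data written elsewhere.

```python
import struct

# Native mode: sizeof(long) for the C compiler that built this Python.
native_long = struct.calcsize("@l")

# Standard mode: struct's fixed, platform-independent size for 'l'.
standard_long = struct.calcsize("=l")

print("native 'l': %d bytes, standard 'l': %d bytes"
      % (native_long, standard_long))
```

On a platform where the two numbers differ, a buffer described only as 'l' cannot be interpreted without also knowing which machine produced it.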
So, I think we should use the __array_typestr__ method to pass type
information using the kind_character + byte_width method. I'm also
going to use this type information for pickles, so that arrays pickled
on one machine type will be able to be interpreted on another with ease.
    Bool             -- "b%d" % sizeof(bool)
    Signed Integer   -- "i%d" % sizeof(<some int>)
    Unsigned Integer -- "u%d" % sizeof(<some uint>)
    Float            -- "f%d" % sizeof(<some float>)
    Complex          -- "c%d" % sizeof(<some complex>)
    Object           -- "O%d" % sizeof(PyObject *) --- this would only be
                        useful on shared memory
    String           -- "S%d" % itemsize
    Unicode          -- "U%d" % itemsize
    Void             -- "V%d" % itemsize
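
The kind_character + byte_width codes listed above could be built and read back roughly as follows. This is only a sketch: the function names make_typestr and parse_typestr are hypothetical, and it assumes that comma-separated fields each consist of one kind character followed by a decimal byte width.

```python
def make_typestr(kind, itemsize):
    """Build a code such as 'i4' or 'S10' from a kind and a byte width."""
    return "%s%d" % (kind, itemsize)


def parse_typestr(typestr):
    """Split a comma-separated typestr into (kind, byte_width) pairs."""
    fields = []
    for part in typestr.split(","):
        part = part.strip()
        # First character is the kind, the rest is the width in bytes.
        fields.append((part[0], int(part[1:])))
    return fields


print(make_typestr("S", 10))
print(parse_typestr("i4,f8,S10"))
```

Because the width is stated in bytes, the string means the same thing on every machine, which is the property wanted for pickles.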
I also think that rather than attach < or > to the start of the string
it would be easier to have another protocol for endianness. Perhaps
__array_endian__ (optional Python integer with the value 1 in it). If
it is not 1, then a byteswap is necessary.
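
One possible reading of the __array_endian__ idea, sketched under a loud assumption: the message does not say how the integer is stored, so the code below assumes the marker travels as 4 raw bytes holding the value 1 in the producer's byte order. The names needs_byteswap and byteswap are hypothetical.

```python
import struct


def needs_byteswap(marker):
    """True if a 4-byte marker does not read back as 1 in native order."""
    (value,) = struct.unpack("=i", marker)
    return value != 1


def byteswap(buf, itemsize):
    """Return a copy of buf with the bytes of every item reversed."""
    out = bytearray()
    for start in range(0, len(buf), itemsize):
        out.extend(buf[start:start + itemsize][::-1])
    return bytes(out)


native = struct.pack("=i", 1)     # marker written on this machine
foreign = byteswap(native, 4)     # same marker from an opposite-endian machine

print(needs_byteswap(native), needs_byteswap(foreign))
```

The value 1 works as a marker because its byte-reversed form is a different number, so a mismatch in byte order is always detectable.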