[Numpy-discussion] Re: Bytes Object and Metadata

Travis Oliphant oliphant at ee.byu.edu
Wed Mar 30 11:39:02 CST 2005


>On Tuesday, 29 March 2005 01:59, Travis Oliphant wrote:
>  
>
>>My proposal:
>>
>>__array_data__  (optional object that exposes the PyBuffer protocol or a
>>sequence object; if not present, the object itself is used)
>>__array_shape__ (required tuple of ints/longs giving the shape of the
>>array)
>>__array_strides__ (optional; how to step through the memory in
>>bytes (or bits for a bit-array); default is C-contiguous)
>>__array_typestr__ (optional struct-like string giving the type:
>>optional endianness indicator + Numeric3 typechars; default is 'V')
>>__array_itemsize__ (required if the above is 'S', 'U', or 'V')
>>__array_offset__ (optional offset to the start of the buffer; defaults to 0)
>>
>>    
>>
>
>Considering that heterogeneous data is to be supported as well, and
>there is some tradition of assigning names to the different fields, I
>wonder if it would be good to add something like:
>
>__array_names__ (optional comma-separated names for record fields)
>
>  
>
I'm O.K. with that.
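
For concreteness, here is a minimal Python sketch of a producer exposing the attributes proposed above (including the suggested __array_names__) and a consumer applying the stated defaults, in particular C-contiguous strides when __array_strides__ is missing. RecordBuffer, c_contiguous_strides and describe are hypothetical names, and the sketch only handles the default 'V' typestr, so it always expects __array_itemsize__.

```python
class RecordBuffer:
    """Hypothetical producer exposing the proposed __array_*__ attributes."""

    def __init__(self, data, shape, itemsize, names=None):
        self.__array_data__ = data           # any buffer-protocol or sequence object
        self.__array_shape__ = tuple(shape)  # required
        self.__array_typestr__ = "V"         # opaque items, the proposal's default
        self.__array_itemsize__ = itemsize   # required because the typestr is 'V'
        if names is not None:
            self.__array_names__ = ",".join(names)  # the suggested extension


def c_contiguous_strides(shape, itemsize):
    """Byte strides used when __array_strides__ is absent (last axis varies fastest)."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def describe(obj):
    """Read the protocol attributes, applying the proposed defaults.

    Only the 'V' case is handled, so __array_itemsize__ is assumed present.
    """
    shape = tuple(obj.__array_shape__)
    itemsize = obj.__array_itemsize__
    strides = getattr(obj, "__array_strides__", None)
    if strides is None:
        strides = c_contiguous_strides(shape, itemsize)
    return {
        "data": getattr(obj, "__array_data__", obj),  # fall back to the object itself
        "shape": shape,
        "typestr": getattr(obj, "__array_typestr__", "V"),
        "itemsize": itemsize,
        "strides": tuple(strides),
        "offset": getattr(obj, "__array_offset__", 0),
        "names": getattr(obj, "__array_names__", None),
    }


info = describe(RecordBuffer(bytes(48), (2, 3), 8, names=["x", "y"]))
print(info["strides"])  # (24, 8)
```

Because the names travel as a single comma-separated string, a consumer that does not care about field names can simply ignore the attribute.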

After more thought, I think using the struct-like type characters is not
a good idea for the array protocol. I think that the character codes
used by the numarray record array (kind_character + byte_width) are
better. Commas can separate heterogeneous data. The problem is that
if the data buffer originally came from a different machine or was saved
with a different compiler (e.g. a mmap'ed file), then the struct-like
typecodes only tell you the C type that machine thought the data was.
They do not tell you how to interpret the data on this machine.
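
A minimal sketch of how a consumer might parse kind_character + byte_width strings, with commas separating heterogeneous fields. The function names, and the assumption that fields are packed with no padding, are illustrative only; the kind characters are the ones proposed in this message.

```python
# Kind characters proposed in this message (b, i, u, f, c, O, S, U, V).
KINDS = set("biufcOSUV")


def parse_typestr(typestr):
    """Split e.g. 'i4,f8,S10' into [('i', 4), ('f', 8), ('S', 10)]."""
    fields = []
    for part in typestr.split(","):
        part = part.strip()
        kind, width = part[:1], part[1:]
        if kind not in KINDS or not width.isdecimal():
            raise ValueError("bad type field: %r" % part)
        fields.append((kind, int(width)))
    return fields


def record_itemsize(typestr):
    """Bytes per record, assuming the fields are packed with no padding."""
    return sum(width for _, width in parse_typestr(typestr))


print(parse_typestr("i4,f8,S10"))    # [('i', 4), ('f', 8), ('S', 10)]
print(record_itemsize("i4,f8,S10"))  # 22
```

Unlike a struct code such as 'l', the string 'i4' means the same thing whichever compiler or machine wrote the data.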

So, I think we should use the __array_typestr__ attribute to pass type
information using the kind_character + byte_width method. I'm also
going to use this type information for pickles, so that arrays pickled
on one machine type can easily be interpreted on another.
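
To illustrate why kind + width is enough to read foreign data, here is a sketch that decodes one packed record using Python's struct module with an explicit byte-order prefix, which forces standard sizes and no padding regardless of the local compiler. The STRUCT_CODES table, the decode_record name and passing byte order as a separate flag are assumptions for this example, and only a few numeric kinds are covered.

```python
import struct

# Hypothetical table: (kind, byte width) -> struct code.  With a '<' or '>'
# prefix, struct uses standard sizes and no padding, independent of the
# compiler that produced the data.
STRUCT_CODES = {
    ("b", 1): "?",  # Bool kind; note struct's own 'b' means signed char
    ("i", 1): "b", ("i", 2): "h", ("i", 4): "i", ("i", 8): "q",
    ("u", 1): "B", ("u", 2): "H", ("u", 4): "I", ("u", 8): "Q",
    ("f", 4): "f", ("f", 8): "d",
}


def decode_record(typestr, raw, little_endian=True):
    """Decode one packed record described by a kind+width typestr like 'i4,f8'."""
    fmt = "<" if little_endian else ">"
    for part in typestr.split(","):
        kind, width = part[0], int(part[1:])
        fmt += STRUCT_CODES[(kind, width)]
    return struct.unpack(fmt, raw)


# Bytes written on a big-endian machine decode the same way on any host.
raw = struct.pack(">hQ", -3, 10)
print(decode_record("i2,u8", raw, little_endian=False))  # (-3, 10)
```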

Bool              --  "b%d" % sizeof(bool)
Signed Integer    --  "i%d" % sizeof(<some int>)
Unsigned Integer  --  "u%d" % sizeof(<some uint>)
Float             --  "f%d" % sizeof(<some float>)
Complex           --  "c%d" % sizeof(<some complex>)
Object            --  "O%d" % sizeof(PyObject *)  (only useful with shared memory)
String            --  "S%d" % itemsize
Unicode           --  "U%d" % itemsize
Void              --  "V%d" % itemsize
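
As a rough sketch of the sizeof() column above, the typestrs for a few native C types can be computed on the current machine with ctypes. The names typestr, NATIVE and string_typestr are hypothetical, c_void_p stands in for sizeof(PyObject *), and complex double is taken as two doubles. The C long entry shows the problem with struct-like codes: it comes out as 'i8' on 64-bit Linux but 'i4' on Windows.

```python
import ctypes


def typestr(kind, ctype):
    """kind_character + byte_width, using the size of a C type on this machine."""
    return "%s%d" % (kind, ctypes.sizeof(ctype))


NATIVE = {
    "bool": typestr("b", ctypes.c_bool),
    "int": typestr("i", ctypes.c_int),
    "long": typestr("i", ctypes.c_long),  # 'i8' on 64-bit Linux, 'i4' on Windows
    "unsigned int": typestr("u", ctypes.c_uint),
    "float": typestr("f", ctypes.c_float),
    "double": typestr("f", ctypes.c_double),
    # A complex double is stored as two doubles.
    "complex double": "c%d" % (2 * ctypes.sizeof(ctypes.c_double)),
    # Pointer width stands in for sizeof(PyObject *).
    "PyObject *": typestr("O", ctypes.c_void_p),
}


def string_typestr(kind, itemsize):
    """String, Unicode and Void kinds carry an explicit item size instead."""
    return "%s%d" % (kind, itemsize)


for name, code in NATIVE.items():
    print(name, code)
```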

I also think that rather than attaching < or > to the start of the
string, it would be easier to have another protocol attribute for
endianness. Perhaps something like:

__array_endian__  (optional Python integer with the value 1 in it). If
it is not 1, then a byteswap is necessary.
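
The message does not spell out how the marker is produced. One reading treats it like a byte-order mark: the producer stores the integer 1 in its own byte order, and a consumer that reads it back as something other than 1 knows to byteswap. In this sketch the marker is passed as the 4 raw bytes of that integer; that representation, and the names needs_byteswap and load_int32, are assumptions.

```python
import sys
from array import array


def needs_byteswap(marker):
    """True if the producer's 4-byte encoding of 1 does not read back as 1 here."""
    return int.from_bytes(marker, sys.byteorder) != 1


def load_int32(raw, marker):
    """Interpret raw bytes as 32-bit signed ints, byteswapping if the marker says so."""
    values = array("i")
    if values.itemsize != 4:
        raise RuntimeError("this sketch assumes a 4-byte C int")
    values.frombytes(raw)
    if needs_byteswap(marker):
        values.byteswap()
    return values.tolist()


# Simulate data written by a machine with the opposite byte order.
other = "big" if sys.byteorder == "little" else "little"
marker = (1).to_bytes(4, other)
raw = b"".join(n.to_bytes(4, other, signed=True) for n in (1, -2, 300))
print(needs_byteswap(marker), load_int32(raw, marker))  # True [1, -2, 300]
```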

-Travis





