FW: [Numpy-discussion] typecodes in numarray

Todd Miller jmiller at stsci.edu
Sat Jan 25 11:16:05 CST 2003


Francesc Alted wrote:

>A Divendres 24 Gener 2003 21:15, Todd Miller va escriure:
>  
>
>>>My current thinking is something like:
>>>
>>>recarrDescr = {
>>>   "name"        : defineType(CharType, 16, ""),  # 16-character String
>>>   "TDCcount"    : defineType(UInt8, 1, 0),    # unsigned byte
>>>   "ADCcount"    : defineType(Int16, 1, 0),    # signed short integer
>>>   "grid_i"      : defineType(Int32, 1, 9),    # integer
>>>   "grid_j"      : defineType(Int32, 1, 9),    # integer
>>>   "pressure"    : defineType(Float32, 1, 1.),  # float (single-precision)
>>>   "temperature" : defineType(Float64, 32, arange(32)),  # double[32]
>>>   "idnumber"    : defineType(Int64, 1, 0),    # signed long long
>>>}
>>>      
>>>
Still think I'd prefer something separable:

recarrStruct = ( (CharType, 16),
                 UInt8,
                 Int16,
                 Int32,
                 Int32,
                 Float32,
                 (Float64, 32),
                 Int64 )

recarrFields = ["name",
                "TDCcount",
                "ADCcount",
                "grid_i",
                "grid_j",
                "pressure",
                "temperature",
                "idnumber"]

I guess it might not be quite as good for large structs.
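
A minimal sketch of how the two separate pieces could be joined back
together when names are wanted.  The type names are plain strings
standing in for the numarray type objects, and normalize() and
describe() are hypothetical helpers, not existing numarray or recarray
functions:

```python
# Stand-in type tags; in numarray these would be type objects such as Int32.
CharType, UInt8, Int16, Int32, Float32, Float64, Int64 = (
    "CharType", "UInt8", "Int16", "Int32", "Float32", "Float64", "Int64")

recarrStruct = ((CharType, 16), UInt8, Int16, Int32, Int32,
                Float32, (Float64, 32), Int64)
recarrFields = ["name", "TDCcount", "ADCcount", "grid_i", "grid_j",
                "pressure", "temperature", "idnumber"]


def normalize(entry):
    """Return (type, repeat); a bare type means a repeat count of 1."""
    if isinstance(entry, tuple):
        return entry
    return (entry, 1)


def describe(struct, fields):
    """Pair each field name with its (type, repeat), preserving order."""
    if len(struct) != len(fields):
        raise ValueError("struct and field list differ in length")
    return [(name, normalize(entry)) for name, entry in zip(fields, struct)]


described = describe(recarrStruct, recarrFields)
```

The length check is where the large-struct weakness shows up: the two
sequences have to be kept in step by hand.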

>>>where defineType is a class that accepts (type, shape, default)
>>>parameters. It can be extended safely in the future if more needs appear.
>>>      
>>>
>>You're way ahead of me here.  The only thing I don't like about this is
>>the additional relative complexity because of the addition of field
>>names and default values.   It would be nice to layer this more.
>>
>>    
>>
>
>Well, I think a map between field names and values is valuable from the
>user's point of view. It may help him to label the different information in
>the recarray. Moreover, if __getattr__ and __setattr__ methods (or
>__getitem__ and __setitem__) were implemented on recarray (as they are
>in my recarray2 version, for example), the field name could become a very
>convenient way to access a specific field by name (this introduces the
>limitation that a field name must be a valid Python identifier, but I think
>this is not a big restriction). By looking at the description dictionary,
>the user can get a quick idea of what he can find in every field (with no
>need for counting, which can be a big advantage, especially for long records).
>
That's true and sounds nice.  I'm just thinking records with named
fields should be derived from records with positional fields.  If the
functionality is layered, you can use as much complexity as you need.

It's a good sign that both you and I thought of an identical tuple
format; it's the obvious minimal one.
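
Roughly what the layering could look like, as a sketch only.  Both
class names and the field()/set_field() methods are made up for
illustration and are not part of recarray:

```python
class PositionalRecord:
    """Hypothetical base layer: fields are addressed by position only."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, index):
        return self._values[index]

    def __setitem__(self, index, value):
        self._values[index] = value


class NamedRecord(PositionalRecord):
    """Hypothetical second layer: adds lookup by name on top of positions."""

    def __init__(self, names, values):
        super().__init__(values)
        # Map each field name to its position in the underlying record.
        self._index = {name: i for i, name in enumerate(names)}

    def field(self, name):
        return self[self._index[name]]

    def set_field(self, name, value):
        self[self._index[name]] = value


rec = NamedRecord(["grid_i", "grid_j"], [3, 4])
```

Code that only needs positions never has to know that the names exist.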

>
>With regard to default values, you can make this parameter (even the shape)
>a keyword parameter in order to make it optional. 
>
OK.  That's a good point.

>  
>
>>One more thing I don't understand looking at this:  a dictionary is 
>>unordered.
>>    
>>
>
>Yeah, but this can be regarded as an advantage rather than a drawback in the
>sense that you can choose the order you (the developer) prefer. For example,
>I was first using an alphanumerical order to arrange the data fields, but
>now I'm considering that an arrangement that optimizes the alignment of the
>fields could be far better. For instance, say that you have an (Int8, Int32,
>Float64) record; in principle it would be easy to create a routine that
>rearranges this record in the form (Float64, Int32, Int8), which optimizes
>access to the different fields (it may even be possible to introduce automatic
>padding later on if recarrays support it in the future).
>
>Maybe you are getting confused 
>
Yes and no. :)

>in thinking that recarrDescr will create the
>recarray. Not at all; this is a *metadata* definition that can be passed to the
>actual recarray function for recarray creation. 
>
Just like the type repetition tuple except also including field names
and default values.   I don't think you lost me.  For what we do, the
exact physical layout of the "struct" is important, so order matters.  I
see order as part of the meta-data, but I don't usually deal with
meta-entities so maybe I've got that part wrong.  :)
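
To make the layout point concrete, here is a small sketch using the
(Int8, Int32, Float64) example from above.  It assumes a packed layout
with no padding and the usual element sizes of 1, 4 and 8 bytes; the
helper names are hypothetical:

```python
ITEMSIZE = {"Int8": 1, "Int32": 4, "Float64": 8}  # bytes per element (assumed)


def packed_offsets(struct):
    """Return (byte offset of each field, total record size), no padding."""
    offsets = []
    position = 0
    for type_name, repeat in struct:
        offsets.append(position)
        position += ITEMSIZE[type_name] * repeat
    return offsets, position


def naturally_aligned(struct):
    """True if every field starts at a multiple of its element size."""
    offsets, _size = packed_offsets(struct)
    return all(offset % ITEMSIZE[type_name] == 0
               for offset, (type_name, _repeat) in zip(offsets, struct))


original = (("Int8", 1), ("Int32", 1), ("Float64", 1))
# Largest element size first, as in the suggested (Float64, Int32, Int8) order.
reordered = tuple(sorted(original, key=lambda f: ITEMSIZE[f[0]], reverse=True))
```

Both orders occupy the same 13 bytes, but the field offsets differ,
which is why the order has to be recorded somewhere if the physical
layout matters.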

>Its function would be
>similar to the formats parameter (with typical values like "3a,4i,3w") in
>recarray.array, but with more verbosity and all the reported advantages.
>
>  
>
>>>instead of
>>>
>>>((Int16, 3),
>>>(Int32, 4),
>>>(Float64, 20),
>>>)
>>>      
>>>
>>This is pretty much exactly what I was thinking.  It is straightforward
>>to imagine and difficult to forget.
>>
>>    
>>
>>>the former being more handy in lots of situations.
>>>      
>>>
>>Would you please name some of these so we can explore handling them both
>>ways?
>>
>>    
>>
>
>Well, I'm afraid that the biggest advantage would be when dealing with
>recarrays in C extension modules. In this kind of situation it would be far
>better to deal with a "3a4i3w" array than a tuple of Python objects. But
>maybe I'm wrong and the latter is not so complicated to manage; however, I
>used to work a lot with records (even before meeting recarray) and I was
>quite comfortable with formats in string mode.
>
I was thinking that if the above was an issue, we could write an API
function (or functions) to "compile" the type-repetition tuple into arrays
of ints which describe the type of each field and its repetition factor.
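
A sketch of such a "compile" step in Python.  The integer type codes
are invented for the example and do not correspond to any real numarray
enumeration, and compile_struct() is a hypothetical name:

```python
from array import array

# Hypothetical integer codes for each type; the real values would come
# from whatever enumeration the C side uses.
TYPE_CODES = {"CharType": 0, "UInt8": 1, "Int16": 2, "Int32": 3,
              "Int64": 4, "Float32": 5, "Float64": 6}


def compile_struct(struct):
    """Turn a type-repetition tuple into two parallel arrays of C ints."""
    types = array("i")
    repeats = array("i")
    for entry in struct:
        # A bare type name means a repeat count of 1.
        type_name, repeat = entry if isinstance(entry, tuple) else (entry, 1)
        types.append(TYPE_CODES[type_name])
        repeats.append(repeat)
    return types, repeats


types, repeats = compile_struct((("Int16", 3), ("Int32", 4), ("Float64", 20)))
```

Two flat int arrays like these are easy to walk from C without touching
any Python objects.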

>
>Or perhaps it would be enough to provide a method for converting from the
>standard metadata layout (dictionary or tuple or whatever) to a string
>format. This should not be very difficult.
>  
>
Almost exactly what I suggested above.
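
The string variant could be sketched the same way.  The one-letter
codes below are placeholders chosen for the example, not necessarily
the codes recarray's formats parameter uses, and to_format_string() is
a hypothetical name:

```python
# Placeholder letter codes, assumed for illustration only.
LETTER = {"CharType": "a", "Int16": "s", "Int32": "i", "Float64": "d"}


def to_format_string(struct):
    """Render a type-repetition tuple as a comma-separated format string."""
    parts = []
    for entry in struct:
        # A bare type name means a repeat count of 1.
        type_name, repeat = entry if isinstance(entry, tuple) else (entry, 1)
        parts.append("%d%s" % (repeat, LETTER[type_name]))
    return ",".join(parts)


fmt = to_format_string((("Int16", 3), ("Int32", 4), ("Float64", 20)))
```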

See you Monday,
Todd





