[Numpy-discussion] RecArray.tolist() suggestion

Colin J. Williams cjw at sympatico.ca
Thu Jul 15 17:22:42 CDT 2004


Perry Greenfield wrote:

>Francesc Alted wrote:
>  
>
>>A Dijous 15 Juliol 2004 17:21, Colin J. Williams va escriure:
>>    
>>
>>>>What I propose is to be able to say:
>>>>        
>>>>
>>>> >>> r["c1"][1]
>>>>>>>              
>>>>>>>
>>>I would suggest going a step beyond this, so that one can have r.c1[1],
>>>see the script below.
>>>      
>>>
>>Yeah. I've implemented something similar to access column elements for
>>pytables Table objects. However, the problem in this case is that there are
>>already attributes that "pollute" the column namespace, so that a column
>>named "size" collides with the size() method.
>>
>>    
>>
>The idea of mapping field names to attributes occurs to everyone
>quickly, but for the reasons Francesc gives (as well as another I'll
>mention) we were reluctant to implement it. The other reason is that
>it would be nice to allow field names that are not legal attributes
>(e.g., that include spaces or other illegal attribute characters).
>There are potentially people with data in databases or other similar
>formats that would like to map field names exactly. While one can
>certainly still use the attribute approach and not support all field
>names (or column, or col...), it does introduce another glitch in
>the user interface when it works only for a subset of legal names.
>  
>
It would, I suggest, not be unduly restrictive to bar the existing
attribute names but, if that's not acceptable, Francesc has suggested
the .col workaround, although I would prefer to avoid the added clutter.
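
For concreteness, the workaround can be pictured roughly as follows. This
is only a minimal sketch in plain Python, not the pytables or numarray
code: Records, Cols and Column are hypothetical names, the accessor
attribute is called cols here, and a list of dicts stands in for a RecArray.

```python
class Column:
    """Accessor for one field of a record container (hypothetical)."""

    def __init__(self, rows, name):
        self._rows = rows      # list of dicts standing in for a RecArray
        self._name = name

    def __getitem__(self, index):
        return self._rows[index][self._name]

    def __setitem__(self, index, value):
        self._rows[index][self._name] = value


class Cols:
    """Namespace holding only column accessors, so a name like 'size' is free."""

    def __init__(self, rows, names):
        for name in names:
            # Each accessor is stored as a plain instance attribute.
            setattr(self, name, Column(rows, name))


class Records:
    """Toy stand-in for a RecArray: a list of dicts plus a size() method."""

    def __init__(self, rows):
        self._rows = rows
        names = list(rows[0]) if rows else []
        self.cols = Cols(rows, names)

    def size(self):
        return len(self._rows)


r = Records([{"c1": 1, "size": 10}, {"c1": 2, "size": 20}])
value = r.cols.c1[1]          # read through the accessor
r.cols.c1[1] = value + 40     # write through the accessor
```

In this sketch a column named "size" is reachable as r.cols.size while
r.size() remains the method, at the cost of the extra ".cols" in every access.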

Incidentally, there is no current protection against wiping out an 
existing method:
[Dbg]>>> r1.size= 0
[Dbg]>>> r1.size
0
[Dbg]>>>
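
One way such protection might be added is a __setattr__ hook that refuses
to rebind any name already defined on the class. The sketch below uses a
toy class, Guarded (a hypothetical name), rather than the real RecArray,
and assumes a current Python version.

```python
class Guarded:
    """Toy class that refuses to rebind names defined on the class."""

    def size(self):
        return 3

    def __setattr__(self, name, value):
        # Anything defined on the class (methods, properties) is protected.
        if hasattr(type(self), name):
            raise AttributeError(
                f"cannot overwrite existing attribute {name!r}")
        object.__setattr__(self, name, value)


g = Guarded()
g.c1 = [1, 2, 3]        # new names are still allowed
try:
    g.size = 0          # rejected instead of silently shadowing size()
    overwritten = True
except AttributeError:
    overwritten = False
```

With a guard of this kind, a column called "size" would be rejected at
assignment time rather than quietly replacing the method.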

>  
>
>>I came up with a solution by adding a new "cols" attribute to the Table
>>object that is an instance of a simple class named Cols with no attributes
>>that can pollute the namespace (except some starting by "__" or "_v_").
>>Then, it is just a matter of providing functionality to access the
>>different columns. In that case, when a reference to a column is made,
>>another object (an instance of the Column class) is returned. This Column
>>object is basically an accessor to column values with __getitem__() and
>>__setitem__() methods.
>>That might sound complicated, but it is not. I'm attaching part of the
>>relevant code below.
>>
>>I personally like that solution in the context of pytables because it
>>extends the "natural naming" convention quite naturally. A similar approach
>>could be applied to RecArray objects as well, although numarray might (and
>>probably does) have other usage conventions.
>>
>>    
>>
>>>I have not explored the assignment of a value to r.c1[1], but it seems
>>>to be achievable.
>>>      
>>>
>>In the schema I've just proposed, the following should be feasible:
>>
>>value = r.cols.c1[1]
>>r.cols.c1[1] = value
>>
>>    
>>
>This solution avoids name collisions but doesn't handle the other
>problem. This is worth considering, but I thought I'd hear comments
>about the other issue before deciding (there is also the
>"more than one way" issue; but this guideline seems to bend
>quite often to pragmatic concerns).
>
To allow for multi-word column names, assignment could replace a space
by an underscore and, in retrieval, the reverse could be done, i.e. an
underscore would be banned in a column name.
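
A rough sketch of that mapping, under the assumption that fields live in a
plain dict: FieldView is a hypothetical name, not part of numarray, and
field names containing an underscore are rejected up front so that the
translation stays unambiguous.

```python
class FieldView:
    """Attribute access to fields whose names may contain spaces (hypothetical)."""

    def __init__(self, fields):
        for name in fields:
            if "_" in name:
                raise ValueError(
                    f"underscore not allowed in field name {name!r}")
        # Bypass our own __setattr__ so the dict is stored as a real attribute.
        object.__setattr__(self, "_fields", dict(fields))

    def __getattr__(self, attr):
        # Only called when normal lookup fails; private names are never fields.
        if attr.startswith("_"):
            raise AttributeError(attr)
        try:
            return self._fields[attr.replace("_", " ")]
        except KeyError:
            raise AttributeError(attr) from None

    def __setattr__(self, attr, value):
        # 'unit_price' on the attribute side means 'unit price' on the field side.
        self._fields[attr.replace("_", " ")] = value


v = FieldView({"unit price": 2.5, "qty": 4})
total = v.unit_price * v.qty
v.unit_price = 3.0
```

This still leaves out field names with other characters that are illegal
in attributes, so it addresses only the multi-word case.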

Colin W.

>
>We're still chewing on all the other issues and plan to start floating
>some proposals, rationales and questions before long.
>
>Perry
>
>
>  
>




