[Numpy-discussion] Records in scipy core

Travis Oliphant oliphant.travis at ieee.org
Fri Dec 2 09:14:01 CST 2005


> I'm not clear as to what the current design objective is and so I'll 
> try to recap and perhaps expand my pieces in the referenced discussion 
> to set out the sort of arrangement I would like to see.


I have two objectives:

1) Make the core scipy array object flexible enough to support a very 
good records sub-class.  In other words, I wonder if the core scipy 
array object could be made flexible enough to be used as a decent record 
array by itself, without adding much difficulty.   In the process, I'm 
really trying to understand how the data-type of an array should be 
generally considered.  An array object that has this generic perspective 
on data-type is what should go into Python, I believe.

2) Make a (more) useful records subclass of the ndarray object that is 
perhaps easier for the end-user to use.  Involved with this, of course, 
is making functions that make it easy to create a records sub-class.

>
> We are moving towards having a multi-dimensional array which can hold 
> objects of fixed size and type, the smallest being one byte (although 
> the void would appear to be a collection of no size objects).  Most of 
> the need, and thus the focus, is on numeric objects, ranging in size 
> from Int8 to Complex64.
>
> The Record is a fixed size object containing fields.  Each field has a 
> name, an optional title and data of a fixed type (perhaps including 
> another record instance and maybe arrays of fixed size?).

Right, so the record is really another kind of data-type.  The concept 
of the multidimensional array does not need adjustment, but the concept 
of what constitutes a data-type may need some fixing up.
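
As a rough illustration of "the record is another kind of data-type", 
here is a minimal sketch written against present-day NumPy.  That is an 
assumption on my part: it is not the scipy core API under discussion 
here, and the field names are made up for the example.  The array stays 
an ordinary 2-d array; only the element type changes.

```python
import numpy as np

# A record described purely as a data-type: two named, fixed-size fields.
# (Hypothetical field names, chosen only for illustration.)
person = np.dtype([("name", "S10"), ("age", np.int32)])

# The array is an ordinary 2-d ndarray; only its element type is a record.
people = np.zeros((2, 3), dtype=person)
people[0, 1] = (b"Ann", 31)

print(people.shape)          # shape is independent of the record layout
print(person.itemsize)       # bytes per element (packed: 10 + 4)
print(people["age"][0, 1])   # one field of one element
```

Nothing about indexing or shape had to change to hold records; all of 
the record-specific information lives in the data-type.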

>
> In the example below, AddressRecord and PersonRecord would be 
> sub-classes of Record where the fields are named and, optionally, 
> field titles given.  The names would be consistent with Python naming 
> whereas the title could be any Python string.

I like the notion of titles and names.  I think they are both useful.
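
To make the name/title distinction concrete, here is a small sketch 
using present-day NumPy (again an assumption, not the design being 
worked out in this thread), where a field can be declared with both an 
identifier-style name and a free-form title.  The field names and titles 
are invented for the example.

```python
import numpy as np

# Each field is declared as ((title, name), format).
# Names are Python identifiers; titles can be any string.
address = np.dtype([
    (("Street Address", "street"), "S20"),
    (("Postal Code", "postcode"), np.int32),
])

rows = np.zeros(2, dtype=address)
rows["postcode"] = [12345, 67890]

print(address.names)                 # only the identifier-style names
print(address.fields["street"][2])   # the title stored alongside the field
print(rows["Postal Code"])           # a title can also be used as a key
```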

>
> The use of attributes raises the possibility that one could have 
> nested records.  For example, suppose one has an address record:

Now, I'm in favor of attribute access.  But nested records are possible 
without attribute access (it's just extra typing).   It's the underlying 
structure that provides the possibility for nested records (and that's 
what I'm trying to figure out how to support, generally).  If I can 
support this generally in the basic ndarray object by augmenting the 
notion of data-type as appropriate, then making a subclass that has the 
nice syntactic sugar is easy.
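
A sketch of that split, assuming present-day NumPy rather than the 
scipy core of this thread: the nesting is carried entirely by the 
data-type, and attribute access is a thin view on top of it.  The 
record layout and field names below are hypothetical.

```python
import numpy as np

# Nested record: one field of `person` is itself a record data-type.
address = np.dtype([("street", "S20"), ("postcode", np.int32)])
person = np.dtype([("name", "S10"), ("address", address)])

people = np.zeros(2, dtype=person)
people[0] = (b"Ann", (b"1 Main St", 12345))

# Without attribute access: string keys on the plain ndarray.
code_by_key = people["address"]["postcode"][0]

# With attribute access: the same memory viewed as a record array subclass.
rec = people.view(np.recarray)
code_by_attr = rec.address.postcode[0]

print(code_by_key, code_by_attr)
```

Both spellings read the same element; the subclass adds only the 
shorter syntax.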

So, there are two issues really.

1) How to think about the data-type of a general ndarray object in order 
to support nested records in a straightforward way.

2) What should the syntactic sugar of a record array subclass be... 

I suppose a third is

3) How much of the syntactic sugar should be applied to all ndarrays?

-Travis

> I see no need to have the attribute 'field' and would like to avoid 
> the use of strings to identify a record component.  This does require 
> that fields be named as Python identifiers but is this restriction a 
> killer?

For a record array subclass that may be true.  But, as was raised by 
others in the previous thread, there are problems of "name-space" 
collision with the methods and attributes of the array that would 
prevent certain names from being used (and any additions to the methods 
and attributes of the array would cause incompatibilities with 
some people's records).
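
The collision is easy to show with a sketch in present-day NumPy (an 
assumption; it is not the subclass being designed here).  A 
hypothetical field named "mean" is hidden behind the array method of 
the same name, while a field whose name matches no array attribute 
works fine.

```python
import numpy as np

# Hypothetical record with one field name that matches an array method.
stats = np.zeros(3, dtype=[("mean", np.float64), ("count", np.int32)])
stats = stats.view(np.recarray)
stats["mean"] = [1.0, 2.0, 3.0]

by_attr = stats.mean      # resolves to the array's mean() method
by_key = stats["mean"]    # string key still reaches the field

print(callable(by_attr))  # the attribute is a method, not field data
print(by_key)             # the field values
print(stats.count)        # no clash here, so the field is returned
```

String keys always reach the field, so they remain the fallback for any 
name that attribute access cannot express.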

But, at this point, I like the readability of the attribute access 
approach and could support it.

-Travis





