[Numpy-discussion] Records in scipy core
Travis Oliphant
oliphant at ee.byu.edu
Thu Dec 1 17:12:09 CST 2005
Christopher Hanley wrote:
>Hi Travis,
>
>About a year ago (summer 2004) on the numpy distribution list there was
>a lot of discussion of the records interface. I will dig through my
>notes and put together a summary.
>
>
Thanks for the pointers. I had forgotten about that discussion. I
went back and re-read the thread.
Here's a good link for others to re-read (the end of) this thread:
http://news.gmane.org/find-root.php?message_id=%3cBD22BAC0.E9EB%25perry%40stsci.edu%3e
I think some very good points were made. These points should be
addressed from the context of scipy arrays which now support records in
a very basic way. Because of this, we can support nested records of
records, but how this is to be presented to the user is still an open
question (i.e., how do you build one?).
I've finally been converted to believe that the notion of records is
very important because it speaks of how to do the basic (typeless,
mathless) array object that will go into Python correctly. If we can get
the general records type done right, then all the other types are
examples of it.
Thus, I would like to revive discussion of the record object for
inclusion in scipy core. I pretty much agree with the semantics that
Perry described in his final email (is this all implemented in numarray,
yet?), except I would agree with Francesc Alted that a titles or labels
concept should be allowed.
I'm more enthusiastic about code than discussion, so I'm hoping for a
short-lived discussion followed by actual code. I'm ready to do the
implementation this week (I've already borrowed lots of great code from
numarray which makes it easier), but feel free to chime in even if you
read this later.
In my mind, the discussion about the records array is primarily a
discussion about the records data-type. The way I'm thinking, the scipy
ndarray is a homogeneous collection of the same "thing." The big change
in scipy core is that Numeric used to allow only certain data types, but
now the ndarray can contain an arbitrary "void" data type. You can also
add data-types to scipy core. These data-types are "almost" full
members of the scipy data-type community. The "almost" is because the
N*N casting matrix is not updated (this would require a re-design of
how casting is considered). At some point, I'd like to fix this wart
and make it so that data-types can be added at will -- I think if we get
the record type right, I'll be able to figure out how to do this.
We need to add a "record" data-type to scipy. Then, any array can be of
"record" type, and there will be an additional "array scalar" that is
what is returned when selecting a single element from the array. So, a
record array would simply be an array of "records" plus some extra stuff
for dealing with the mapping from field names to actual segments of the
array element (we may decide that this mapping is general enough that
all scipy arrays should be able to assign names to sub-bytes of their
main data-type and have a means of accessing those sub-bytes, in which
case the subclass is unnecessary).
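
To make the "array of records plus an array scalar" idea concrete, here is a minimal sketch. It uses the structured data types of present-day NumPy, which is an assumption on my part: the record type discussed in this message had not been written yet, and its final interface may have differed. The field names "key" and "value" are hypothetical.

```python
import numpy as np

# Hypothetical record layout: a 4-byte integer followed by an 8-byte float.
record = np.dtype([("key", "<i4"), ("value", "<f8")])

# A homogeneous array whose basic element is one 12-byte record.
arr = np.zeros(3, dtype=record)
arr["key"] = [1, 2, 3]
arr["value"] = [0.5, 1.5, 2.5]

# Selecting a single element returns an array scalar, not a Python tuple.
elem = arr[1]
elem_key = int(elem["key"])
elem_value = float(elem["value"])
```

In current NumPy the scalar returned for such an element is an instance of `np.void`, which matches the "void" data type mentioned earlier in this message.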
Let me explain further: Right now, the machinery is in place in
scipy_core to get and set in any ndarray (regardless of its data-type)
an arbitrary "field". A "field" in this context is defined as a
sub-section of the basic element making up the array. Generically the
sub-section is defined by an offset and a data-type or a tuple of a data
type and a shape (to allow sub-arrays in a record). What I understand
the user to want is the binding of a name to this generic sub-section
descriptor.
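
The distinction between an unnamed field (offset plus data type) and a named one can be sketched as follows. This again assumes present-day NumPy rather than the 2005 scipy_core code; `getfield` is the offset-based accessor there, and the field names "flag" and "vec" are made up for illustration.

```python
import numpy as np

# Unnamed access: a complex128 element is two float64 values at offsets 0 and 8.
z = np.array([1 + 2j, 3 + 4j])
real_part = z.getfield(np.float64, 0)
imag_part = z.getfield(np.float64, 8)

# Named access: each name is bound to a (data type, offset) descriptor.
# "vec" uses a (data type, shape) pair, giving a sub-array inside the record.
layout = np.dtype([("flag", "<i4"), ("vec", "<f4", (2,))])
rec = np.zeros(2, dtype=layout)
rec["vec"][1] = [1.0, 2.0]

# The name-to-descriptor mapping is itself inspectable.
flag_dtype, flag_offset = layout.fields["flag"][:2]
vec_dtype, vec_offset = layout.fields["vec"][:2]
```

Here `layout.fields` plays the role of the binding described above: a mapping from a name to the generic sub-section descriptor.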
1) Should we allow that for every scipy ndarray? Complex data types
have an obvious binding, but would anybody want to name the first two
bytes of their int32 array? I suggest holding off on this one until a
records array is working.
2) Supposing we don't go with number 1, we need to design a record data
type that has this name-binding capability.
The recarray class in scipy core SVN essentially just does this.
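
For reference, option 1 would look roughly like the sketch below. It is written against present-day NumPy, so it shows only what naming the sub-bytes of an ordinary array could mean, not what scipy core did at the time. The names "lo" and "hi" are hypothetical, and the byte order is fixed to little-endian so the result does not depend on the machine.

```python
import numpy as np

# An ordinary int32 array, explicitly little-endian.
a = np.array([0x00010002, 0x00030004], dtype="<i4")

# Bind names to the two 16-bit halves of each 4-byte element.
halves = np.dtype({
    "names": ["lo", "hi"],
    "formats": ["<u2", "<u2"],
    "offsets": [0, 2],
})
named = a.view(halves)   # same memory, reinterpreted with named sub-bytes
lo = named["lo"]         # first two bytes of every element
hi = named["hi"]         # last two bytes of every element

# The "obvious binding" for complex types already exists as attributes.
z = np.array([1 + 2j])
re, im = z.real, z.imag
```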
Question: How important is backwards compatibility with the old numarray
specification? In particular, I would go with the .fields access
described by Perry and eliminate the .field() approach.
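
The two access styles can be compared with the sketch below. It uses the `recarray` of present-day NumPy, which happens to offer both a `field()` method and name-based access. That is an assumption made for illustration: Perry's `.fields` proposal is not spelled out in this message and may not match what is shown. The field names "key" and "value" are hypothetical.

```python
import numpy as np

# Build a small record array from two columns.
r = np.rec.fromarrays([[1, 2, 3], [0.5, 1.5, 2.5]], names="key,value")

by_method = r.field("key")   # numarray-style method call
by_attr = r.key              # attribute-style access
by_item = r["key"]           # plain ndarray indexing by field name

# The data type carries the name-to-(data type, offset) mapping.
field_names = sorted(r.dtype.fields)
```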
Thanks for reading and any comments you can make.
-Travis