[Numpy-discussion] Record arrays

Gael Varoquaux gael.varoquaux@normalesup....
Thu Jun 26 21:24:21 CDT 2008


I understand all your comments and thank you for making this distinction
explicit. I can see why recarray can slow code down, but I find that attribute
lookup makes code much more readable and makes interactive work fantastic (tab
completion). For many of my applications I do have a strong use case for
these recarrays, and I am willing to pay the speed cost (many of the
things I do are very far from being numerically intensive).

On a side note, a pattern I use a lot (and, incidentally, one that Fernando and
Brian also came up with in ipython1) is a mixed object that acts like a
dictionary (and thus comes with all the goodies, like the keys, iterkeys,
... methods and the "in" operator), but exposes its keys as attributes:

class Bunch(dict):

    def __init__(self, **kwargs):
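        dict.__init__(self, **kwargs)
        # Use the dict itself as the instance namespace, so keys and
        # attributes live in the same storage.
        self.__dict__ = self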

a = Bunch(a=1, b=2)
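A short usage sketch of the Bunch class above (repeated so the snippet runs on its own, in Python 3). Because the instance's __dict__ is the dict itself, reading or writing through an attribute and through a key touch the same entry. One caveat: a key that shadows a dict method (for example 'keys') is still stored, but attribute lookup returns the method instead.

```python
class Bunch(dict):
    """A dict whose keys are also reachable as attributes."""

    def __init__(self, **kwargs):
        dict.__init__(self, **kwargs)
        # The instance namespace *is* the dict, so keys and attributes coincide.
        self.__dict__ = self


a = Bunch(a=1, b=2)
assert a.a == a['a'] == 1

a.c = 3          # setting an attribute adds a key...
assert a['c'] == 3

a['d'] = 4       # ...and setting a key adds an attribute
assert a.d == 4

# All the usual dict machinery still works.
assert 'b' in a
assert sorted(a.keys()) == ['a', 'b', 'c', 'd']
```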

This is not directly related to the discussion, as the recarrays add more
to this (e.g. operations applied uniformly over all the fields), but it does show
that this pattern is liked by many people.
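To make "operations uniform over all the fields" concrete, here is a minimal sketch (the field names foo/bar and the data are illustrative, not taken from the thread). A boolean mask or a sort on a record array acts on whole records, so every field stays aligned; with a dict of separate arrays (a Bunch, say) the same mask has to be applied to each field by hand.

```python
import numpy as np

# A record array viewed as a recarray: attribute access plus whole-record operations.
r = np.array([(3, 0.5), (1, 1.5), (2, 2.5)],
             dtype=[('foo', np.int64), ('bar', np.float64)]).view(np.recarray)

# One mask filters every field at once; records stay intact.
big = r[r.foo > 1]
print(big.foo, big.bar)        # [3 2] [0.5 2.5]

# Sorting by one field reorders whole records.
s = np.sort(r, order='foo')
print(s.foo, s.bar)            # [1 2 3] [1.5 2.5 0.5]

# The dict-of-arrays equivalent: the mask must be applied to each field separately.
columns = {'foo': np.array([3, 1, 2]), 'bar': np.array([0.5, 1.5, 2.5])}
mask = columns['foo'] > 1
filtered = {name: col[mask] for name, col in columns.items()}
```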

My 2 cents,

Gaël

On Thu, Jun 26, 2008 at 03:25:11PM -0500, Robert Kern wrote:
> Let's be clear; there are two very closely related things: recarrays
> and record arrays. Record arrays are just ndarrays with a complicated
> dtype. E.g.

> In [1]: from numpy import *

> In [2]: ones(3, dtype=dtype([('foo', int), ('bar', float)]))
> Out[2]:
> array([(1, 1.0), (1, 1.0), (1, 1.0)],
>       dtype=[('foo', '<i4'), ('bar', '<f8')])

> In [3]: r = _

> In [4]: r['foo']
> Out[4]: array([1, 1, 1])


> recarray is a subclass of ndarray that just adds attribute access to
> record arrays.

> In [10]: r2 = r.view(recarray)

> In [11]: r2
> Out[11]:
> recarray([(1, 1.0), (1, 1.0), (1, 1.0)],
>       dtype=[('foo', '<i4'), ('bar', '<f8')])

> In [12]: r2.foo
> Out[12]: array([1, 1, 1])


> One downside of this is that the attribute access feature slows down
> all field accesses, even the r['foo'] form, because it sticks a bunch
> of pure Python code in the middle. Much code won't notice this, but if
> you end up having to iterate over an array of records (as I have),
> this will be a hotspot for you.

> Record arrays are fundamentally a part of numpy, and no one is even
> suggesting that they would go away. No one is seriously suggesting
> that we should remove recarray, but some of us hesitate to recommend
> its use over plain record arrays.

> Does that clarify the discussion for you?
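A sketch reproducing Robert's distinction with current NumPy spellings. Assumptions: `import numpy as np` instead of `from numpy import *`, explicit np.int64/np.float64 fields so the dtype does not depend on the platform's default integer, and an explicit np.array(...) constructor in place of ones(...). A plain record array supports only r['foo'], while r.view(np.recarray) adds r2.foo over the same memory. The timeit part only measures the field-access overhead on your own machine; no particular timings are claimed.

```python
import timeit
import numpy as np

dt = np.dtype([('foo', np.int64), ('bar', np.float64)])

# A record (structured) array: a plain ndarray with a compound dtype.
r = np.array([(1, 1.0), (1, 1.0), (1, 1.0)], dtype=dt)
print(type(r).__name__, r['foo'])     # ndarray [1 1 1]

# recarray is an ndarray subclass; view() reinterprets the same buffer, no copy.
r2 = r.view(np.recarray)
print(type(r2).__name__, r2.foo)      # recarray [1 1 1]

# Both objects share memory, so a write through one is visible in the other.
r2.foo[0] = 7
assert r['foo'][0] == 7

# Compare field access paths (results vary by machine and NumPy version).
n = 20_000
t_plain = timeit.timeit(lambda: r['foo'], number=n)
t_rec_item = timeit.timeit(lambda: r2['foo'], number=n)
t_rec_attr = timeit.timeit(lambda: r2.foo, number=n)
print(f"ndarray r['foo']: {t_plain:.4f}s  "
      f"recarray r2['foo']: {t_rec_item:.4f}s  "
      f"recarray r2.foo: {t_rec_attr:.4f}s")
```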

