[SciPy-user] Record Array: How to add a column?

John Hunter jdh2358@gmail....
Tue Oct 14 11:55:25 CDT 2008

On Tue, Oct 14, 2008 at 11:28 AM, Pierre GM <pgmdevlist@gmail.com> wrote:
> John,
> Do you plan to have your modifications part of numpy.records ? In any case,
> I'll try to check whether it is easy to add support to missing data:
> MaskedArrays should now support flexible types.

I do not have concrete plans, but I have spoken with Jarrod about
moving some of these over, making some of them record array methods,
others available in the np.rec namespace.  I think the consensus is
that these are useful and belong in numpy, but we are awaiting someone
to do the port.

On the subject of masked record arrays: we added masked array support
to mlab.csv2rec some time ago, and it has caused no shortage of
headaches because the interface of masked record arrays differs from
that of regular recarrays.

The following example shows a record array with a 'date' column which
is an O4 Python object type. Here is the behavior of the recarray:

  In [212]: !cat test1.csv
  In [213]: r1 = mlab.csv2rec('test1.csv')

  In [214]: type(r1)
  Out[214]: <class 'numpy.core.records.recarray'>

  In [215]: r1.dtype
  Out[215]: dtype([('date', '|O4'), ('age', '<i4'), ('name', '|S7')])

  In [216]: print r1[0].date.year

In particular, on a given row of the recarray, I can call object
methods and access object attributes.
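
For anyone without test1.csv at hand, the same recarray behavior can
probably be reproduced with a small hand-built array. This is only a
sketch for a current Python 3 / NumPy install: the row values are
made up (the file contents are not shown here), and the name column
uses a unicode 'U7' field rather than the '|S7' bytes field above.

```python
import datetime

import numpy as np

# Same field layout as r1 above; the values themselves are invented.
dtype = [("date", "O"), ("age", "i4"), ("name", "U7")]
rows = [
    (datetime.date(2008, 10, 14), 33, "alice"),
    (datetime.date(2008, 10, 15), 41, "bob"),
]

# Build a recarray directly instead of going through mlab.csv2rec.
r1 = np.rec.fromrecords(rows, dtype=dtype)

# The 'date' field holds real Python objects, so attribute access on
# a single row reaches the datetime.date instance itself.
first_date = r1[0].date
print(type(first_date), first_date.year)
```

The point is that the object stored in the 'O' field comes back
unwrapped, so its methods and attributes are directly usable.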

In the next example, the data file has a missing value on the last row
in the 'age' column, so csv2rec returns a masked record array:

  In [217]: !cat test2.csv
  In [218]: type(r2)
  Out[218]: <class 'numpy.ma.mrecords.MaskedRecords'>

  In [219]: print r2.dtype
  [('date', '|O4'), ('age', '<i4'), ('name', '|S7')]

  In [220]: r2[0].date.year
  Traceback (most recent call last):
    File "<ipython console>", line 1, in ?
  AttributeError: 'MaskedArray' object has no attribute 'year'

It would help us a lot in this regard if we could access the
underlying object.  Is there a reason why the masked array behaves
differently when it comes to accessing the underlying object methods,
and is there a sensible way to make them compatible?
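
One possible stopgap, sketched below for a current Python 3 / NumPy
install, is to bypass the masked wrapper: check the mask for the
field, then read the value from the unmasked data. The helper name
raw_field and the row values are hypothetical, and the array is a
plain masked structured array built with np.ma.array rather than the
MaskedRecords object that csv2rec returns, so this may not carry over
unchanged.

```python
import datetime

import numpy as np

# Same field layout as r2 above; values are invented, and the 'age'
# entry of the last row is a placeholder that gets masked.
dtype = [("date", "O"), ("age", "i4"), ("name", "U7")]
rows = [
    (datetime.date(2008, 10, 14), 33, "alice"),
    (datetime.date(2008, 10, 15), 0, "bob"),
]
data = np.array(rows, dtype=dtype)

# One mask flag per field per row; only the last row's 'age' is masked.
mask = [(False, False, False), (False, True, False)]
r2 = np.ma.array(data, mask=mask)


def raw_field(marr, row, field):
    """Return the unwrapped value at (row, field), or None if masked.

    Hypothetical helper, not part of numpy or matplotlib.
    """
    if np.ma.getmaskarray(marr)[field][row]:
        return None
    return np.ma.getdata(marr)[field][row]


print(raw_field(r2, 0, "date").year)  # attribute access on the date object
print(raw_field(r2, 1, "age"))        # masked entry
```

Returning None for masked entries is just one choice; returning
np.ma.masked or raising would be equally defensible, depending on how
the calling code wants to treat missing data.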

