[Numpy-discussion] more recfunctions, structured array help

Pierre GM pgmdevlist@gmail....
Tue Dec 8 20:07:30 CST 2009

On Dec 8, 2009, at 7:27 PM, John [H2O] wrote:
> Maybe I should add, I'm looking at this thread:
> http://old.nabble.com/masked-record-arrays-td26237612.html
> And, I guess I'm in the same situation as the OP there. It's not clear to
> me, but as best I can tell I am working with structured arrays (that's what
> np.rec.fromrecords creates, no?).
> Anyway, perhaps the simplest thing someone could do to help is to show how
> to create a masked structured array.
> Thanks!

(Note to self: one of us all's gonna have to write some doc about that...)

A structured array is an ndarray with named fields. As in a standard ndarray, each item has a fixed size defined by the dtype. Unlike a standard ndarray, each item is composed of several sub-items (the fields), whose types don't have to be homogeneous. Each item is a special numpy scalar called a numpy.void.
For example:
>>> x = np.array([('a',1),('b',2)],dtype=[('F0','|S1'),('F1',float)]) 

The first item, x[0], is composed of two fields, 'F0' and 'F1'. The first field is a single character, the second a float. 
Fields can be accessed for each item (like x[0]['F0']) or globally (like x['F0']). Note that this syntax is analogous to getting an item.
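
To make the two kinds of access concrete, here is a small sketch built on the same x as above. The variable names (first, col) are only for illustration, and under Python 3 the 'S1' field holds bytes, so 'a' comes back as b'a'.

```python
import numpy as np

# Same array as in the example above: one 1-char bytes field, one float field.
x = np.array([('a', 1), ('b', 2)], dtype=[('F0', 'S1'), ('F1', float)])

first = x[0]       # one item: a numpy.void scalar holding both fields
col = x['F1']      # one field across all items: a plain float ndarray

print(type(first))        # the scalar type of a structured item
print(x.dtype.names)      # the field names, in order
print(first['F0'], col)   # per-item access vs. global access

# Selecting a single field returns a view, so writing through it
# modifies the original structured array.
x['F1'][0] = 10.0
print(x[0]['F1'])
```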

A recarray is just a structured ndarray with some overridden methods, so that the fields can also be accessed as attributes. Because it relies on a custom __getattr__ and __setattr__, a recarray tends to be less efficient than a standard structured ndarray, but that's the price of convenience. To create a recarray, you can use the construction functions in np.rec, or simply take a view of your structured array as a np.recarray. So, when you use np.rec.fromrecords, you get a recarray, which is a subclass of ndarray with a structured dtype. Each item of a np.recarray is a special object (np.record), a subclass of np.void that also allows attribute-like access to its fields.
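
A short sketch of both routes to a recarray, assuming a reasonably recent numpy. The names r and r2 are only for illustration.

```python
import numpy as np

x = np.array([('a', 1), ('b', 2)], dtype=[('F0', 'S1'), ('F1', float)])

# Route 1: take a view of an existing structured array. No data is copied.
r = x.view(np.recarray)
print(r.F1)        # attribute-like access to a field
print(r['F1'])     # item-like access still works

# Route 2: build the recarray directly from a list of records.
r2 = np.rec.fromrecords([('a', 1.0), ('b', 2.0)], names='F0,F1')
print(r2.F1)

# Each item of a recarray also supports attribute access to its fields.
print(r[0].F1)
```

Since r is only a view, changes made through r show up in x as well.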

Masked arrays are ndarrays that have a special mask attribute. Since 1.3, masked arrays support flexible dtypes (aka structured dtypes), and you can mask individual fields. For example:
>>> x = ma.array([('a',1), ('b',2)], dtype=[('F0','|S1'),('F1',float)]) 
>>> x['F0'][0] = ma.masked
>>> x
masked_array(data = [(--, 1.0) ('b', 2.0)],
             mask = [(True, False) (False, False)],
       fill_value = ('N', 1e+20),
            dtype = [('F0', '|S1'), ('F1', '<f8')])

Here you have a structured masked array, where fields can be accessed like items, but not like attributes. If you need attribute-like access, take a view as a np.ma.mrecords.MaskedRecords.
Note that we just used the regular ma.array (or ma.masked_array) function to create this masked structured array. We could also have defined a structured ndarray first, and then taken a view of it as a np.ma.MaskedArray...
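
The three constructions mentioned here can be sketched as follows. This is only an illustration: the mask is passed at construction time rather than set through ma.masked as in the session above, and the names dt, base, y and m are made up for the example.

```python
import numpy as np
import numpy.ma as ma
from numpy.ma import mrecords

dt = [('F0', 'S1'), ('F1', float)]

# 1. Build a masked structured array directly, with one mask flag per field.
x = ma.array([('a', 1), ('b', 2)],
             mask=[(True, False), (False, False)],
             dtype=dt)
print(x.mask.dtype)    # the mask is structured too: one boolean per field
print(x['F0'].mask)    # field 'F0' is masked in the first item only

# 2. Start from a plain structured ndarray and view it as a MaskedArray.
#    Nothing is masked yet, but the view carries a full per-field mask.
base = np.array([('a', 1), ('b', 2)], dtype=dt)
y = base.view(ma.MaskedArray)
print(y.mask)

# 3. View the masked structured array as MaskedRecords for attribute access.
m = x.view(mrecords.MaskedRecords)
print(m.F1)            # same field as m['F1']
print(m.F0.mask)
```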

Unless you have a compelling reason to use np.recarray or np.ma.mrecords.mrecarray (like a long-time addiction to attribute access), stick to structured arrays (masked or not)...

