[Numpy-discussion] subclassing from np.ndarray and np.rec.recarray

Pierre GM pgmdevlist@gmail....
Mon Jul 6 13:18:52 CDT 2009


On Jul 6, 2009, at 1:12 PM, Elaine Angelino wrote:
> Hi -- We are subclassing from np.rec.recarray and are confused about  
> how some methods of np.rec.recarray relate to (differ from)  
> analogous methods of its parent, np.ndarray.  Below are specific  
> questions about the __eq__, __getitem__ and view methods, we'd  
> appreciate answers to our specific questions and/or more general  
> points that we may be not understanding about subclassing from  
> np.ndarray (and np.rec.recarray).

For generic information about subclassing, please refer to:
http://www.scipy.org/Subclasses
http://docs.scipy.org/doc/numpy/user/basics.subclassing.html

> 1) Suppose I have a recarray object, x. How come  
> np.ndarray.__getitem__(x, 'column_name') returns a recarray object  
> rather than a ndarray? e.g.,

ndarray.__getitem__(x, item) calls x.__array_finalize__ when item is a
basestring (a field name) rather than an integer. The output therefore
has the same subtype as x (here, a recarray).
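
To make this visible, here is a minimal sketch that logs every call to
__array_finalize__. TracedArray is a hypothetical class invented for the
illustration (it is not part of NumPy), and the structured dtype mirrors
the one in your example.

```python
import numpy as np


class TracedArray(np.ndarray):
    """Hypothetical subclass that records each __array_finalize__ call."""

    calls = []

    def __array_finalize__(self, obj):
        # obj is the array the new object was derived from (or None).
        TracedArray.calls.append(type(obj).__name__)


data = np.array([(1, b'dd'), (2, b'cc')], dtype=[('a', 'i4'), ('b', 'S2')])
x = data.view(TracedArray)

# Field access by name: a new array object is created from x.
TracedArray.calls.clear()
column = np.ndarray.__getitem__(x, 'a')
calls_for_field = list(TracedArray.calls)

# Integer access on a 1-D structured array: a scalar comes back.
TracedArray.calls.clear()
row = np.ndarray.__getitem__(x, 0)

print(type(column).__name__, calls_for_field)
print(type(row).__name__, TracedArray.calls)
```

The column keeps the subclass because it was finalized from x, while
the integer index hands back a scalar rather than an array.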

> 2)a) When I use the __getitem__ method of recarray to get an  
> individual column, the returned object is an ndarray when the column  
> is a numeric type but it is a recarray when the column is a string  
> type. Why doesn't __getitem__ always return an ndarray for an  
> individual column? e.g.,
>
>
> In [175]: x = np.rec.fromrecords([(1,'dd'), (2,'cc')],  
> names=['a','b'])
>

In your example:
 >>> x.dtype
dtype([('a', '<i4'), ('b', '|S2')])

So, field 'a' has a dtype int, which is a built-in dtype, while field
'b' has a dtype '|S2', which is NOT a built-in dtype.
The code of recarray.__getitem__ shows you that in the first case,
when the dtype of the output is a built-in, the output recarray
(x['a']) is viewed as a standard ndarray. That's not the case with
x['b']. Why? Ask Travis O.
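
If you want to check the flag yourself, dtypes expose it as the
isbuiltin attribute. A small sketch, assuming the field dtypes shown
above; the labels in the dict are only for display. The value for '|S2'
is printed rather than stated here, since the point is to inspect it on
your own installation.

```python
import numpy as np

# The two field dtypes from the example, plus the full record dtype.
candidates = {
    "field 'a'": np.dtype('i4'),
    "field 'b'": np.dtype('S2'),
    "whole record": np.dtype([('a', 'i4'), ('b', 'S2')]),
}

# dtype.isbuiltin is an integer flag; 1 means a built-in dtype.
flags = {label: dt.isbuiltin for label, dt in candidates.items()}

for label, flag in flags.items():
    print(label, candidates[label], flag)
```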

> 2)b)  Suppose I have a subclass of recarray, NewRecarray, that  
> attaches some new attribute, e.g. 'info'.
>
> x = NewRecarray(data, names = ['a','b'], formats = '<i4, |S2')
>
> Now say I want to use recarray's __getitem__ method to get an  
> individual column.  Then
>
> x['a'] is an ndarray
> x['b'] is a NewRecarray and x['b'].info == x.info
>
> Is this the expected / proper behavior?  Is there something wrong  
> with the way I've subclassed recarray?

No, that's expected behavior. Once again, calling getitem with a field
name as input calls __array_finalize__ internally. __array_finalize__
transforms the output into an array with the same subclass as your
input: that's why x['b'] is a NewRecarray.
However, if the dtype of the output is built-in, it's transformed back
to a standard ndarray: that's why x['a'] is a standard ndarray.
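
Since I haven't seen your class, here is a guess at a minimal
NewRecarray, only to illustrate the two steps. The way 'info' is
attached (through __array_finalize__ and view casting) is an assumption,
not necessarily what you did. The class of x['b'] is printed rather than
asserted, because it depends on the recarray.__getitem__ of the NumPy
you have installed.

```python
import numpy as np


class NewRecarray(np.recarray):
    """Hypothetical recarray subclass carrying an extra 'info' attribute."""

    def __array_finalize__(self, obj):
        # Let the parent class do its own setup when it defines any.
        parent = super().__array_finalize__
        if callable(parent):
            parent(obj)
        # Copy 'info' from the array we were derived from, if it has one.
        self.info = getattr(obj, 'info', None)


raw = np.array([(1, b'dd'), (2, b'cc')], dtype=[('a', 'i4'), ('b', 'S2')])
x = raw.view(NewRecarray)
x.info = 'metadata'

# Step 1: the base-class getitem keeps the subclass and its attribute.
raw_column = np.ndarray.__getitem__(x, 'b')
print(type(raw_column).__name__, raw_column.info)

# Step 2: recarray.__getitem__ may then view the result as a plain ndarray.
print(type(x['a']).__name__)
print(type(x['b']).__name__)
```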


> ---
>
> 3)a)  If I have two recarrays with the same len and column headers,  
> the __eq__ method returns the rich comparison.  Why is the result a  
> recarray rather than an ndarray?
>
> In [162]: x = np.rec.fromrecords([(1,'dd'), (2,'cc')],  
> names=['a','b'])
> In [163]: y = np.rec.fromrecords([(1,'dd'), (2,'cc')],  
> names=['a','b'])
> In [164]: x == y
> Out[164]: rec.array([ True,  True], dtype=bool)

OK, as far as I understand, here's what's going on:
* First, we check whether the dtypes are compatible.
* Then, each field of x is compared to the corresponding field of y,
which calls __array_finalize__ internally, and also __array_wrap__
(because you call the 'equal' ufunc).
* Then, __array_finalize__ is called on the output, which transforms
it back to a recarray.
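
The comparison itself can be sketched by hand. fieldwise_equal below is
a hypothetical helper written for this message, not NumPy's actual
implementation; it works on plain ndarray views so that no subclass
machinery gets involved, which is why its result is a plain ndarray.

```python
import numpy as np

x = np.rec.fromrecords([(1, 'dd'), (2, 'cc')], names=['a', 'b'])
y = np.rec.fromrecords([(1, 'dd'), (2, 'cc')], names=['a', 'b'])


def fieldwise_equal(left, right):
    """Hypothetical helper: compare two structured arrays field by field."""
    # Crude compatibility check: both arrays must have the same fields.
    if left.dtype.names != right.dtype.names:
        raise TypeError("field names differ")
    result = np.ones(left.shape, dtype=bool)
    for name in left.dtype.names:
        # np.asarray drops the recarray subclass before the field lookup.
        left_column = np.asarray(left)[name]
        right_column = np.asarray(right)[name]
        result &= (left_column == right_column)
    return result


manual = fieldwise_equal(x, y)
direct = x == y

print(type(manual).__name__, manual)
print(type(direct).__name__, direct)
```

Comparing the two printed class names shows whether the last step (the
conversion back to the subclass) happened for the direct comparison.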


> 3)b)  Suppose I have a subclass of recarray, NewRecarray, that  
> attaches some new attribute, e.g. 'info'.
>
> x = NewRecarray(data)
> y = NewRecarray(data)
> z = x == y
>
> Then z is a NewRecarray object and z.info = x.info.
>
> Is this the expected / proper behavior?  Is there something wrong  
> with the way I've subclassed recarray?  [Dan Yamins asked this a  
> couple days ago]

To tell you whether there's something wrong, I'd need to see the code.  
I'm not especially surprised by this behavior...


> ---
>
> 4)  Suppose I have a subclass of np.ndarray, NewArray, that attaches  
> some new attribute, e.g. 'info'. When I view a NewArray object as a  
> ndarray, the result has no 'info' attribute. Is the memory  
> corresponding to the 'info' attribute garbage collected? What  
> happens to it?

It's alive!
No, seriously: when you take a view as an ndarray, you only access the
portion of memory corresponding to the values of your ndarray and none
of its extra info. Same thing as calling .__array__() on your object.
So the information is still accessible, as long as the initial object
exists.
(Correct me if I'm wrong on this one...)
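
A quick way to convince yourself, using a hypothetical NewArray (again
a guess at what your class looks like, with 'info' attached in __new__
and propagated through __array_finalize__):

```python
import numpy as np


class NewArray(np.ndarray):
    """Hypothetical ndarray subclass carrying an extra 'info' attribute."""

    def __new__(cls, data, info=None):
        obj = np.array(data).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)


x = NewArray([1, 2, 3], info={'units': 'm'})
plain = x.view(np.ndarray)

# The view is a plain ndarray: it has the values but not the attribute.
print(type(plain).__name__, hasattr(plain, 'info'))

# Both objects look at the same data buffer.
print(np.shares_memory(plain, x))

# Writing through the view changes x; x.info is untouched.
plain[0] = 99
print(x[0], x.info)
```

The attribute lives on the Python object x, not in the data buffer, so
it stays around for as long as x itself does.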



