[Numpy-discussion] ANN: MaskedArray as a subclass of ndarray - followup

Matt Knox mattknox_ca at hotmail.com
Thu Jan 18 12:18:22 CST 2007


> This makes sense to me.   I'm generally favorable to the new maskedarray 
> (I actually like the idea of it being a sub-class).  I'm just waiting 
> for people that actually use the MaskedArray to comment.
> 
> For 1.1 I would really like to move most of the often-used sub-classes 
> of the ndarray to the C-level and merge in functionality from CVXOPT.
> 
> -Travis
> 

I am definitely in favor of the new maskedarray implementation. I've been
working with Pierre on a time series module which is a subclass of the new
masked array implementation, and having it as a subclass of ndarray definitely
has advantages (and no real disadvantages that I am aware of).

Moving the implementation to the C level would be awesome. In particular,
__getitem__ and __setitem__ are far slower with masked arrays than with
ndarrays, so using them inside Python loops is currently a really bad idea.
If you are concerned about performance, you always have to work with the
_data and _mask attributes directly.
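
To make the difference concrete, here is a rough sketch of the two access
patterns. It is written against today's numpy.ma rather than the maskedarray
package discussed in this thread, so it uses the public helpers ma.getdata and
ma.getmaskarray in place of the _data and _mask attributes. The function names
are made up for illustration.

```python
import numpy.ma as ma


def masked_sum_elementwise(marr):
    """Sum unmasked entries by indexing the masked array itself."""
    total = 0.0
    for i in range(marr.shape[0]):
        value = marr[i]  # goes through MaskedArray.__getitem__ each time
        if value is not ma.masked:
            total += value
    return total


def masked_sum_raw(marr):
    """Sum unmasked entries by indexing the plain data and mask arrays."""
    data = ma.getdata(marr)       # underlying ndarray of values
    mask = ma.getmaskarray(marr)  # boolean ndarray, True where masked
    total = 0.0
    for i in range(data.shape[0]):
        if not mask[i]:
            total += data[i]
    return total


x = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])
print(masked_sum_elementwise(x), masked_sum_raw(x))
```

Both loops give the same answer; the second one only ever indexes plain
ndarrays, which is the workaround described above.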

Also, there is a "bug" in Pierre's current implementation that I discussed
with him, but I currently have no solution for it. numpy.add.accumulate doesn't
work on arrays from the new maskedarray implementation, but it does with the
old one. The problem seems to arise when you override __getitem__ in an ndarray
subclass. See the code below for a demonstration:


import numpy.core.umath as umath
import numpy.core.numeric as numeric

class Foo1(numeric.ndarray):

    def __new__(cls, data=None):
        _data = numeric.array(data)
        return numeric.asanyarray(_data).view(cls)

    def __array_finalize__(self, obj):
        if not hasattr(self, "_data"):
            if hasattr(obj,'_data'):
                self._data = obj._data
            else:
                self._data = obj
  
    def __array__ (self, t=None, context=None):
        return self._data
        
    def __array_wrap__(self, array, context=None):
        return Foo1(array)

    # If you define this to return something other than what the standard
    # ndarray returns, accumulate doesn't work.
    def __getitem__(self, index):
        return self._data[index]
        #return super(Foo1, self).__getitem__(index)
        
    
class Foo2(object):

    def __init__(self, data=None):
        self._data = numeric.array(data)

    def __array__ (self, t=None, context=None):
        return self._data
        
    def __array_wrap__(self, array, context=None):
        return Foo2(array)
    
    def __getitem__(self, index):
        return self._data[index]

    def __str__(self):
        return str(self._data)

    def __add__(self, other):
        return umath.add(self._data, other._data)
    

if __name__ == "__main__":

    from numpy import add
    ac = add.accumulate

    foo1 = Foo1([1,2,3,4])
    foo2 = Foo2([1,2,3,4])
    
    print(ac(foo1))
    print(ac(foo2))
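
For comparison, here is a minimal sketch of the other variant hinted at by the
commented-out line in Foo1: an ndarray subclass whose __getitem__ simply
delegates to ndarray. The class name Delegating is made up for illustration,
and the sketch assumes a current NumPy and Python 3; the behaviour of the
release discussed in this thread may differ.

```python
import numpy as np


class Delegating(np.ndarray):
    """ndarray subclass whose __getitem__ returns what ndarray returns."""

    def __new__(cls, data=None):
        # View the input as this subclass without copying more than needed.
        return np.asarray(data).view(cls)

    def __getitem__(self, index):
        # Delegate to ndarray, so indexing behaves exactly as the base class.
        return super().__getitem__(index)


acc = np.add.accumulate(Delegating([1, 2, 3, 4]))
print(np.asarray(acc).tolist())
```

With this version accumulate returns the running sums, which is consistent
with the note in Foo1 that the trouble only starts when __getitem__ returns
something other than what ndarray itself would return.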
    
    


