# [Numpy-discussion] A reimplementation of MaskedArray

Michael Sorich michael.sorich at gmail.com
Wed Nov 22 00:12:11 CST 2006

```If I make some minor changes (below) to MaskedArray get and setitem

suba = a[0]
print a
>[[1 -- 3 4 5]
> [6 7 8 9 10]]
print suba
>[1 -- 3 4 5]

suba = a[1]
suba[1] = 10
print a
>[[1 -- 3 4 5]
> [6 10 8 9 10]]
print suba
>[6 10 8 9 10]

suba = a[0]
print a
>[[1 -- 3 4 5]
> [6 -- 8 9 10]]
print suba
>[1 -- 3 4 5]

suba = a[1]
suba[1] = 10
print a
>[[1 -- 3 4 5]
> [6 10 8 9 10]]
print suba
>[6 10 8 9 10]

def __getitem__(self, i):
"""x.__getitem__(y) <==> x[y]
Returns the item described by i. Not a copy as in previous versions.
"""
dout = self._data[i]
if numeric.size(dout)==1:
return dout
else:
fill_value=self._fill_value, copy=False, flag=False)
### -------------
#....
mi = m[i]
if mi.size == 1:
if mi:
else:
return dout
else:
fill_value=self._fill_value, copy=False, flag=False)

def __setitem__(self, index, value):
"""x.__setitem__(i, y) <==> x[i]=y
Sets item described by index. If value is masked, masks those locations.
"""
d = self._data
raise MAError, 'Cannot alter the masked element.'
#....
else:
#===============================================================================
#                #why does the mask need to be copied?
#===============================================================================
return
#....
value = filled(value).astype(d.dtype)
d[index] = value
#===============================================================================
#                #why does the mask need to be copied?
#===============================================================================
else:
else:
else:
#===============================================================================
#                #why does the mask need to be copied?
#===============================================================================

On 11/22/06, Michael Sorich <michael.sorich at gmail.com> wrote:
> Perhaps an example will help explain what I mean
>
> For the case of an ndarray if you select a row and then alter the new
> array, the old array
> is also changed.
>
> from numpy import *
> a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]])
> suba = a[2]
> suba[1] = 10
> print a
> print suba
> --output--
> [[ 1  2  3  4  5]
>  [ 1  2  3  4  5]
>  [ 1 10  3  4  5]]
> [ 1 10  3  4  5]
>
> In the current version of maskedarray in numpy, changes in the row
> array affect the parent array. Here whenever you select a single row
>
> from numpy.core.ma import *
> suba = a[2]
> suba[1] = 10
> print a
> print suba
> print type(suba)
> --output--
> [[1 2 3 4 5]
>  [1 2 3 4 5]
>  [1 10 3 4 5]]
> [ 1 10  3  4  5]
> <type 'numpy.ndarray'>
>
> boolean array in which all the values are False, which means the same
> thing as nomask) a masked array is returned. Once again the data is
> shared between the arrays
>
> from numpy.core.ma import *
> a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]],
> suba = a[2]
> suba[1] = 10
> print a
> print suba
> print type(suba)
> --output--
> [[1 2 3 4 5]
>  [1 2 3 4 5]
>  [1 10 3 4 5]]
> [1 10 3 4 5]
>
> Unfortunately if the value is changed to masked, this is not updated
> in the parent array. This seems very inconsistent. I don't view masked
> values any differently from any other value.
>
> a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]],
> suba = a[2]
> print a
> print suba
> print type(suba)
> --output--
> [[1 2 3 4 5]
>  [1 2 3 4 5]
>  [1 2 3 4 5]]
> [1 -- 3 4 5]
>
> With the new implementation, the data is not shared for any of the 3
> variations above. I am happy that the arrays act consistently,
> however this is different from the ndarray. It would be nice if the
> behavior was the same as the ndarray, but it is better than the
> numpy.core.ma implementation. If this is on purpose, then it should be
> documented.
>
> suba = a[2]
> suba[1] = 10
> print a
> print suba
> print type(suba)
> --output--
> [[1 2 3 4 5]
>  [1 2 3 4 5]
>  [1 2 3 4 5]]
> [ 1 10  3  4  5]
>
>
> On 11/22/06, Pierre GM <pgmdevlist at gmail.com> wrote:
> >
> > On Tuesday 21 November 2006 21:11, Michael Sorich wrote:
> > > I think that the new implementation is making a copy of the data when
> > > indexing an MA. This is different from both ndarray and the existing
> > > numpy ma version.
> >
> > Michael,
> >
> > If you check the definition of MaskedArray.__new__, you'll see that the
> > "copy" argument is set to True by default. Setting it to False seems to
> > give what you expect. Should I make that the default?
> >
> > > Having subviews of the mask seems complicated with the mask being
> >
> > Why? nomask is just a trick to avoid unnecessary computations on a mask
> > full of False that doesn't need updating.
> >
> > > What happens if the view sets a new masked value and hence
> > > changes from nomask to a boolean array?
> > > How does the parent mask get updated?
> >
> > Both implementations work the same way: the parent mask is not updated.
> >
> > > I think the numpy implementation gets away with this by
> > > returning a view of only the _data part if the ma mask is nomask
> >
> > By numpy implementation, you mean numpy.core.ma, right?
> >
> > If so, then yes:
> >
> > `self.__getitem__(i)` returns `self._data[i]` only if
> > `self._data[i].size==1`, else it returns a masked array.
> >
> > > I don't like this solution as I would expect a ma to be returned. Also I
> > > suspect that if the ma is to be a view of another ma, then in __new__
> > > a mask that is a boolean array of all False cannot be converted to
> >
> > I'm not following you here: there's no `__new__` in numpy.core.ma (that's
> > one of the reasons why a masked array in numpy.core.ma is basically different
> > from an ndarray...). And in maskedarray, a mask that is an array of `False` is
> > set to `nomask` by default, but you can use the `flag` option: please check the
> > flags=False keeps an array of boolean.
> >
> > One thing to remember is that masks tend to be copied more often than not.
> > And I don't think it's advisable to modify the mask of the parent: it's no
> > longer the same object, as the mask is now different! In other words, you
> > could share data, but you shouldn't share a mask. And I keep getting bitten by
> > data sharing, which is why I had set the 'copy' flag to True by default.
> >
> > > I like the new implementation of maskedarray, especially the focus on
> > > simplicity. The only simple solution I see is to have the mask be a
> > > boolean array at all times....
> >
> > You haven't convinced me yet of why a mask of False is better than `nomask`.
> >
> > What don't you like in maskedarray (aka the new implementation)?
> >
>
```
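
The data-sharing questions raised in the thread can be checked with a short script. This is only a sketch, assuming a recent NumPy where the masked-array module is available as `numpy.ma`; it does not reproduce the 2006 `numpy.core.ma` or `maskedarray` code discussed above, and results may differ on older releases. The variable names are illustrative. It covers three cases: a plain ndarray row, a row of a masked array that has an explicit mask, and a row of a masked array whose mask is `nomask`.

```python
import numpy as np

# Case 1: plain ndarray. Indexing a row returns a view, so a write
# through the row is visible in the parent.
a = np.array([[1, 2, 3, 4, 5]] * 3)
row = a[2]
row[1] = 10
ndarray_shares = bool(a[2, 1] == 10)

# Case 2: masked array with an explicit boolean mask. The row is a
# MaskedArray whose data is a view of the parent's data.
m = np.ma.array(
    [[1, 2, 3, 4, 5]] * 3,
    mask=[[0, 1, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]],
)
mrow = m[2]
mrow[1] = 10
ma_shares = bool(m.data[2, 1] == 10)

# Case 3: masked array created without a mask, so its mask is nomask.
# Masking an element through the row gives the row its own mask array;
# the parent has no mask array to update and stays at nomask.
n = np.ma.array([[1, 2, 3, 4, 5]] * 3)
parent_has_nomask = np.ma.getmask(n) is np.ma.nomask
nrow = n[0]
nrow[1] = np.ma.masked
child_masked = bool(np.ma.getmaskarray(nrow)[1])
parent_still_nomask = np.ma.getmask(n) is np.ma.nomask

print("ndarray row shares data:", ndarray_shares)
print("masked row shares data:", ma_shares)
print("row masked after assignment:", child_masked)
print("parent still nomask:", parent_still_nomask)
```

Case 3 is the situation asked about in the thread ("How does the parent mask get updated?"): when the parent starts at `nomask`, the row cannot push a new mask back to it. Passing an explicit all-False mask at construction time, as in case 2, is one way to keep a real boolean array around.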