[Numpy-discussion] A reimplementation of MaskedArray

Michael Sorich michael.sorich at gmail.com
Tue Nov 21 23:25:43 CST 2006


Perhaps an example will help explain what I mean.

With an ndarray, if you select a row and then alter the resulting
array, the original array is also changed.

from numpy import *
a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]])
suba = a[2]
suba[1] = 10
print a
print suba
--output--
[[ 1  2  3  4  5]
 [ 1  2  3  4  5]
 [ 1 10  3  4  5]]
[ 1 10  3  4  5]
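
The same point can be checked directly. This is only a sketch, written
for Python 3 and a current NumPy rather than the 2006 setup used above;
np.shares_memory and the name `independent` are additions for
illustration and do not appear in the original example.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
suba = a[2]        # basic indexing returns a view onto the parent's buffer
suba[1] = 10       # so this write goes through to `a`

print(np.shares_memory(a, suba))   # True
print(a[2, 1])                     # 10

# An explicit copy breaks the link between the row and the parent.
independent = a[2].copy()
independent[1] = 99
print(a[2, 1])                     # 10
```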

In the current version of maskedarray in numpy (numpy.core.ma), changes
in the row array also affect the parent array. Here, whenever you select
a single row or column and the mask is nomask, an ndarray is returned,
not a masked array.

from numpy.core.ma import *
a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]], mask=nomask)
suba = a[2]
suba[1] = 10
print a
print suba
print type(suba)
--output--
[[1 2 3 4 5]
 [1 2 3 4 5]
 [1 10 3 4 5]]
[ 1 10  3  4  5]
<type 'numpy.ndarray'>

However, if the mask is anything other than nomask (even a boolean
array in which all the values are False, which means the same thing as
nomask), a masked array is returned. Once again the data is shared
between the arrays.

from numpy.core.ma import *
a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]],
mask=[[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0]])
suba = a[2]
suba[1] = 10
print a
print suba
print type(suba)
--output--
[[1 2 3 4 5]
 [1 2 3 4 5]
 [1 10 3 4 5]]
[1 10 3 4 5]
<class 'numpy.core.ma.MaskedArray'>

Unfortunately, if a value in the row is set to masked, the change is
not propagated to the parent array. This seems very inconsistent: I
don't view masked values as any different from any other value.

a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]],
mask=[[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0]])
suba = a[2]
suba[1] = masked
print a
print suba
print type(suba)
--output--
[[1 2 3 4 5]
 [1 2 3 4 5]
 [1 2 3 4 5]]
[1 -- 3 4 5]
<class 'numpy.core.ma.MaskedArray'>
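
For comparison only, here is a sketch of the same experiment against the
current numpy.ma module under Python 3. It is an assumption that this
module is what a reader has installed today; it says nothing about the
2006 numpy.core.ma or numpyext.maskedarray code discussed in this
thread. The script reports what it finds for the parent mask rather than
assuming a result, since that behaviour has changed between NumPy
releases.

```python
import numpy as np
import numpy.ma as ma

data = [[1, 2, 3, 4, 5]] * 3
a = ma.array(data, mask=np.zeros((3, 5), dtype=bool))  # explicit all-False mask
suba = a[2]

suba[0] = 7            # ordinary write through the row
suba[1] = ma.masked    # mask one element through the row

# getmaskarray always returns a full boolean array, even for nomask.
data_shared = bool(a.data[2, 0] == 7)
row_masked = bool(ma.getmaskarray(suba)[1])
parent_masked = bool(ma.getmaskarray(a)[2, 1])

print("data write visible in parent:", data_shared)
print("row sees the masked value:   ", row_masked)
print("parent sees the masked value:", parent_masked)
```

If the last two lines disagree, the installed version shows the same
inconsistency described above; if they agree, mask changes are treated
like any other write through a view.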

With the new implementation, the data is not shared in any of the 3
variations above. I am happy that the arrays act consistently;
however, this is different from the ndarray. It would be nice if the
behavior were the same as the ndarray's, but it is better than the
numpy.core.ma implementation. If this is on purpose, then it should be
documented.

from numpyext.maskedarray import *
a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]], mask=nomask)
suba = a[2]
suba[1] = 10
print a
print suba
print type(suba)
--output--
[[1 2 3 4 5]
 [1 2 3 4 5]
 [1 2 3 4 5]]
[ 1 10  3  4  5]
<class 'numpyext.maskedarray.MaskedArray'>
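
The quoted reply below traces this to a "copy" argument. As a rough
illustration of what such a flag controls, here is a sketch using the
current numpy.ma.array under Python 3. This is an assumption: numpy.ma
stands in for numpyext.maskedarray, whose actual signature and defaults
are not reproduced here, and the names `base`, `shared` and `copied`
are made up for the example.

```python
import numpy as np
import numpy.ma as ma

base = np.array([[1, 2, 3, 4, 5]] * 3)

shared = ma.array(base, copy=False)   # wraps `base` without copying its data
copied = ma.array(base, copy=True)    # owns an independent copy of the data

shared[2, 1] = 10    # visible in `base`
copied[0, 0] = -1    # not visible in `base`

print(np.shares_memory(shared.data, base))   # True
print(np.shares_memory(copied.data, base))   # False
print(base[2, 1], base[0, 0])                # 10 1
```

Whichever default is chosen, stating it in the documentation would
remove the surprise when a masked array is compared with an ndarray.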


On 11/22/06, Pierre GM <pgmdevlist at gmail.com> wrote:
>
> On Tuesday 21 November 2006 21:11, Michael Sorich wrote:
>
> > I think that the new implementation is making a copy of the data when
> > indexing a MA. This is different from both ndarray and the existing
> > numpy ma version.
>
> Michael,
>
> If you check the definition of MaskedArray.__new__, you'll see that the
> "copy" argument is set to True by default. Setting it to False seems to
> give what you expect. Should I make that the default?
>
> > Having subviews of the mask seems complicated with the mask being
> > nomask.
>
> Why ? nomask is just a trick to avoid unnecessary computations on a mask
> full of False that doesn't need updating.
>
> > What happens if the view sets a new masked value and hence
> > changes from nomask to a boolean array?
> > How does the parent mask get updated?
>
> Both implementations work the same way: the parent mask is not updated.
>
> > I think the numpy implementation gets away with this by
> > returning a view of only the _data part if the ma mask is nomask
>
> By numpy implementation, you mean numpy.core.ma, right?
>
> If so, then yes:
> `self.__getitem__(i)` returns `self._data[i]` if the mask is nomask.
>
> In maskedarray, if the mask is nomask, then
> `self.__getitem__(i)` returns `self._data[i]` only if
> `self._data[i].size==1`, else it returns a masked array.
>
> > I don't like this solution as I would expect a ma to be returned. Also I
> > suspect that if the ma is to be a view of another ma, then in __new__
> > a mask that is a boolean array of all False cannot be converted to
> > nomask.
>
> I'm not following you here: there's no `__new__` in numpy.core.ma (that's
> one of the reasons why a masked array in numpy.core.ma is basically different
> from an ndarray...). And in maskedarray, a mask that is an array of `False`
> is set to `nomask` by default, but you can use the `flag` option: please
> check the documentation of `maskedarray.masked_array`: flag=True converts
> the mask, flag=False keeps an array of booleans.
>
> One thing to remember is that masks tend to be copied more often than not.
> And I don't think it's advisable to modify the mask of the parent: it's no
> longer the same object, as the mask is now different! In other words, you
> could share data, but you shouldn't share a mask. And I keep getting bitten
> by data sharing, which is why I had set the 'copy' flag to True by default.
>
> > I like the new implementation of maskedarray, especially the focus on
> > simplicity. The only simple solution I see is to have the mask be a
> > boolean array at all times....
>
> You haven't convinced me yet why a mask of False is better than `nomask`.
>
> What don't you like in maskedarray (aka the new implementation)?

