[Numpy-discussion] Trouble With MaskedArray and Shared Masks

Alexander Michael lxander.m@gmail....
Wed Feb 27 08:34:41 CST 2008


On Tue, Feb 26, 2008 at 2:32 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
> Alexander,
>  The rationale behind the current behavior is to avoid an accidental
>  propagation of the mask. Consider the following example:
>
>  >>>m = numpy.array([1,0,0,1,0], dtype=bool_)
>  >>>x = numpy.array([1,2,3,4,5])
>  >>>y = numpy.sqrt([5,4,3,2,1])
>  >>>mx = masked_array(x,mask=m)
>  >>>my = masked_array(y,mask=m)
>  >>>mx[0] = 0
>  >>>print mx,my, m
>  [0 2 3 -- 5] [-- 4 3 -- 1] [ True False False  True False]
>
>  At the creation, mx._sharedmask and my._sharedmask are both True. Setting
>  mx[0]=0 forces mx._mask to be copied, so that we don't affect the mask of my.
>
>  Now,
>  >>>m = numpy.array([1,0,0,1,0], dtype=bool_)
>  >>>x = numpy.array([1,2,3,4,5])
>  >>>y = numpy.sqrt([5,4,3,2,1])
>  >>>mx = masked_array(x,mask=m)
>  >>>my = masked_array(y,mask=m)
>  >>>mx._sharedmask = False
>  >>>mx[0] = 0
>  >>>print mx,my, m
>  [0 2 3 -- 5] [5 4 3 -- 1] [False False False  True False]
>
>  By mx._sharedmask=False, we deceived numpy.ma into thinking that it's OK to
>  update the mask of mx (that is, m), and my gets updated. Sometimes it's what
>  you want (your case for example), often it is not: I've been bitten more than
>  once before reintroducing the _sharedmask flag.
>
>  As you've observed, setting a private flag isn't a very good idea: you should
>  use the .unshare_mask() function instead, that copies the mask and set the
>  _sharedmask to False. OK, in your example, copying the mask is not needed,
>  but in more general cases, it is.
>
>  At the initialization, self._sharedmask is set to (not copy). That is, if you
>  didn't specify copy=True at the creation (the default being copy=False),
>  self._sharedmask is True. Now, I recognize it's not obvious, and perhaps we
>  could introduce yet another parameter to masked_array/array/MaskedArray,
>  share_mask, that would take a default value of True and set
>  self._sharedmask=(not copy)&share_mask

Thank you for your thorough explanation. I was providing the mask
array to the constructor in order to do my own allocating, mostly to
ensure that the MaskedArray had a dense mask that *wouldn't* be
replaced with a copy without my intentional instruction. I didn't
realize that the MaskedArray was not taking ownership of the provided mask
(even though copy was False) because the implied usage for providing
the mask explicitly is to read-only alias another MaskedArray's mask.
I was working against my own goal! Now that I understand a little
better, the easiest/best thing for me to do is change the way I
create the MaskedArray to:

>>> a = numpy.ma.MaskedArray(
...     data=numpy.zeros((4,5), dtype=float),
...     mask=True,
...     fill_value=0.0
... )

This appears to cause MaskedArray to create a dense mask which
persists (i.e. isn't replaced by a copy) for the lifetime of the
MaskedArray.
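
A minimal sketch of how I checked this, assuming a NumPy where numpy.ma
is the MaskedArray implementation discussed here. It only inspects the
mask's shape and contents through public attributes. Whether the
underlying buffer is ever swapped for a copy depends on the NumPy
version, so the sketch does not try to show that.

```python
import numpy as np

# Build the array as in the message: mask=True starts every element masked.
a = np.ma.MaskedArray(
    data=np.zeros((4, 5), dtype=float),
    mask=True,
    fill_value=0.0,
)

# The mask is a full boolean array with the data's shape, not the scalar nomask.
assert a.mask is not np.ma.nomask
print(a.mask.shape, a.mask.dtype, bool(a.mask.all()))

# Assigning a value unmasks just that element (the mask is "soft" by default).
a[0, 0] = 1.5
print(a.count())      # number of unmasked elements
print(a.filled()[0])  # first row, masked entries replaced by fill_value
```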

>  So: should we introduce this extra parameter ?

The propagation semantics and mechanics are definitely tricky,
especially considering that it seems that the "right behavior" is
context dependent. Are the mask propagation rules spelled out anywhere
(aside from the code! :-))? I could see some potential value to an
additional argument, but the constructor is already quite complicated
so I'm reluctant to say "Yes" outright, especially with my current
level of understanding. At the very least, perhaps the doc-string
should be amended to include the note that if a mask is provided, it
is assumed to be shared and a copy of it will be made when/if it is
modified. How does the keep_mask option play into this? I don't
understand what that one does yet.
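
For the record, here is a rough sketch of the public route you
describe, using unshare_mask() rather than touching _sharedmask. It
reuses the arrays from your example. The sharedmask property is what I
believe exposes the flag publicly, so treat that part as an assumption.
The exact copy-on-write behavior without unshare_mask() may differ
between NumPy versions, so the sketch only covers the explicit case.

```python
import numpy as np

m = np.array([1, 0, 0, 1, 0], dtype=bool)
x = np.array([1, 2, 3, 4, 5])
y = np.sqrt([5, 4, 3, 2, 1])

mx = np.ma.masked_array(x, mask=m)
my = np.ma.masked_array(y, mask=m)

# Give mx its own private copy of the mask before modifying it.
mx.unshare_mask()

# Unmasks element 0 of mx only; m and my keep their original mask.
mx[0] = 0

print(mx.sharedmask)
print(mx.mask, my.mask, m)
```

With the mask unshared explicitly, the assignment cannot propagate to
my or to m, whatever the constructor decided about sharing.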

Thanks!
Alex

