[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Benjamin Root ben.root@ou....
Fri Jun 24 20:57:49 CDT 2011

On Fri, Jun 24, 2011 at 8:11 PM, Nathaniel Smith <njs@pobox.com> wrote:

> On Fri, Jun 24, 2011 at 2:09 PM, Benjamin Root <ben.root@ou.edu> wrote:
> > Another example of how we use masks in matplotlib is in pcolor().  We have
> > to combine the possible masks of X, Y, and V in both the x and y directions
> > to find the final mask to use for the final output result (because each
> > facet needs valid data at each corner).  Having a soft-mask implementation
> > allows one to create a temporary mask to use for the operation, and to share
> > that mask across all the input data, but then let the data structures retain
> > their original masks when done.
>
> This is a situation where I would just... use an array and a mask,
> rather than a masked array. Then lots of things -- changing fill
> values, temporarily masking/unmasking things, etc. -- come for free,
> just from knowing how arrays and boolean indexing work?

With a masked array, it is "for free".  Why re-invent the wheel?  It has
already been done for me.
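
To make the pcolor() case above concrete, here is a minimal sketch of the
idea, not matplotlib's actual implementation. The toy arrays are made up for
illustration, and the per-facet corner logic is left out; it only shows
building one temporary mask from X, Y, and V while the inputs keep their own
masks.

```python
import numpy as np
import numpy.ma as ma

# Toy 2x3 inputs (hypothetical data); each one carries a different mask.
X = ma.masked_invalid(np.array([[0.0, 1.0, np.nan], [0.0, 1.0, 2.0]]))
Y = ma.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
             mask=[[False, False, False], [True, False, False]])
V = ma.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
             mask=[[False, True, False], [False, False, False]])

# Union of the three masks: a point is usable only if it is valid in all inputs.
combined = ma.getmaskarray(X) | ma.getmaskarray(Y) | ma.getmaskarray(V)

# Temporary masked arrays that reuse the data but carry the combined mask.
Xt, Yt, Vt = (ma.array(a.data, mask=combined) for a in (X, Y, V))

print(Vt.count())  # number of valid points under the combined mask
print(V.count())   # the original V still has only its own mask
```

The temporaries can be thrown away after the operation; X, Y, and V are never
modified.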

> Do we really get much advantage by building all these complex
> operations in? I worry that we're trying to anticipate and write code
> for every situation that users find themselves in, instead of just
> giving them some simple, orthogonal tools.

This is the danger, which is why I advocate retaining the MaskedArray type
to provide the high-level "intelligent" operations, while the core holds the
basic data structures for pairing a mask with an array and recognizes a
special np.NA value that acts upon the mask rather than the underlying data.
Users would get very basic functionality, while the MaskedArray would
continue to provide the interface that we are used to.

> As a corollary, I worry that learning and keeping track of how masked
> arrays work is more hassle than just ignoring them and writing the
> necessary code by hand as needed. Certainly I can imagine that *if the
> mask is a property of the data* then it's useful to have tools to keep
> it aligned with the data through indexing and such. But some of these
> other things are quicker to reimplement than to look up the docs for,
> and the reimplementation is easier to read, at least for me...

What you are advocating is similar to the "tried-n-true" coding practice of
Matlab users of using NaNs.  You will hear from Matlab programmers about how
it is the greatest idea since sliced bread (and I was one of them).  Then I
was introduced to Numpy, and while I do still sometimes use the NaN
approach, I realized that the masked array is a "better" way.
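
One way to see the difference, as a small sketch and not the only argument
for masked arrays: NaN only exists for floating-point data, so integer data
has to be converted first, whereas a masked array keeps the original dtype.
The -999 sentinel and the variable names below are assumptions made up for
this example.

```python
import numpy as np
import numpy.ma as ma

# Integer data where -999 is a hypothetical "missing" sentinel.
counts = np.array([3, -999, 7, 5])

# NaN approach: must convert to float before marking the missing entry.
as_float = counts.astype(float)
as_float[counts == -999] = np.nan
nan_mean = np.nanmean(as_float)   # NaN-aware functions are needed everywhere

# Masked-array approach: the data stays integer and the mask marks the gap.
masked = ma.masked_equal(counts, -999)
ma_mean = masked.mean()

print(nan_mean, ma_mean)
print(as_float.dtype, masked.dtype)
```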

> (By the way, is this hard-mask/soft-mask stuff documented anywhere? I
> spent some time groveling over the numpy docs with google's help, and
> I still have no idea what it actually is.)

As for documentation on hard/soft masks, just look at the docs for the
MaskedArray constructor:

$ pydoc numpy.ma.MaskedArray
numpy.ma.MaskedArray = class MaskedArray(numpy.ndarray)
 |  An array class with possibly masked values.
 |  Masked values of True exclude the corresponding element from any
 |  computation.
 |  Construction::
 |    x = MaskedArray(data, mask=nomask, dtype=None,
 |                    copy=False, subok=True, ndmin=0, fill_value=None,
 |                    keep_mask=True, hard_mask=None, shrink=True)


 |  keep_mask : bool, optional
 |      Whether to combine `mask` with the mask of the input data, if any
 |      (True), or to use only `mask` for the output (False). Default is
 |      True.
 |  hard_mask : bool, optional
 |      Whether to use a hard mask or not. With a hard mask, masked values
 |      cannot be unmasked. Default is False.
 |  shrink : bool, optional
 |      Whether to force compression of an empty mask. Default is True.
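
As a quick illustration of the hard_mask parameter described above, here is a
minimal sketch with made-up values. It shows that assigning to a masked
element unmasks it under a soft mask, leaves it masked under a hard mask, and
that soften_mask() switches a hard mask back to soft.

```python
import numpy.ma as ma

soft = ma.array([1, 2, 3], mask=[False, True, False])
hard = ma.array([1, 2, 3], mask=[False, True, False], hard_mask=True)

# Soft mask (the default): assignment overwrites the value and unmasks it.
soft[1] = 20

# Hard mask: assignment to a masked element is ignored; it stays masked.
hard[1] = 20

print(soft)
print(hard)

# A hard mask can be softened again, after which assignment unmasks as usual.
hard.soften_mask()
hard[1] = 20
print(hard)
```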

Later, there is a little bit of info on shared masks (although the docs
could be better):

 |  unshare_mask(self)
 |      Copy the mask and set the sharedmask flag to False.
 |      Whether the mask is shared between masked arrays can be seen from
 |      the `sharedmask` property. `unshare_mask` ensures the mask is not
 |      shared. A copy of the mask is only made if it was shared.
 |      See Also
 |      --------
 |      sharedmask
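
A small sketch of what the shared-mask flag looks like in practice, with
made-up values. It assumes a slice of a masked array as the second array;
what happens when you modify a mask that is still shared has varied between
NumPy versions, so the example only modifies the mask after unshare_mask().

```python
import numpy as np
import numpy.ma as ma

base = ma.array([1, 2, 3, 4], mask=[False, True, False, False])
view = base[:2]                 # a slice initially shares its parent's mask

shared_before = view.sharedmask
memory_before = np.shares_memory(view.mask, base.mask)

view.unshare_mask()             # copy the mask and clear the sharedmask flag
shared_after = view.sharedmask

view[0] = ma.masked             # now only the slice's own mask changes

print(shared_before, shared_after)
print(view.mask, base.mask)
```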

I hope that helps,
Ben Root