[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Mark Wiebe mwwiebe@gmail....
Sat Jun 25 14:24:08 CDT 2011

On Fri, Jun 24, 2011 at 8:11 PM, Nathaniel Smith <njs@pobox.com> wrote:

> On Fri, Jun 24, 2011 at 2:09 PM, Benjamin Root <ben.root@ou.edu> wrote:
> > Another example of how we use masks in matplotlib is in pcolor().  We have
> > to combine the possible masks of X, Y, and V in both the x and y directions
> > to find the final mask to use for the final output result (because each
> > facet needs valid data at each corner).  Having a soft-mask implementation
> > allows one to create a temporary mask to use for the operation, and to share
> > that mask across all the input data, but then let the data structures retain
> > their original masks when done.
>
> This is a situation where I would just... use an array and a mask,
> rather than a masked array. Then lots of things -- changing fill
> values, temporarily masking/unmasking things, etc. -- come for free,
> just from knowing how arrays and boolean indexing work?

For free, in roughly the same sense that a C pointer, a dtype, a shape, and
an array of strides give you the NumPy ndarray for free. That example is a
bit more extreme, but the same idea applies.
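To make the "just use an array and a mask" approach concrete, here is a minimal sketch of the pcolor-style mask combination described above, using only plain boolean arrays. The helper name cell_mask and the shape conventions (corner masks of shape (ny, nx), per-cell values of shape (ny - 1, nx - 1)) are assumptions for illustration; this is not matplotlib's implementation.

```python
import numpy as np


def cell_mask(x_mask, y_mask, v_mask):
    """Combine per-corner masks into a per-cell mask (True = invalid).

    Hypothetical helper. Assumes x_mask and y_mask have shape (ny, nx),
    one entry per corner, and v_mask has shape (ny - 1, nx - 1), one
    entry per cell.
    """
    corners = x_mask | y_mask
    # A cell is invalid if any of its four corners is invalid.
    corner_bad = (corners[:-1, :-1] | corners[1:, :-1]
                  | corners[:-1, 1:] | corners[1:, 1:])
    return corner_bad | v_mask


x = np.arange(12.0).reshape(3, 4)
y = np.arange(12.0).reshape(3, 4)
v = np.ones((2, 3))

x_mask = np.zeros(x.shape, dtype=bool)
x_mask[0, 0] = True          # one bad corner in X
y_mask = np.zeros(y.shape, dtype=bool)
v_mask = np.zeros(v.shape, dtype=bool)
v_mask[1, 2] = True          # one bad value in V

# The combined mask is a new, temporary array; x_mask, y_mask and
# v_mask themselves are left untouched.
m = cell_mask(x_mask, y_mask, v_mask)
total = v[~m].sum()          # operate only on the valid cells
print(m)
print(total)
```

Because `|` allocates new arrays, the inputs keep their original masks after the operation, which is the property Benjamin attributes to a soft-mask implementation.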

> Do we really get much advantage by building all these complex
> operations in? I worry that we're trying to anticipate and write code
> for every situation that users find themselves in, instead of just
> giving them some simple, orthogonal tools.

Simple, orthogonal tools are what I'm aiming for. Which tools those are is
driven by the myriad use cases, so it's important to understand all of them
and to let them hone the design.

> As a corollary, I worry that learning and keeping track of how masked
> arrays work is more hassle than just ignoring them and writing the
> necessary code by hand as needed. Certainly I can imagine that *if the
> mask is a property of the data* then it's useful to have tools to keep
> it aligned with the data through indexing and such. But some of these
> other things are quicker to reimplement than to look up the docs for,
> and the reimplementation is easier to read, at least for me...

This is where the design of the interface NumPy exposes comes in. Whether
NA values are represented with a separate mask or with a special bit
pattern, the missing-value API should be basically the same: the API is for
the users of the system, while the representation is an implementation
detail. The representation is still important, of course, but it is a
separate issue, and many comments in this thread have treated the two as
the same.
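NumPy at this point has no NA support, so as a loose analogy, NaN can stand in for a "special bit pattern" representation and numpy.ma for a "separate mask" representation. The sketch below uses two hypothetical helpers, is_missing and mean_skipping_missing, to show callers asking the same questions of both representations. It only illustrates the idea of a representation-independent API; it is not the API proposed in this thread.

```python
import numpy as np
import numpy.ma as ma

# Representation 1: a special bit pattern (NaN) stored in the data itself.
sentinel = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# Representation 2: the data plus a separate boolean mask.
# The masked slots hold arbitrary values that are never looked at.
values = np.array([1.0, 0.0, 3.0, 0.0, 5.0])
masked = ma.masked_array(values, mask=[False, True, False, True, False])


def is_missing(a):
    """Hypothetical helper: boolean array, True where a value is missing."""
    if isinstance(a, ma.MaskedArray):
        return ma.getmaskarray(a)   # full-size mask, even if nothing is masked
    return np.isnan(a)


def mean_skipping_missing(a):
    """Hypothetical helper: mean of the non-missing values."""
    return np.asarray(a)[~is_missing(a)].mean()


print(mean_skipping_missing(sentinel))
print(mean_skipping_missing(masked))
```

Callers of is_missing and mean_skipping_missing never see which representation is underneath; only the helpers do.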


> (By the way, is this hard-mask/soft-mask stuff documented anywhere? I
> spent some time groveling over the numpy docs with google's help, and
> I still have no idea what it actually is.)
> -- Nathaniel
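On the hard-mask/soft-mask question: numpy.ma exposes the distinction through the hard_mask argument of masked_array (and the harden_mask()/soften_mask() methods). As a minimal sketch of the difference, based on numpy.ma's documented behavior: assigning to a masked element of a soft-masked array writes the value and unmasks it, while a hard-masked array ignores the assignment.

```python
import numpy as np
import numpy.ma as ma

data = np.array([1.0, 2.0, 3.0])
mask = [False, True, False]

# Copy the data so the two masked arrays do not share a buffer.
soft = ma.masked_array(data.copy(), mask=mask)                  # soft is the default
hard = ma.masked_array(data.copy(), mask=mask, hard_mask=True)

# Soft mask: the new value is written and element 1 becomes unmasked.
soft[1] = 20.0

# Hard mask: the assignment is ignored; element 1 stays masked and
# the underlying data keeps its old value.
hard[1] = 20.0

print(soft.mask, soft.data)
print(hard.mask, hard.data)
```

A soft mask is therefore what makes the temporary mask/unmask workflow from the pcolor example possible, while a hard mask protects masked values from accidental overwrites.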