[Numpy-discussion] feedback request: proposal to add masks to the core ndarray
Sat Jun 25 14:44:42 CDT 2011
On Fri, Jun 24, 2011 at 10:59 PM, Nathaniel Smith <email@example.com> wrote:
> On Fri, Jun 24, 2011 at 6:57 PM, Benjamin Root <firstname.lastname@example.org> wrote:
> > On Fri, Jun 24, 2011 at 8:11 PM, Nathaniel Smith <email@example.com> wrote:
> >> This is a situation where I would just... use an array and a mask,
> >> rather than a masked array. Then lots of things -- changing fill
> >> values, temporarily masking/unmasking things, etc. -- come for free,
> >> just from knowing how arrays and boolean indexing work?
> > With a masked array, it is "for free". Why re-invent the wheel? It has
> > already been done for me.
> But it's not for free at all. It's an additional concept that has to
> be maintained, documented, and learned (with the last cost, which is
> multiplied by the number of users, being by far the greatest). It's
> not reinventing the wheel, it's saying hey, I have wheels and axles,
> but what I really need the library to provide is a wheel+axle
It feels like you're suggesting the NA bit pattern vs mask distinction and
the programming interface users of NumPy see are closely tied together. This
isn't the case at all, and I would like more feedback on the interface side
of things irrespective of the implementation details. Please tell me what
your wheel+axle assembly looks like.
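
For reference, the "array plus a separate mask" approach described in the quoted text can be sketched with plain NumPy as below. This is only an illustration: the data, the -999.0 fill value, and the variable names are made up for the example, and nothing here is part of the proposed interface.

```python
import numpy as np

# Made-up example data: values and mask are two ordinary, separate arrays.
data = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([False, True, False, False])  # True marks an element to ignore

# Aggregate over the unmasked elements only, using boolean indexing.
valid_sum = data[~mask].sum()

# "Changing the fill value" is np.where with whatever value is wanted.
filled = np.where(mask, -999.0, data)

# Temporarily unmasking an element: work on a copy of the mask.
temp_mask = mask.copy()
temp_mask[1] = False
full_sum = data[~temp_mask].sum()
```

The cost of this style is that the caller has to keep the mask aligned with the data by hand through every indexing or reshaping step.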
> >> Do we really get much advantage by building all these complex
> >> operations in? I worry that we're trying to anticipate and write code
> >> for every situation that users find themselves in, instead of just
> >> giving them some simple, orthogonal tools.
> > This is the danger, and which is why I advocate retaining the MaskedArray
> > type that would provide the high-level "intelligent" operations, while
> > having in the core the basic data structures for pairing a mask with an
> > array, and to recognize a special np.NA value that would act upon the
> > mask rather than the underlying data. Users would get very basic
> > functionality, while the MaskedArray would continue to provide the interface that we are
> > used to.
> The interface as described is quite different... in particular, all
> aggregate operations would change their behavior.
Which operations are changing, and what is the difference in behavior? I
don't recall proposing something like this. My initial proposal had a
difference with R for the aggregate operations, but I've changed the NEP
based on your feedback.
> >> As a corollary, I worry that learning and keeping track of how masked
> >> arrays work is more hassle than just ignoring them and writing the
> >> necessary code by hand as needed. Certainly I can imagine that *if the
> >> mask is a property of the data* then it's useful to have tools to keep
> >> it aligned with the data through indexing and such. But some of these
> >> other things are quicker to reimplement than to look up the docs for,
> >> and the reimplementation is easier to read, at least for me...
> > What you are advocating is similar to the "tried-n-true" coding practice
> > of Matlab users of using NaNs. You will hear from Matlab programmers about
> > how it is the greatest idea since sliced bread (and I was one of them). Then
> > I was introduced to Numpy, and while I do sometimes still do the NaN
> > approach, I realized that the masked array is a "better" way.
> Hey, no need to go around calling people Matlab programmers, you might
> hurt someone's feelings.
> But seriously, my argument is that every abstraction and new concept
> has a cost, and I'm dubious that the full masked array abstraction
> carries its weight and justifies this cost, because it's highly
> redundant with existing abstractions. That has nothing to do with how
> tried-and-true anything is.
The abstraction is R-like missing values, and two implementation mechanisms
are NA bit patterns and masks. There is no "full masked array abstraction"
as a component end users will have to learn.
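
To make the two mechanisms concrete, here is a rough sketch using tools that already exist in NumPy. NaN stands in for an NA bit pattern purely as an analogy (it is not the proposed NA), and numpy.ma stands in for the mask mechanism; the values are made up for the example.

```python
import numpy as np

# Mechanism 1 (analogy for an NA bit pattern): a reserved value stored
# inside the data itself. The original value at that position is gone.
with_nan = np.array([1.0, np.nan, 3.0])
propagated = np.sum(with_nan)   # the missing value propagates through the sum
skipped = np.nansum(with_nan)   # the missing value is skipped explicitly

# Mechanism 2: a separate boolean mask carried alongside intact data.
with_mask = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
masked_sum = with_mask.sum()    # numpy.ma skips masked elements in sum()
hidden = with_mask.data[1]      # the underlying value is still stored
```

In both cases the user-facing question is the same: does an aggregate propagate the missing value or skip it. Only the storage differs.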
> > As for documentation, on hard/soft masks, just look at the docs for the
> > MaskedArray constructor:
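
The hard/soft mask distinction mentioned in the quote can be illustrated with the existing numpy.ma module. This is a small sketch with made-up values, not the documentation text the quote refers to.

```python
import numpy as np

# Two masked arrays with the same data and mask; only hard_mask differs.
soft = np.ma.array([1, 2, 3], mask=[False, True, False])
hard = np.ma.array([1, 2, 3], mask=[False, True, False], hard_mask=True)

# Assign to the masked position in both.
soft[1] = 10   # soft mask: assignment unmasks the element
hard[1] = 10   # hard mask: the element stays masked

soft_mask = soft.mask.tolist()
hard_mask = hard.mask.tolist()
```

With a soft mask the assignment clears the mask at that position, while a hard mask keeps the element masked.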
> -- Nathaniel