[Numpy-discussion] in the NA discussion, what can we agree on?

T J tjhnson@gmail....
Sat Nov 5 17:22:27 CDT 2011


On Sat, Nov 5, 2011 at 12:55 AM, Nathaniel Smith <njs@pobox.com> wrote:

> On Fri, Nov 4, 2011 at 8:33 PM, T J <tjhnson@gmail.com> wrote:
> > On Fri, Nov 4, 2011 at 8:03 PM, Nathaniel Smith <njs@pobox.com> wrote:
> >> Again, I really don't think you're going to be able to sell an API where
> >>  [2] + [IGNORED(20)] == [IGNORED(22)]
> >> I mean, it's not me you have to convince, it's Gary, Pierre, maybe
> >> Benjamin, Lluís, etc. So I could be wrong. But you might want to
> >> figure that out first before making plans based on this...
> >
> > But this is how np.ma currently does it, except that it doesn't compute the
> > payload---it just calls it IGNORED.
>
> Yes, that's what I mean -- if you're just temporarily masking
> something out because you want it to be IGNORED, then you don't want
> it to change around when you do something like a += 2, right? If the
> operation is changing the payload, then it's weird to say that the
> operation ignored the payload...
>

That's a fair critique.
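
For concreteness, here is a small sketch of the np.ma behavior I was
referring to. The array values are made up for illustration, and it only
checks the mask of the result, since the payload stored under a masked
element is not something np.ma promises anything about.

```python
import numpy as np

# Two small masked arrays; the first element of b is masked out.
a = np.ma.array([2, 5])
b = np.ma.array([20, 7], mask=[True, False])

# The result is masked wherever either input was masked.
c = a + b
result_mask = np.ma.getmaskarray(c)

# Reductions on a masked array skip the masked elements.
total = c.sum()
```

Here result_mask marks the first element as masked, and total only counts
the unmasked element.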


>
> Anyway, I think this is another way to think about your suggestion:
>
> -- each array gets an extra boolean array called the "mask" that it
> carries around with it
> -- Unary ufuncs automatically copy these masks to their results. For
> binary ufuncs, the input masks get automatically ORed together, and
> that determines the mask attached to the output array
> -- these masks have absolutely no effect on any computations, except that
>     ufunc.reduce(a, skip_IGNORED=True)
> is defined to be a synonym for
>     ufunc.reduce(a, where=a.mask)
>
> Is that correct?
>


I believe that is correct.
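
The three points above can be sketched in a few lines. This is only an
illustration under stated assumptions: the class MaskedSketch and its
method names are hypothetical (nothing like it exists in NumPy), and I take
mask=True to mean IGNORED, as np.ma does, so the reduction passes
where=~mask. With the opposite convention (True meaning valid) it would be
where=mask, as written in the quoted summary.

```python
import numpy as np


class MaskedSketch:
    """Hypothetical container: data plus a boolean mask (True = IGNORED)."""

    def __init__(self, data, mask=None):
        self.data = np.asarray(data)
        if mask is None:
            mask = np.zeros(self.data.shape, dtype=bool)
        self.mask = np.asarray(mask, dtype=bool)

    def unary(self, ufunc):
        # Unary ufuncs copy the mask to the result unchanged.
        return MaskedSketch(ufunc(self.data), self.mask.copy())

    def binary(self, ufunc, other):
        # Binary ufuncs OR the input masks; payloads are computed as usual.
        return MaskedSketch(ufunc(self.data, other.data),
                            self.mask | other.mask)

    def reduce(self, ufunc, skip_IGNORED=False):
        # The mask only matters here: skipping is a "where" selection.
        if skip_IGNORED:
            return ufunc.reduce(self.data, where=~self.mask)
        return ufunc.reduce(self.data)


a = MaskedSketch([2, 5])
b = MaskedSketch([20, 7], mask=[True, False])

c = a.binary(np.add, b)        # payload is computed even where IGNORED
d = b.unary(np.negative)       # mask is carried along unchanged
skipped = c.reduce(np.add, skip_IGNORED=True)
everything = c.reduce(np.add)
```

Note that a.binary(np.add, b) and b.binary(np.add, a) give the same data and
the same mask, which is the commutativity property discussed in this thread.
The price is that the payload under the mask changes from 20 to 22.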


>
> Also, if I can ask -- is this something you would find useful yourself?
>

So I guess this goes back to finding some consensus on what people want out
of IGNORED values.  From a very naive look at the initial list you
provided, this particular suggestion seems to match it, and it provides
fairly consistent behavior across operations (commutativity and
unmasking).

However, it doesn't seem to match an unstated expectation you had, which is
that ignored values should truly be ignored (payloads should not be
operated on, etc.).  It seems (see Pauli's email too) that we might have to
give up commutativity to achieve that.  Maybe that is okay.  The suggestion
I put forth seems to treat "ignored" more as just another notion of the
"where" keyword, as you pointed out.  It is not so much a statement that
the ignored values should be ignored during computations, just that they
should be ignored when we query the valid elements in the array.  So it
works if you just want to plot certain portions of an array and possibly do
calculations on them.  But if you want to "double all integers greater than
3 and quadruple all integers less than 3", then this notion of IGNORED will
not work as easily.  Though this could easily be handled without IGNORED
values too:  x[x>3] *= 2.
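
Spelled out for both halves of that example, using plain boolean indexing
and no IGNORED values at all (a minimal sketch; the array values are
arbitrary):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Compute both selections before modifying x, so that an element that was
# quadrupled cannot then also be picked up by the "greater than 3" test.
greater = x > 3
less = x < 3

x[greater] *= 2   # double everything that was greater than 3
x[less] *= 4      # quadruple everything that was less than 3
```

Afterwards x holds [4, 8, 3, 8, 10].  Doing the two steps as x[x<3] *= 4
followed by x[x>3] *= 2 would give a different answer, because the selection
in the second step would see the already modified values.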

So what do people expect out of ignored values?  It seems that we might
need to extend the list you put forward so that it includes these desires.
Since my primary use is with MISSING and not so much IGNORED, I'm not in a
very good position to help extend that list.  I'd be curious to know if
this present suggestion would work with how matplotlib uses masked arrays.