[Numpy-discussion] in the NA discussion, what can we agree on?
T J
tjhnson@gmail....
Fri Nov 4 17:08:37 CDT 2011
On Fri, Nov 4, 2011 at 2:29 PM, Nathaniel Smith <njs@pobox.com> wrote:
> On Fri, Nov 4, 2011 at 1:22 PM, T J <tjhnson@gmail.com> wrote:
> > I agree that it would be ideal if the default were to skip IGNORED
> > values, but that behavior seems inconsistent with its propagation
> > properties (such as when adding arrays with IGNORED values). To
> > illustrate, when we did
> > "x+2", we were stating that:
> >
> > IGNORED(2) + 2 == IGNORED(4)
> >
> > which means that we propagated the IGNORED value. If we were to
> > skip them by default, then we'd have:
> >
> > IGNORED(2) + 2 == 2
> >
> > To be consistent, then it seems we also should have had:
> >
> >>>> x + 2
> > [3, 2, 5]
> >
> > which I think we can agree is not so desirable. What this seems to come
> > down to is that we tend to want different behavior when we are doing
> > reductions, and that for IGNORED data, we want it to propagate in every
> > situation except for a reduction (where we want to skip over it).
> >
> > I don't know if there is a well-defined way to distinguish reductions
> > from the other operations. Would it hold for generalized ufuncs? Would
> > it hold for other functions which might return arrays instead of
> > scalars?
>
> Continuing my theme of looking for consensus first... there are
> obviously a ton of ugly corners in here. But my impression is that at
> least for some simple cases, it's clear what users want:
>
> >>> a = [1, IGNORED(2), 3]
> # array-with-ignored-values + unignored scalar only affects unignored
> # values
> >>> a + 2
> [3, IGNORED(2), 5]
> # reduction operations skip ignored values
> >>> np.sum(a)
> 4
>
> For example, Gary mentioned the common idiom of wanting to take an
> array and subtract off its mean, and he wants to do that while leaving
> the masked-out/ignored values unchanged. As long as the above cases
> work the way I wrote, we will have
>
> >>> np.mean(a)
> 2
> >>> a -= np.mean(a)
> >>> a
> [-1, IGNORED(2), 1]
>
> Which I'm pretty sure is the result that he wants. (Gary, is that
> right?) Also numpy.ma follows these rules, so that's some additional
> evidence that they're reasonable. (And I think part of the confusion
> between Lluís and me was that these are the rules that I meant when I
> said "non-propagating", but he understood that to mean something
> else.)
>
> So before we start exploring the whole vast space of possible ways to
> handle masked-out data, does anyone see any reason to consider rules
> that don't have, as a subset, the ones above? Do other rules have any
> use cases or user demand? (I *love* playing with clever mathematics
> and making things consistent, but there's not much point unless the
> end result is something that people will use :-).)
>
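For what it's worth, today's numpy.ma already behaves this way, which is easy to check. A quick sketch (treating ma's mask as a stand-in for IGNORED is of course an analogy, not the proposal itself):

```python
import numpy as np

# numpy.ma stand-in for the proposed IGNORED semantics: the mask
# plays the role of IGNORED(2) on the middle element
a = np.ma.array([1, 2, 3], mask=[False, True, False])

shifted = a + 2        # elementwise op: the masked entry stays masked
total = a.sum()        # reduction: skips the masked entry
centered = a - a.mean()

print(shifted)   # [3 -- 5]
print(total)     # 4
print(centered)  # [-1.0 -- 1.0]
```

So the "propagate elementwise, skip in reductions" rules fall out of numpy.ma unchanged.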
I guess I'm just confused about how one would, in principle, distinguish
the various forms of propagation that you are suggesting (i.e., for
reductions). I also don't think it is good that we lack commutativity.
If we disallow unignoring, then yes, I agree that what you wrote above is
what people want. But if we are allowed to unignore, then I do not.
Also, how does something like this get handled?
>>> a = [1, 2, IGNORED(3), NaN]
If I were to say, "What is the mean of 'a'?", then I think most of the time
people would want 1.5. I guess if we kept nanmean around, then we could do:
>>> a -= np.nanmean(a)
[-.5, .5, IGNORED(3), NaN]
Sorry if this is considered digging deeper than consensus. I'm just
curious if arrays having NaNs in them, in addition to IGNORED, causes
problems.
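To make the question concrete, a nanmean-style behavior can be mimicked in today's numpy.ma by additionally masking the NaNs before the reduction (again, using ma's mask as a stand-in for IGNORED; building the combined mask explicitly is an assumption about how one might do it, not an existing nanmean for masked arrays):

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, np.nan])
ignored = np.array([False, False, True, False])  # element 3 plays IGNORED(3)
a = np.ma.array(data, mask=ignored)

# nanmean analogue: also mask the NaNs, so the reduction skips both kinds
b = np.ma.array(data, mask=ignored | np.isnan(data))
m = b.mean()
centered = a - m

print(m)         # 1.5
print(centered)  # [-0.5 0.5 -- nan]
```

Note that the NaN still propagates through the elementwise subtraction, while the IGNORED slot stays untouched, which is exactly the asymmetry I'm asking about.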