[Numpy-discussion] in the NA discussion, what can we agree on?
Fri Nov 4 19:37:40 CDT 2011
On 04.11.2011 22:29, Nathaniel Smith wrote:
> Continuing my theme of looking for consensus first... there are
> obviously a ton of ugly corners in here. But my impression is that at
> least for some simple cases, it's clear what users want:
>>>> a = [1, IGNORED(2), 3]
> # array-with-ignored-values + unignored scalar only affects unignored values
>>>> a + 2
> [3, IGNORED(2), 5]
> # reduction operations skip ignored values
This can break commutativity:
>>> a = [1, IGNORED(2), 3]
>>> b = [4, IGNORED(5), 6]
>>> x = a + b
>>> y = b + a
>>> x = ???
>>> y = ???
unop(IGNORED(a)) == IGNORED(a)
binop(IGNORED(a), b) == IGNORED(a)
binop(a, IGNORED(b)) == IGNORED(b)
binop(IGNORED(a), IGNORED(b)) == IGNORED(binop(a, b)) # or NA
could, however, get around that. That seems to be pretty much how NaN
works, except that it now carries a "hidden" value with it.
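These NaN-like rules can be spelled out concretely. The following is a minimal
sketch in plain Python (deliberately not NumPy, and with illustrative names
only: the `IGNORED` wrapper, `binop`, and `add_arrays` are not proposed API) of
the four binary-op rules above. Under them, `a + b` and `b + a` agree, hidden
payloads included:

```python
import operator

class IGNORED:
    """An ignored value carrying its hidden payload."""
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return f"IGNORED({self.value!r})"
    def __eq__(self, other):
        return isinstance(other, IGNORED) and self.value == other.value

def binop(op, a, b):
    # binop(IGNORED(a), IGNORED(b)) == IGNORED(binop(a, b))  (or NA)
    if isinstance(a, IGNORED) and isinstance(b, IGNORED):
        return IGNORED(op(a.value, b.value))
    # binop(IGNORED(a), b) == IGNORED(a); binop(a, IGNORED(b)) == IGNORED(b)
    if isinstance(a, IGNORED):
        return a
    if isinstance(b, IGNORED):
        return b
    return op(a, b)

def add_arrays(x, y):
    """Elementwise x + y; a scalar y is broadcast over x."""
    if not isinstance(y, list):
        y = [y] * len(x)
    return [binop(operator.add, u, v) for u, v in zip(x, y)]

a = [1, IGNORED(2), 3]
b = [4, IGNORED(5), 6]
print(add_arrays(a, 2))                      # [3, IGNORED(2), 5]
print(add_arrays(a, b) == add_arrays(b, a))  # True
```

Substituting NA in the doubly-ignored branch would change only that case; the
two singly-ignored rules are symmetric, which is what keeps + commutative here.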
> For example, Gary mentioned the common idiom of wanting to take an
> array and subtract off its mean, and he wants to do that while leaving
> the masked-out/ignored values unchanged. As long as the above cases
> work the way I wrote, we will have
>>>> a -= np.mean(a)
> [-1, IGNORED(2), 1]
That would follow from a reduction that skips ignored values, combined
with the above NaN-like rules for binary operators.
Whether the reduction methods have skip_IGNORE=True as default or not is
in my opinion more of an API question, rather than a question on how the
algebra of ignored values should work.
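To make the reduction side concrete as well, here is a similarly hedged
plain-Python sketch (illustrative names, not proposed API) of a reduction with
the skip_IGNORE=True behaviour, together with an in-place scalar subtraction
obeying binop(IGNORED(a), b) == IGNORED(a); together they reproduce the
subtract-the-mean idiom quoted above:

```python
class IGNORED:
    """An ignored value carrying its hidden payload."""
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return f"IGNORED({self.value!r})"

def mean_skip(x):
    """Reduction with the skip_IGNORE=True behaviour: ignored
    entries contribute to neither the sum nor the count."""
    vals = [v for v in x if not isinstance(v, IGNORED)]
    return sum(vals) / len(vals)

def isub_scalar(x, s):
    """In-place x -= s; ignored entries stay untouched, per
    binop(IGNORED(a), b) == IGNORED(a)."""
    for i, v in enumerate(x):
        if not isinstance(v, IGNORED):
            x[i] = v - s

a = [1, IGNORED(2), 3]
isub_scalar(a, mean_skip(a))   # the a -= np.mean(a) idiom
print(a)                       # [-1.0, IGNORED(2), 1.0]
```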
If destructive assignment is really needed to avoid problems with
commutation [see T. J. (2011)], then that is maybe a problem. So, one
would need to have
>>> x = [1, IGNORED(2), 3]
>>> y = [1, IGNORED(2), 3]
>>> z = [4, IGNORED(5), IGNORED(6)]
>>> x[:] = z
[4, IGNORED(5), IGNORED(6)]
>>> y += z
[5, IGNORED(7), IGNORED(6)]
This is not how np.ma works. But if you do otherwise, there doesn't seem
to be any guarantee that
>>> a += 42
>>> a += b
is the same thing as
>>> a += b
>>> a += 42
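That order-independence claim can be checked mechanically. Below is a
plain-Python sketch (illustrative names; neither function is np.ma's actual
implementation) comparing destructive in-place semantics, where an ignored
operand overwrites the target slot, against a skip-style variant that never
touches hidden payloads. Under the former the two orderings agree; under the
latter they need not:

```python
class IGNORED:
    """An ignored value carrying its hidden payload."""
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return f"IGNORED({self.value!r})"
    def __eq__(self, other):
        return isinstance(other, IGNORED) and self.value == other.value

def iadd_destructive(x, y):
    """x += y with the propagating rules: an IGNORED operand wins,
    two IGNORED operands combine their payloads."""
    if not isinstance(y, list):
        y = [y] * len(x)
    for i, (u, v) in enumerate(zip(x, y)):
        if isinstance(u, IGNORED) and isinstance(v, IGNORED):
            x[i] = IGNORED(u.value + v.value)
        elif isinstance(v, IGNORED):
            x[i] = v
        elif not isinstance(u, IGNORED):
            x[i] = u + v
        # IGNORED target + plain operand: x[i] stays IGNORED(u.value)

def iadd_skip(x, y):
    """x += y, but an ignored slot keeps its old hidden payload."""
    if not isinstance(y, list):
        y = [y] * len(x)
    for i, (u, v) in enumerate(zip(x, y)):
        if isinstance(u, IGNORED):
            continue                 # payload never changes
        if isinstance(v, IGNORED):
            x[i] = IGNORED(u)        # mask the slot, keep the old data
        else:
            x[i] = u + v

def run(iadd, scalar_first):
    a = [1, IGNORED(2), 3]
    b = [4, IGNORED(5), IGNORED(6)]
    if scalar_first:
        iadd(a, 42); iadd(a, b)
    else:
        iadd(a, b); iadd(a, 42)
    return a

print(run(iadd_destructive, True) == run(iadd_destructive, False))  # True
print(run(iadd_skip, True) == run(iadd_skip, False))                # False
```

In the skip-style variant the two orderings differ only in the hidden
payloads, which is exactly where the asymmetry lives.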
> So before we start exploring the whole vast space of possible ways to
> handle masked-out data, does anyone see any reason to consider rules
> that don't have, as a subset, the ones above? Do other rules have any
> use cases or user demand? (I *love* playing with clever mathematics
> and making things consistent, but there's not much point unless the
> end result is something that people will use :-).)
Yep, it's important to keep in mind what people want.
People, however, tend to implicitly expect that simple arithmetic
operations on arrays, whether they contain ignored values or not, behave
in a certain way. Actually stating how these operations work with
scalars gives valuable insight into how you'd like things to work.
Also, if you propose to break the rules of arithmetic in a fundamental
library meant for scientific computing, you should be aware that you are
doing so, and of how you do so.
I mean, at least for me it was not clear before this formulation that
there was a reason why binary ops in np.ma were not commutative! Now I
kind of see that there is an asymmetry in assignment into masked arrays,
and that it conflicts with commuting operations and with "what you'd
expect ignored values to do". I'm not sure this problem can be removed
entirely, but it may be possible to confine it to assignments and
in-place operations rather than having it in binary ops.