[Numpy-discussion] in the NA discussion, what can we agree on?

Pauli Virtanen pav@iki...
Fri Nov 4 13:59:48 CDT 2011

04.11.2011 17:31, Gary Strangman wrote:
> The question does still remain what to do when performing operations like 
> those above in IGNORE cases. Perform the operation "underneath"? Or not?

I have a feeling that if you don't start by mathematically defining the
scalar operations first, and only after that generalize them to arrays,
some conceptual problems may follow.

On the other hand, I should note that numpy.ma does not work this way,
and many people still seem happy with how it works.

But if you go defining scalars first, then as far as I can see, ufuncs
(e.g. binary operations) and assignment are what need to be defined.
Since the idea seems to be to use these as "masks", let's assume that
each special value can also carry a payload.


There are two options for how to behave with respect to binary/unary operations:

(P) Propagating

unop(SPECIAL_1) == SPECIAL_new
binop(SPECIAL_1, SPECIAL_2) == SPECIAL_new
binop(a, SPECIAL) == SPECIAL_new

(N) Non-propagating

unop(SPECIAL_1) == SPECIAL_new
binop(SPECIAL_1, SPECIAL_2) == SPECIAL_new
binop(a, SPECIAL) == binop(a, binop.identity) == a
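
A minimal Python sketch of the two rules for scalars may help here. The names
`Special`, `IDENTITY`, `binop_propagating` and `binop_nonpropagating` are
hypothetical and not part of NumPy; the identity table stands in for
`binop.identity` and covers only addition and multiplication.

```python
import operator


class Special:
    """Hypothetical stand-in for a special (NA/IGNORE-like) scalar."""

    def __repr__(self):
        return "SPECIAL"


SPECIAL = Special()

# Assumed identity elements, playing the role of binop.identity in the text.
IDENTITY = {operator.add: 0, operator.mul: 1}


def is_special(x):
    return isinstance(x, Special)


def binop_propagating(op, a, b):
    """(P): any special operand makes the result special."""
    if is_special(a) or is_special(b):
        return Special()
    return op(a, b)


def binop_nonpropagating(op, a, b):
    """(N): a special operand paired with an ordinary one acts as op's identity."""
    if is_special(a) and is_special(b):
        return Special()
    if is_special(a):
        a = IDENTITY[op]
    if is_special(b):
        b = IDENTITY[op]
    return op(a, b)
```

Under (N), `binop_nonpropagating(operator.add, 5, SPECIAL)` gives back `5`,
while under (P) the same call to `binop_propagating` gives a special value.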


And three options on what to do on assignment:

(d) Destructive

a := SPECIAL      # -> a == SPECIAL

(n) Non-destructive

a := SPECIAL      # -> a unchanged

(s) Self-destructive

a := SPECIAL_1
# -> if `a` is SPECIAL-class, then a == SPECIAL_1,
# otherwise `a` remains unchanged
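
The three assignment rules can be sketched as a function returning what a slot
holds after `a := new`. Both `Special` and `assign` are hypothetical names.
How assignment of an ordinary value behaves is not specified above, so the
sketch assumes it always overwrites.

```python
class Special:
    """Hypothetical special value, distinguished only by a name."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"SPECIAL_{self.name}"


def assign(current, new, mode):
    """Return the content of a slot after `slot := new` under mode d, n or s."""
    if not isinstance(new, Special):
        # Assumption: ordinary values always overwrite, whatever the mode.
        return new
    if mode == "d":  # destructive: the slot becomes the special value
        return new
    if mode == "n":  # non-destructive: the slot is left unchanged
        return current
    if mode == "s":  # self-destructive: only special slots are overwritten
        return new if isinstance(current, Special) else current
    raise ValueError(f"unknown assignment mode: {mode!r}")
```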


Finally, there is a question whether the value has a payload or not.

The payload complicates the scheme, as binary and unary operations need
to create new values. For singletons (eg. NaN) this is not a problem.
But if it's a non-singleton, desirable behavior would be to retain
commutativity (and other similar properties) of binary ops. I see two
sensible approaches for this: either raise an error, or do the
computation on the payload.

This brings in a third axis of choice: (S) singleton; (E) payload, but
raise errors on operations whose operands are all special values; and
(C) payload, with computations done on the payload.
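
As a rough illustration of the three payload choices, the sketch below combines
two special values under each one. `Special` and `combine_specials` are
hypothetical names, and `payload=None` is used here to model a payload-free
singleton-like value.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Special:
    """Hypothetical special value; payload=None models the payload-free case."""

    payload: object = None


def combine_specials(op, x, y, payload_type):
    """Result of binop(SPECIAL_1, SPECIAL_2) under payload choice S, E or C."""
    if payload_type == "S":
        # Singleton: there is only one special value, so nothing to compute.
        return Special()
    if payload_type == "E":
        # Payload, but operations on special values only are errors.
        raise TypeError("operation on special values only")
    if payload_type == "C":
        # Payload, with the operation applied to the payloads themselves.
        return Special(op(x.payload, y.payload))
    raise ValueError(f"unknown payload type: {payload_type!r}")
```

With (C), a commutative `op` on the payloads stays commutative on the special
values, since the result depends only on `op(x.payload, y.payload)`.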


For shorthand, we can refer to the above choices with the nomenclature

    <shorthand> ::= <propagation> <destructivity> <payload_type>
    <propagation> ::= "P" | "N"
    <destructivity> ::= "d" | "n" | "s"
    <payload_type> ::= "S" | "E" | "C"

That makes 2 * 3 * 3 = 18 different ways to construct consistent
behavior. Some of them might make sense; the problem is to find out which :)
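
The shorthand grammar is small enough to enumerate directly. A short sketch,
with variable names chosen here for illustration:

```python
from itertools import product

PROPAGATION = "PN"      # propagating, non-propagating
DESTRUCTIVITY = "dns"   # destructive, non-destructive, self-destructive
PAYLOAD_TYPE = "SEC"    # singleton, payload + error, payload + computation

# Every <propagation><destructivity><payload_type> combination.
SHORTHANDS = ["".join(combo)
              for combo in product(PROPAGATION, DESTRUCTIVITY, PAYLOAD_TYPE)]

print(len(SHORTHANDS))  # 18
print(" ".join(SHORTHANDS))
```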

NAN and NA apparently fall into the PdS class.
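
For NaN, the P and d parts of this classification can be checked with plain
Python floats. This is only a sketch: a list stands in for an array, and NaN
is treated as payload-free, as in the text above.

```python
import math

nan = float("nan")

# (P) Propagating: unary and binary operations on NaN yield NaN.
neg = -nan
total = 1.0 + nan

# (d) Destructive: assigning NaN overwrites the previous value.
values = [1.0, 2.0]
values[0] = nan

print(math.isnan(neg), math.isnan(total), math.isnan(values[0]))
```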

If classified this way, behaviour of items in np.ma arrays is different
in different operations, but seems roughly PdX, where X stands for
returning a masked value with the first argument as the payload in
binary ops if either argument is masked. This makes inline binary ops
behave like Nn. Reductions are N. (Assignment: dC, reductions: N, binary
ops: PX, unary ops: PC, inline binary ops: Nn).

Finally, there's a can of worms on specifying the outcome of binary
operations on two special values of different kinds, but it's maybe best
to first choose one that behaves sensibly by itself.

