[Numpy-discussion] missing data: semantics

Matthew Brett matthew.brett@gmail....
Thu Jun 30 12:51:42 CDT 2011


On Thu, Jun 30, 2011 at 6:46 PM, Lluís <xscript@gmx.net> wrote:
> Ok, I think it's time to step back and reformulate the problem by
> completely ignoring the implementation.
> Here we have 2 "generic" concepts (i.e., applicable to R), plus another
> extra concept that is exclusive to numpy:
> * Assigning np.NA to an array cannot be undone except through explicit
>  assignment (i.e., assigning a new arbitrary value, or saving a copy of
>  the original array before assigning np.NA).
> * np.NA values propagate by default, unless ufuncs have the "skipna =
>  True" argument (or the other way around, it doesn't really matter to
>  this discussion). In order to avoid passing the argument on each
>  ufunc, we either have some per-array variable for the default "skipna"
>  value (undesirable) or we can make a trivial ndarray subclass that
>  will set the "skipna" argument on all ufuncs through the
>  "_ufunc_wrapper_" mechanism.
> Now, numpy has the concept of views, which adds some more goodies to the
> list of concepts:
> * With views, two arrays can share the same physical data, so that
>  assignments to any of them will be seen by others (including NA
>  values).
> The creation of a view is explicitly stated by the user, so its
> behaviour should not be perceived as odd (after all, you asked for a
> view).
> The good thing is that with views you can avoid costly array copies if
> you're careful when writing into these views.
> Now, you can add a new concept: local/temporal/transient missing data.
> We can take an existing array and create a view with the new argument
> "transientna = True".
> Here, both the view and the "transientna = True" are explicitly stated
> by the user, so it is assumed that she already knows what this is all
> about.
> The difference with a regular view is that you also explicitly asked for
> local/temporal/transient NA values.
> * Assigning np.NA to an array view with "transientna = True" will
>  *not* be seen by any of the other views (nor the "original" array),
>  but anything else will still work "as usual".
> After all, this is what *you* asked for when using the "transientna =
> True" argument.
> To conclude, others *must not* have to care whether the arrays
> they're working with have transient NA values. This way, I can create a
> view with transient NAs, set some uninteresting data to NA, and pass it
> to a routine written by someone else that sets to NA those elements that,
> for example, are beyond a certain threshold from the mean of the elements.
> This would be equivalent to storing a copy of the original array before
> passing it to this 3rd party function, except that "transientna", just
> like views, provides a handy shortcut to avoid copies.
> My main point here is that views and local/temporal/transient NAs are
> all *explicitly* requested, so their behaviour should not appear as
> something unexpected.
> Is there an agreement on this?

Absolutely, if by 'transientna' you mean 'masked'.  The discussion is
whether the NA API should be the same as the masking API.   The thing
you are describing is what masking is for, and what it's always been
for, as far as I can see.   We're arguing that to call this
'transientna' instead of 'masked' confuses two concepts that are
different, to no good purpose.
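
For illustration, here is a minimal sketch of the behaviour described
above, using the existing numpy.ma module. The variable names are made up
for the example, and it assumes that taking a MaskedArray view of a plain
ndarray is an acceptable stand-in for the proposed "transientna = True"
view; it is not a statement about how any NA API would be implemented.

```python
import numpy as np

# Original data; the last element is an outlier we want to ignore for now.
original = np.array([1.0, 2.0, 3.0, 100.0])

# A masked view shares the data buffer with `original`
# but carries its own, separate mask.
masked_view = original.view(np.ma.MaskedArray)

# Masking an element only changes the view's mask, not the shared data.
masked_view[3] = np.ma.masked

# An ordinary assignment through the view is still seen by `original`.
masked_view[0] = 10.0

print(original)            # the data still holds 100.0 at index 3
print(masked_view.mean())  # mean over the unmasked elements only
```

The mask lives only on the view, so the original array never sees the
"missing" element, while regular writes still go through to the shared
data.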


