[Numpy-discussion] missing data: semantics
Charles R Harris
Thu Jun 30 13:11:52 CDT 2011
On Thu, Jun 30, 2011 at 11:46 AM, Lluís <firstname.lastname@example.org> wrote:
> Ok, I think it's time to step back and reformulate the problem by
> completely ignoring the implementation.
> Here we have 2 "generic" concepts (i.e., applicable to R), plus another
> extra concept that is exclusive to numpy:
> * Assigning np.NA to an array cannot be undone except through explicit
> assignment (i.e., assigning a new arbitrary value, or saving a copy of
> the original array before assigning np.NA).
> * np.NA values propagate by default, unless ufuncs have the "skipna =
> True" argument (or the other way around, it doesn't really matter to
> this discussion). In order to avoid passing the argument on each
> ufunc, we either have some per-array variable for the default "skipna"
> value (undesirable) or we can make a trivial ndarray subclass that
> will set the "skipna" argument on all ufuncs through the
> "_ufunc_wrapper_" mechanism.
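A rough illustration of the propagate-versus-skip distinction, using np.nan as a stand-in for the proposed np.NA and np.nansum as a stand-in for a "skipna = True" reduction. Both stand-ins are assumptions made for the sketch; neither is the API under discussion here.

```python
import numpy as np

# np.nan stands in for the proposed np.NA (an assumption for this sketch).
a = np.array([1.0, np.nan, 3.0])

# Default behaviour: the missing value propagates through the reduction.
propagated = np.sum(a)

# Skipping behaviour: np.nansum ignores the missing value, roughly what
# a "skipna = True" argument would request.
skipped = np.nansum(a)

# Elementwise operations keep the missing value in place.
elementwise = a + 1.0

print(propagated, skipped, elementwise)
```

Having to pick a different function, or pass an argument, on every call is the inconvenience that the per-array default or the ndarray subclass mentioned above would remove.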
> Now, numpy has the concept of views, which adds some more goodies to the
> list of concepts:
> * With views, two arrays can share the same physical data, so that
> assignments to any of them will be seen by others (including NA
> assignments).
> The creation of a view is explicitly stated by the user, so its
> behaviour should not be perceived as odd (after all, you asked for a
> view).
> The good thing is that with views you can avoid costly array copies if
> you're careful when writing into these views.
> Now, you can add a new concept: local/temporal/transient missing data.
> We can take an existing array and create a view with the new argument
> "transientna = True".
This is already there: x.view(masked=1), although the keyword transientna
has appeal, not least because it avoids the word 'mask', which seems a
source of endless confusion. Note that currently this is only supposed to
work if the original array is unmasked.
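
A minimal sketch of the intended semantics, using today's numpy.ma as a stand-in. The proposed x.view(masked=1) and np.NA are not used; base.view(ma.MaskedArray) and ma.masked are substitutes chosen for illustration, and the variable names are hypothetical.

```python
import numpy as np
import numpy.ma as ma

base = np.array([1.0, 2.0, 3.0, 4.0])

# A masked view onto the same buffer: data is shared, the mask is not.
transient = base.view(ma.MaskedArray)

# Marking an element as missing touches only the view's mask.
transient[1] = ma.masked

# An ordinary assignment still writes through to the shared data.
transient[2] = 30.0

print(base)        # the original keeps its value at index 1
print(transient)   # the view hides index 1
```

Here the missing value is local to the view, while every other write behaves as it would for a regular view.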
> Here, both the view and the "transientna = True" are explicitly stated
> by the user, so it is assumed that she already knows what this is all
> about.
> The difference with a regular view is that you also explicitly asked for
> local/temporal/transient NA values.
> * Assigning np.NA to an array view with "transientna = True" will
> *not* be seen by any of the other views (nor the "original" array),
> but anything else will still work "as usual".
> After all, this is what *you* asked for when using the "transientna =
> True" argument.
> To conclude, I'd say that others *must not* have to care whether the
> arrays they're working with have transient NA values. This way, I can
> create a view with transient NAs, set some uninteresting data to NA, and
> pass it to a routine written by someone else that sets to NA the elements
> that, for example, lie beyond a certain threshold from the mean.
> This would be equivalent to storing a copy of the original array before
> passing it to this 3rd party function, except that "transientna", just
> like views, provides a handy shortcut to avoid copies.
> My main point here is that views and local/temporal/transient NAs are
> all *explicitly* requested, so their behaviour should not come as
> something unexpected.
> Is there an agreement on this?
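
The third-party scenario described above could look roughly like this with numpy.ma standing in for transient NAs. The function mask_outliers and its max_dev parameter are hypothetical, invented for the sketch, and ma.masked again substitutes for np.NA.

```python
import numpy as np
import numpy.ma as ma

def mask_outliers(arr, max_dev):
    """Hypothetical third-party routine: mark as missing, in place,
    every element farther than max_dev from the mean."""
    # ma.filled turns already-missing entries of the condition into False,
    # so the routine works on plain and masked arrays alike.
    too_far = ma.filled(np.abs(arr - arr.mean()) > max_dev, False)
    arr[too_far] = ma.masked

original = np.array([1.0, 2.0, 3.0, 100.0, -999.0])

# The caller asks for a view with its own, local missing-value marks.
work = original.view(ma.MaskedArray)
work[4] = ma.masked            # hide an uninteresting sentinel value

# The routine neither knows nor cares that the marks are transient.
mask_outliers(work, max_dev=50.0)

print(work)       # indices 3 and 4 are hidden in the view
print(original)   # the original data is untouched, and no copy was made
```

The routine is written as if it were modifying its argument, yet the caller's original array is preserved without a copy.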