[Numpy-discussion] missing data: semantics
Charles R Harris
Thu Jun 30 13:13:59 CDT 2011
On Thu, Jun 30, 2011 at 11:51 AM, Matthew Brett <firstname.lastname@example.org> wrote:
> On Thu, Jun 30, 2011 at 6:46 PM, Lluís <email@example.com> wrote:
> > Ok, I think it's time to step back and reformulate the problem by
> > completely ignoring the implementation.
> > Here we have 2 "generic" concepts (i.e., applicable to R), plus another
> > extra concept that is exclusive to numpy:
> > * Assigning np.NA to an array cannot be undone except through explicit
> > assignment (i.e., assigning a new arbitrary value, or restoring the value
> > from a copy of the original array saved before assigning np.NA).
> > * np.NA values propagate by default, unless ufuncs are called with the
> > "skipna = True" argument (or the other way around; it doesn't really
> > matter to this discussion). To avoid passing the argument to every
> > ufunc call, we either keep a per-array variable for the default
> > "skipna" value (undesirable) or we make a trivial ndarray subclass
> > that sets the "skipna" argument on all ufuncs through the
> > "_ufunc_wrapper_" mechanism.
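The two concepts above can be illustrated with a small sketch. np.NA and the skipna argument are the proposal under discussion here, so the sketch cannot call them; as an assumption, it uses NaN as a stand-in for NA and np.nansum as a stand-in for a reduction run with skipna=True.

```python
import numpy as np

# Assumption: NaN stands in for the proposed np.NA.
a = np.array([1.0, 2.0, 3.0])
backup = a.copy()          # saving a copy beforehand is the only way to undo
a[1] = np.nan              # "assigning NA": the old value 2.0 is gone from `a`

# Default behaviour: the missing value propagates through the reduction.
total = np.sum(a)          # nan

# Stand-in for skipna=True: the nan-aware variant ignores the missing value.
total_skip = np.nansum(a)  # 4.0

# Undoing the assignment requires an explicit assignment from the saved copy.
a[1] = backup[1]
```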
> > Now, numpy has the concept of views, which adds some more goodies to the
> > list of concepts:
> > * With views, two arrays can share the same physical data, so that
> > assignments to any of them will be seen by others (including NA
> > values).
> > The creation of a view is explicitly stated by the user, so its
> > behaviour should not be perceived as odd (after all, you asked for a
> > view).
> > The good thing is that with views you can avoid costly array copies if
> > you're careful when writing into these views.
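A minimal sketch of the view semantics described above, again assuming NaN as a stand-in for the proposed np.NA: a slice is a view, so a write through it is visible in the original array.

```python
import numpy as np

# Assumption: NaN stands in for the proposed np.NA.
base = np.arange(5, dtype=float)     # [0., 1., 2., 3., 4.]
view = base[1:4]                     # explicit view: no data is copied

view[0] = np.nan                     # write through the view...
print(np.isnan(base[1]))             # ...and the original sees it: True
print(np.shares_memory(base, view))  # True: one buffer, two arrays
```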
> > Now we can add a new concept: local/temporary/transient missing data.
> > We can take an existing array and create a view with the new argument
> > "transientna = True".
> > Here, both the view and the "transientna = True" are explicitly stated
> > by the user, so it is assumed that she already knows what this is all
> > about.
> > The difference from a regular view is that you have also explicitly
> > asked for local/temporary/transient NA values.
> > * Assigning np.NA to an array view with "transientna = True" will
> > *not* be seen by any of the other views (nor the "original" array),
> > but anything else will still work "as usual".
> > After all, this is what *you* asked for when using the "transientna =
> > True" argument.
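The "transientna = True" keyword is a proposal and cannot be called. As an assumption, the sketch below uses numpy.ma, which gives roughly the behaviour described (the point Matthew Brett makes in his reply): a MaskedArray built over an existing array without copying its data keeps its own mask, so masking an element does not touch the original, while ordinary assignments still write through.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])

# Wrap `a` without copying its data (copy=False is the default);
# the mask belongs only to `m`.
m = np.ma.masked_array(a)

m[1] = np.ma.masked   # "transient NA": only m's mask changes
m[0] = 10.0           # an ordinary assignment writes through to `a`

print(a)              # [10.  2.  3.]  (a[1] keeps its value)
print(m.sum())        # 13.0  (masked entries are skipped)
```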
> > To conclude, I would say that others *must not* need to care whether
> > the arrays they're working with have transient NA values. This way, I
> > can create a view with transient NAs, set some uninteresting data to
> > NA, and pass it to a routine written by someone else that sets to NA
> > the elements that are, for example, beyond a certain threshold from
> > the mean of the elements.
> > This would be equivalent to storing a copy of the original array before
> > passing it to this 3rd-party function, except that "transientna", just
> > like views, provides a handy shortcut to avoid copies.
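The workflow above can be sketched with numpy.ma standing in for the proposed "transientna" view (an assumption, since that keyword does not exist). The routine `flag_far_from_mean` is hypothetical, playing the part of the third-party function: it marks as missing, in place, every entry farther than `threshold` from the mean of the non-missing entries, and knows nothing about where its input came from.

```python
import numpy as np

def flag_far_from_mean(x, threshold):
    """Hypothetical third-party routine: masks, in place, every entry of the
    masked array `x` whose distance from the mean of its unmasked entries
    exceeds `threshold`."""
    # Already-masked entries compare as masked; treat them as "not far".
    far = np.ma.filled(np.abs(x - x.mean()) > threshold, False)
    x[far] = np.ma.masked

a = np.array([1.0, 2.0, 3.0, 100.0])

m = np.ma.masked_array(a)             # no copy of the data
m[0] = np.ma.masked                   # caller hides "uninteresting" data first
flag_far_from_mean(m, threshold=50.0) # mean of [2, 3, 100] is 35

print(m.mask)  # [ True False False  True]
print(a)       # [  1.   2.   3. 100.]  (original untouched)
```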
> > My main point here is that views and local/temporary/transient NAs are
> > both *explicitly* requested, so their behaviour should not come as a
> > surprise.
> > Is there an agreement on this?
> Absolutely, if by 'transientna' you mean 'masked'. The discussion is
> whether the NA API should be the same as the masking API. The thing
> you are describing is what masking is for, and what it's always been
> for, as far as I can see. We're arguing that to call this
> 'transientna' instead of 'masked' confuses two concepts that are
> different, to no good purpose.
It's a hammer. If you want to hammer nails, fine; if you want to hammer a
bit of tubing flat, fine. It's a tool, the hammer concept, if you will.