[Numpy-discussion] missing data: semantics

Charles R Harris charlesr.harris@gmail....
Thu Jun 30 13:13:59 CDT 2011


On Thu, Jun 30, 2011 at 11:51 AM, Matthew Brett <matthew.brett@gmail.com> wrote:

> Hi,
>
> On Thu, Jun 30, 2011 at 6:46 PM, Lluís <xscript@gmx.net> wrote:
> > Ok, I think it's time to step back and reformulate the problem by
> > completely ignoring the implementation.
> >
> > Here we have 2 "generic" concepts (i.e., applicable to R), plus another
> > extra concept that is exclusive to numpy:
> >
> > * Assigning np.NA to an array cannot be undone except through explicit
> >  assignment (i.e., assigning a new arbitrary value, or saving a copy of
> >  the original array before assigning np.NA).
> >
> > * np.NA values propagate by default, unless ufuncs have the "skipna =
> >  True" argument (or the other way around, it doesn't really matter to
> >  this discussion). In order to avoid passing the argument on each
> >  ufunc, we either have some per-array variable for the default "skipna"
> >  value (undesirable) or we can make a trivial ndarray subclass that
> >  will set the "skipna" argument on all ufuncs through the
> >  "_ufunc_wrapper_" mechanism.
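
The propagate-versus-skip distinction described above can be sketched with
NaN as a stand-in. This is only an analogy: np.NA and the "skipna" argument
are proposals in this thread, not an existing API, so the sketch uses
np.sum and np.nansum, which do exist.

```python
import numpy as np

# NaN is used purely as a stand-in for the proposed np.NA.
a = np.array([1.0, np.nan, 3.0])

# Default behaviour: the missing value propagates through the reduction.
propagated = np.sum(a)

# Skipping behaviour, analogous to the proposed skipna=True.
skipped = np.nansum(a)

print(propagated, skipped)
```
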
> >
> >
> >
> > Now, numpy has the concept of views, which adds some more goodies to the
> > list of concepts:
> >
> > * With views, two arrays can share the same physical data, so that
> >  assignments to any of them will be seen by others (including NA
> >  values).
> >
> > The creation of a view is explicitly stated by the user, so its
> > behaviour should not be perceived as odd (after all, you asked for a
> > view).
> >
> > The good thing is that with views you can avoid costly array copies if
> > you're careful when writing into these views.
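
A minimal sketch of the view behaviour described above, using ordinary
values rather than NA (the variable names are just for illustration):

```python
import numpy as np

base = np.arange(6, dtype=float)   # [0., 1., 2., 3., 4., 5.]
view = base[::2]                   # basic slicing gives a view, not a copy

# Both arrays are backed by the same memory.
shared = np.shares_memory(base, view)

# A write through the view is seen by the original array.
view[0] = -1.0

print(shared, base[0])
```
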
> >
> >
> >
> > Now, you can add a new concept: local/temporal/transient missing data.
> >
> > We can take an existing array and create a view with the new argument
> > "transientna = True".
> >
> > Here, both the view and the "transientna = True" are explicitly stated
> > by the user, so it is assumed that she already knows what this is all
> > about.
> >
> > The difference with a regular view is that you also explicitly asked for
> > local/temporal/transient NA values.
> >
> > * Assigning np.NA to an array view with "transientna = True" will
> >  *not* be seen by any of the other views (nor the "original" array),
> >  but anything else will still work "as usual".
> >
> > After all, this is what *you* asked for when using the "transientna =
> > True" argument.
> >
> >
> >
> > To conclude, others should *not* have to care whether the arrays
> > they're working with have transient NA values. This way, I can create a
> > view with transient NAs, set some uninteresting data to NA, and pass it
> > to a routine written by someone else that sets to NA the elements that,
> > for example, are beyond a certain threshold from the mean of the elements.
> >
> > This would be equivalent to storing a copy of the original array before
> > passing it to this 3rd party function, except that "transientna", just
> > like views, provides a handy shortcut that avoids the copy.
> >
> >
> > My main point here is that views and local/temporal/transient NAs are
> > all *explicitly* requested, so their behaviour should not appear as
> > something unexpected.
> >
> > Is there an agreement on this?
>
> Absolutely, if by 'transientna' you mean 'masked'.  The discussion is
> whether the NA API should be the same as the masking API.   The thing
> you are describing is what masking is for, and what it's always been
> for, as far as I can see.   We're arguing that to call this
> 'transientna' instead of 'masked' confuses two concepts that are
> different, to no good purpose.
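
For reference, a rough sketch of how the existing numpy.ma masking matches
the "transientna" behaviour described above. It assumes that viewing an
array as ma.MaskedArray shares the data buffer while keeping the mask local
to the masked view; the variable names are hypothetical.

```python
import numpy as np
import numpy.ma as ma

data = np.array([1.0, 2.0, 50.0, 3.0])

# View the same buffer as a masked array; no data is copied.
m = data.view(ma.MaskedArray)

# Masking an element only touches the mask that belongs to `m`.
m[2] = ma.masked
masked_mean = m.mean()      # computed over the unmasked elements only

# The original array still holds the value that was masked.
original_value = data[2]

# Ordinary assignments still go through to the shared data.
m[0] = 10.0

print(masked_mean, original_value, data[0])
```
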
>
>
It's a hammer. If you want to hammer nails, fine, if you want to hammer a bit
of tubing flat, fine. It's a tool, the hammer concept if you will.

Chuck