[Numpy-discussion] missing data: semantics

Charles R Harris charlesr.harris@gmail....
Thu Jun 30 13:11:52 CDT 2011


On Thu, Jun 30, 2011 at 11:46 AM, Lluís <xscript@gmx.net> wrote:

> Ok, I think it's time to step back and reformulate the problem by
> completely ignoring the implementation.
>
> Here we have 2 "generic" concepts (i.e., applicable to R), plus another
> extra concept that is exclusive to numpy:
>
> * Assigning np.NA to an array cannot be undone except through explicit
>  assignment (i.e., assigning a new arbitrary value, or saving a copy of
>  the original array before assigning np.NA).
>
> * np.NA values propagate by default, unless ufuncs have the "skipna =
>  True" argument (or the other way around, it doesn't really matter to
>  this discussion). In order to avoid passing the argument on each
>  ufunc, we either have some per-array variable for the default "skipna"
>  value (undesirable) or we can make a trivial ndarray subclass that
>  will set the "skipna" argument on all ufuncs through the
>  "_ufunc_wrapper_" mechanism.
>
>
>
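A minimal sketch of the propagate-versus-skip distinction described above. It assumes NaN as a stand-in for the proposed np.NA (which this sketch does not assume is available), and uses np.nansum as a stand-in for a reduction called with "skipna = True":

```python
import numpy as np

# NaN stands in for the proposed np.NA (an assumption for illustration).
data = np.array([1.0, np.nan, 3.0])

# Default behaviour: the missing value propagates into the result.
propagated = np.sum(data)

# "skipna"-like behaviour: the missing value is ignored by the reduction.
skipped = np.nansum(data)

print(np.isnan(propagated), skipped)
```

The point is only the difference in semantics; the real proposal attaches the choice to the ufunc call (or to a subclass), not to separate functions.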
> Now, numpy has the concept of views, which adds some more goodies to the
> list of concepts:
>
> * With views, two arrays can share the same physical data, so that
>  assignments to either of them will be seen by the other (including NA
>  values).
>
> The creation of a view is explicitly stated by the user, so its
> behaviour should not be perceived as odd (after all, you asked for a
> view).
>
> The good thing is that with views you can avoid costly array copies if
> you're careful when writing into these views.
>
>
>
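The sharing behaviour of ordinary views can be sketched with plain NumPy. NaN is again used as a stand-in for np.NA, which is an assumption of this example:

```python
import numpy as np

original = np.arange(5.0)
view = original.view()   # explicit request for a view: no data is copied

# A write through the view lands in the shared buffer ...
view[0] = np.nan         # NaN stands in for np.NA here

# ... so it is visible through the original array as well.
print(np.isnan(original[0]))
print(np.shares_memory(original, view))
```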
> Now, you can add a new concept: local/temporal/transient missing data.
>
> We can take an existing array and create a view with the new argument
> "transientna = True".
>
>
This is already there: x.view(masked=1), although the keyword transientna
has appeal, not least because it avoids the word 'mask', which seems a
source of endless confusion. Note that currently this is only supposed to
work if the original array is unmasked.

> Here, both the view and the "transientna = True" are explicitly stated
> by the user, so it is assumed that she already knows what this is all
> about.
>
> The difference with a regular view is that you also explicitly asked for
> local/temporal/transient NA values.
>
> * Assigning np.NA to an array view with "transientna = True" will
>  *not* be seen by any of the other views (nor the "original" array),
>  but anything else will still work "as usual".
>
> After all, this is what *you* asked for when using the "transientna =
> True" argument.
>
>
>
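The closest thing in released NumPy to these semantics is probably numpy.ma, so the following sketch uses it as a stand-in. It is not the "transientna" or x.view(masked=1) interface discussed here; it only illustrates a view whose missing-value marks stay local while ordinary writes are shared:

```python
import numpy as np

original = np.array([1.0, 2.0, 3.0, 4.0])

# copy=False asks numpy.ma to share the data buffer; the mask is stored separately.
transient = np.ma.array(original, copy=False)

transient[1] = np.ma.masked   # mark as missing: only the local mask changes
transient[2] = 30.0           # ordinary write: goes through to the shared data

print(original)               # element 1 is intact, element 2 was updated
print(transient.sum())        # the masked element is skipped
```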
> To conclude, let's say that others *must not* have to care whether the
> arrays they're working with have transient NA values. This way, I can
> create a view with transient NAs, set some uninteresting data to NA, and
> pass it to a routine written by someone else that sets to NA the elements
> that, for example, are beyond a certain threshold from the mean.
>
> This would be equivalent to storing a copy of the original array before
> passing it to this 3rd party function, except that "transientna", just
> like views, provides a handy shortcut to avoid copies.
>
>
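The third-party scenario can be sketched with numpy.ma as a stand-in for transient NAs. The routine mask_outliers and its max_dev parameter are hypothetical names invented for this illustration:

```python
import numpy as np

def mask_outliers(values, max_dev):
    """Hypothetical third-party routine: mark elements far from the mean."""
    deviation = np.abs(values - values.mean())
    # Already-masked entries compare as masked; treat them as "not an outlier".
    outliers = np.ma.filled(deviation > max_dev, False)
    values[outliers] = np.ma.masked

original = np.array([0.0, 9.0, 10.0, 11.0, 50.0])
transient = np.ma.array(original, copy=False)  # shares data, has its own mask

transient[0] = np.ma.masked            # caller hides some uninteresting data
mask_outliers(transient, max_dev=15.0) # routine hides more, unaware of the above

print(transient.mask)   # marks set by both the caller and the routine
print(original)         # the underlying data was never overwritten
```

The routine does not need to know whether the marks it sets are transient, and the caller does not need to save a copy of the data beforehand.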
> My main point here is that views and local/temporal/transient NAs are
> all *explicitly* requested, so their behaviour should not appear as
> something unexpected.
>
> Is there an agreement on this?
>
>
Chuck