[Numpy-discussion] in the NA discussion, what can we agree on?

Gary Strangman strang@nmr.mgh.harvard....
Fri Nov 4 10:19:54 CDT 2011


On Fri, 4 Nov 2011, Benjamin Root wrote:
> 
> On Friday, November 4, 2011, Gary Strangman <strang@nmr.mgh.harvard.edu>
> wrote:
> >
> >> > non-destructive+propagating -- it really depends on exactly what
> >> > computations you want to perform, and how you expect them to work. The
> >> > main difference is how reduction operations are treated. I kind of
> >> > feel like the non-propagating version makes more sense overall, but I
> >> > don't know if there's any consensus on that.
> >>
> >> I think this is further evidence for my idea that a mask should not be
> >> undone, but is non-destructive.  If you want to be able to access the
> >> values after masking, have a view, or only apply the mask to a view.
> >
> > OK, so my understanding of what's meant by propagating is probably
> > incomplete (and is definitely still fuzzy). I'm a little confused by the
> > phrase "a mask should not be undone" though. Say I want to perform a
> > statistical analysis or filtering procedure excluding and (separately)
> > including a handful of outliers? Isn't that a natural case for undoing a
> > mask? Or did you mean something else?
> >
> > I think I understand the "use a view" option above, though I don't see how
> > one could apply a mask only to a view. What if my view is every other row in
> > a 2D array, and I want to mask the last half of this view? What is the state
> > of the original array once the mask has been applied?
> >
> > (If this is derailing the progress of this thread, feel free to ignore
> > it.)
> >
> > -best
> > Gary
> 
> Ufunc operations can be broadly categorized as element-wise (binary ops like
> +, *, etc., as well as regular functions that return an array with a shape
> that matches the inputs broadcast together) and reduction ops (sum, min,
> mean, etc.).
> 
> For element-wise operations, things are a bit murky for IGNORE, so I defer
> to Mark's NEP:
> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#id17
> That part probably should be expanded and clarified in the NEP.
> 
> For reduction ops, propagation means that sum([3, 5, NA, 6]) == NA, just as
> if you had a NaN in the array. Non-propagating (or skipping, or ignoring)
> would have that operation produce 14.  A mean() for the propagating case
> would be NA, but about 4.67 (14/3) for non-propagating.
> 
> The part about undoing a mask addresses the case where an operation
> produces a new array that has ignored elements in it: those elements were
> never initialized with any value at all.  Therefore, "unmasking" those
> elements and accessing their values makes no sense. This and more are covered
> in this section of the NEP:
> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#id11
> 
> For your stated case, I would have two views of the data (or at least the
> original data and a view of it).  For the view, I would apply the mask to
> hide the outliers from the filtering operation and produce a result.  The
> first view (or the original array) sees the same data as it did before the
> other view took on a mask, so you can perform the filtering operation on the
> data and have two separate results. You can keep the masked view for
> subsequent calculations, and/or keep the original view, and/or create new
> views with new masks for other analyses, all while keeping the original data
> intact.
> 
> Note that I am speaking of views in a somewhat more abstract sense that is
> only loosely tied to numpy's current behavior with respect to views.  As for
> ndarray.view() specifically, that is an implementation detail that probably
> shouldn't be in this thread yet, so don't hook too much onto it.
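
The two-result workflow described above (analyze with and without a handful
of outliers, leaving the original data intact) can be sketched with today's
numpy.ma. This is only a rough illustration: numpy.ma stands in for the masks
proposed in the NEP, and the toy data and the outlier threshold are made up
for the example.

```python
import numpy as np

data = np.array([9.8, 10.1, 10.0, 55.0, 9.9, -30.0])

# Flag the outliers (hypothetical threshold, chosen for this toy data).
outlier = np.abs(data - np.median(data)) > 5.0

# Wrap the same buffer in a masked array; nothing is copied or overwritten.
masked = np.ma.masked_array(data, mask=outlier)

mean_all = data.mean()        # analysis including the outliers
mean_clean = masked.mean()    # analysis excluding the masked outliers

# The masked values are hidden, not destroyed: the payload is still there.
hidden_values = masked.data[outlier]
```

Here the original array plays the role of the unmasked "view" and the masked
array plays the role of the masked one, so both results are available without
ever undoing a mask.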

Thanks Ben. That's quite helpful. And it also points to my worry (sorry, I 
already knew enough about views to be dangerous) ... your "conceptual" 
version of views is great, but I don't think numpy fully and reliably 
follows it (occasionally giving copies instead of views, for example, when 
a view is particularly difficult to generate). So I worry that your notion 
of views will actually collide with core numpy view implementations. But 
like you said, perhaps this thread shouldn't go there (yet).
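
To make the view-versus-copy worry concrete, here is a small sketch using
current numpy only (nothing from the NA proposal). It shows two operations
that look alike but differ in whether the result shares memory with the
original; the array and variable names are just for illustration.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

every_other_row = a[::2]      # basic slicing: a view onto a
picked_rows = a[[0, 2]]       # integer-array indexing: a copy of the same rows

flat_view = a.reshape(-1)     # contiguous input: reshape can return a view
flat_copy = a.T.reshape(-1)   # transposed input: reshape has to copy

every_other_row[0, 0] = 99    # writes through to a
picked_rows[0, 1] = -1        # leaves a untouched

print(np.shares_memory(a, every_other_row))  # True
print(np.shares_memory(a, picked_rows))      # False
print(np.shares_memory(a, flat_view))        # True
print(np.shares_memory(a, flat_copy))        # False
```

Any scheme that says "apply the mask to a view" would need to spell out what
happens in the copy cases, where no link back to the original array exists.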

Given I'm still fuzzy on all the distinctions, perhaps someone could try 
to help me (and others?) to define all /4/ logical possibilities ... some 
may be obvious dead-ends. I'll take a stab at them, but these should 
definitely get edited by others:

destructive + propagating = the data point is truly missing (satellite 
fell into the ocean; dog ate my source datasheet, or whatever), this is 
the nature of that data point, such missingness should be replicated in 
elementwise operations, and the missingness SHOULD interfere with 
reduction operations that involve that datapoint 
(np.sum([1,MISSING])=MISSING)

destructive + non-propagating = the data point is truly missing, this is 
the nature of that data point, such missingness should be replicated in 
elementwise operations, but such missingness should NOT interfere with 
reduction operations that involve that datapoint (np.sum([1,MISSING])=1)

non-destructive + propagating = I want to ignore this datapoint for 
now; element-wise operations should replicate this "ignore" designation, 
and missingness of this type SHOULD interfere with reduction operations 
that involve this datapoint (np.sum([1,IGNORE])=IGNORE)

non-destructive + non-propagating = I want to ignore this datapoint for 
now; element-wise operations should replicate this "ignore" designation, 
but missingness of this type SHOULD NOT interfere with reduction 
operations that involve this datapoint (np.sum([1,IGNORE])=1)
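
A rough sketch of the four combinations using what numpy offers today. The
stand-ins are assumptions for illustration only: NaN plays the role of
MISSING (destructive), a numpy.ma mask plays the role of IGNORE
(non-destructive), and propagating_sum is a hypothetical helper, not a numpy
function or anything from the NEP.

```python
import numpy as np

values = np.array([1.0, 2.0, 4.0])
flag = np.array([False, True, False])   # element 1 is missing / ignored

# Destructive: overwrite the payload; the original 2.0 is gone for good.
destroyed = values.copy()
destroyed[flag] = np.nan

# Non-destructive: keep the payload and hide it behind a mask.
ignored = np.ma.masked_array(values, mask=flag)

# Element-wise operations replicate the missingness in both cases.
doubled_destroyed = destroyed * 2
doubled_ignored = ignored * 2


def propagating_sum(marr):
    """Hypothetical helper: the sum is masked if any element is masked."""
    if np.ma.getmaskarray(marr).any():
        return np.ma.masked
    return marr.sum()


# Reductions: propagating versus non-propagating.
destructive_propagating = np.sum(destroyed)            # nan
destructive_nonpropagating = np.nansum(destroyed)      # 5.0
nondestructive_propagating = propagating_sum(ignored)  # masked
nondestructive_nonpropagating = ignored.sum()          # 5.0

# Only the non-destructive variant still holds the hidden value.
recovered = ignored.data[1]                            # 2.0
```

Note that numpy.ma reductions skip masked elements by default, so of the four
cases only the propagating mask needs a helper here.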

Comments?

-best
Gary

