[Numpy-discussion] feedback request: proposal to add masks to the core ndarray
Thu Jun 23 17:14:16 CDT 2011
On Thu, Jun 23, 2011 at 4:54 PM, Eric Firing <firstname.lastname@example.org> wrote:
> On 06/23/2011 11:19 AM, Nathaniel Smith wrote:
> > I'd like to see a statement of what the "missing data problem" is, and
> > how this solves it? Because I don't think this is entirely intuitive,
> > or that everyone necessarily has the same idea.
> >> Reduction operations like 'sum', 'prod', 'min', and 'max' will operate
> >> as if the values weren't there
> > For context: My experience with missing data is in statistical
> > analysis; I find R's NA support to be pretty awesome for those
> > purposes. The conceptual model it's based on is that an NA value is
> > some number that we just happen not to know. So from this perspective,
> > I find it pretty confusing that adding an unknown quantity to 3 should
> > result in 3, rather than another unknown quantity. (Obviously it
> > should be possible to compute the sum of the known values, but IME
> > it's important for the default behavior to be to fail loudly when
> > things are wonky, not to silently patch them up, possibly
> > incorrectly!)
> From the oceanographic data acquisition and analysis perspective, and
> perhaps from a more general plotting perspective (matplotlib,
> specifically) missing data is simply missing; we don't have it, we never
> will, but we need to do the best calculation (or plot) we can with what
> is left. For plotting, that generally means showing a gap in a line, a
> hole in a contour plot, etc. For calculations like basic statistics, it
> means doing the calculation, e.g. a mean, with the available numbers,
> *and* having an easy way to find out how many numbers were available.
> That's what the masked array count() method is for.
I'm thinking a parameter for sum, mean, etc. that enables this
interpretation is a good approach for these calculations.
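
To make the two interpretations concrete, here is a minimal sketch. The helper `masked_sum` and its `skip_missing` parameter are hypothetical names made up for illustration; they are not an existing or proposed NumPy API. The default propagates the unknown (the R-style NA view), and the flag switches to "use what is available". The last lines use the existing numpy.ma module, which already behaves like the second case and reports the number of contributing values through count().

```python
import numpy as np

def masked_sum(values, missing, skip_missing=False):
    """Sum `values`, where `missing` is True for unknown entries.

    Hypothetical helper for illustration only, not a NumPy API.
    """
    values = np.asarray(values, dtype=float)
    missing = np.asarray(missing, dtype=bool)
    if skip_missing:
        # "Missing is simply missing": sum whatever is available.
        return values[~missing].sum()
    if missing.any():
        # "NA is an unknown number": the result is unknown as well.
        return np.nan
    return values.sum()

data = [3.0, 0.0, 4.0]
missing = [False, True, False]

strict = masked_sum(data, missing)                      # unknown propagates
lenient = masked_sum(data, missing, skip_missing=True)  # sum of known values

# The existing numpy.ma module follows the lenient interpretation,
# and count() tells you how many values went into the result.
marr = np.ma.masked_array(data, mask=missing)
ma_mean = marr.mean()
ma_count = marr.count()
```

Whether the strict or the lenient behavior should be the default is exactly the open question in this thread; the sketch only shows that one parameter can select between them.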
> Some types of calculations, like the FFT, simply can't be done by
> ignoring missing values, so one must first use some filling method,
> perhaps interpolation, for example, and then pass an unmasked array to
> the function.
These kinds of functions will have to raise exceptions when called on an
array that has a masked value, true.
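
A rough sketch of that workflow, assuming a plain boolean array marks the missing samples. Both `fft_strict` and `fill_by_interpolation` are hypothetical helpers written for this example; only np.fft.fft and np.interp are real NumPy calls.

```python
import numpy as np

def fft_strict(values, missing):
    """Hypothetical wrapper: refuse to transform data with missing samples."""
    values = np.asarray(values, dtype=float)
    missing = np.asarray(missing, dtype=bool)
    if missing.any():
        raise ValueError("cannot take the FFT of an array with masked values")
    return np.fft.fft(values)

def fill_by_interpolation(values, missing):
    """Hypothetical helper: linearly interpolate over the missing samples."""
    values = np.asarray(values, dtype=float)
    missing = np.asarray(missing, dtype=bool)
    x = np.arange(values.size)
    filled = values.copy()
    # Estimate each missing sample from its known neighbours.
    filled[missing] = np.interp(x[missing], x[~missing], values[~missing])
    return filled

samples = np.array([0.0, 1.0, -999.0, 3.0])   # -999.0 is a placeholder
missing = np.array([False, False, True, False])

try:
    fft_strict(samples, missing)
    raised = False
except ValueError:
    raised = True

# Fill first, then pass a fully known array to the transform.
filled = fill_by_interpolation(samples, missing)
spectrum = fft_strict(filled, np.zeros(filled.size, dtype=bool))
```

Linear interpolation is just one possible filling method; the right choice depends on the data.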
> The present masked array module is very close to what is really needed
> for the sorts of things I am involved with. It looks to me like the
> main deficiencies are addressed by Mark's proposal, although the change
> in the definition of the mask might make for a painful transition.
Yeah, I understand the pain, but I'd much prefer to align with the general
consensus about masks elsewhere than stick with the current convention.
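
For reference, a small sketch of what switching conventions involves, assuming the change in question is the flip from "True means masked" (the numpy.ma convention) to "True means valid". That reading is an assumption made for this example; the thread itself does not spell out the new definition.

```python
import numpy as np

marr = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# numpy.ma convention: True marks an element as masked (hidden).
ma_mask = np.ma.getmaskarray(marr)

# Opposite convention (assumed here): True marks an element as valid.
valid = ~ma_mask

# Both descriptions select the same data; only the sense of the flag differs.
total_from_valid = marr.data[valid].sum()
total_from_ma = marr.sum()
```

The conversion itself is a single negation, so the pain is mostly in existing code that builds or inspects masks by hand.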
> > Also, what should 'dot' do with missing values?
> > -- Nathaniel
> > On Thu, Jun 23, 2011 at 1:53 PM, Mark Wiebe <email@example.com> wrote:
> >> Enthought has asked me to look into the "missing data" problem and how
> >> NumPy could treat it better. I've considered the different ideas of adding
> >> variants with a special signal value and masked arrays, and concluded that
> >> adding masks to the core ndarray appears to be the best way to deal with
> >> the problem in general.
> >> I've written a NEP that proposes a particular design, viewable here:
> >> There are some questions at the bottom of the NEP which definitely need
> >> discussion to find the best design choices. Please read, and let me know
> >> all the errors and gaps you find in the document.
> >> Thanks,
> >> Mark
> >> _______________________________________________
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion@scipy.org
> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion