[Numpy-discussion] feedback request: proposal to add masks to the core ndarray
Thu Jun 23 16:37:20 CDT 2011
On Thu, Jun 23, 2011 at 4:19 PM, Nathaniel Smith <firstname.lastname@example.org> wrote:
> I'd like to see a statement of what the "missing data problem" is, and
> how this solves it? Because I don't think this is entirely intuitive,
> or that everyone necessarily has the same idea.
I agree it represents different problems in different contexts. For NumPy, I
think the mechanism for dealing with it needs to be intuitive to work with
in as many contexts as possible, avoiding surprises. Getting feedback from a
broad range of people is the only way a general solution can be designed
with any level of confidence.
> Reduction operations like 'sum', 'prod', 'min', and 'max' will operate as
> if the values weren't there
> For context: My experience with missing data is in statistical
> analysis; I find R's NA support to be pretty awesome for those
> purposes. The conceptual model it's based on is that an NA value is
> some number that we just happen not to know. So from this perspective,
> I find it pretty confusing that adding an unknown quantity to 3 should
> result in 3, rather than another unknown quantity. (Obviously it
> should be possible to compute the sum of the known values, but IME
> it's important for the default behavior to be to fail loudly when
> things are wonky, not to silently patch them up, possibly
The conceptual model you describe sounds reasonable to me, and I definitely
like the idea of consistently following one such model for all default
> Also, what should 'dot' do with missing values?
A matrix multiplication is defined in terms of sums of products, so it can
be implemented to behave consistently with your conceptual model.
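
To make the two behaviors concrete, here is a rough sketch using what NumPy
already has today. It is not the API proposed in the NEP: NaN is only a
stand-in for an unknown value, and dot_by_rows is a hypothetical helper
written for this illustration.

```python
import numpy as np

# NaN stands in for "a number we happen not to know" (an analogy only).
a = np.array([3.0, np.nan])

# Propagating model: unknown + 3 is still unknown.
propagated = np.sum(a)

# Skipping model: sum only the known values.
skipped = np.nansum(a)

# The existing numpy.ma masked arrays also skip masked entries in reductions.
masked = np.ma.array([3.0, 1.0], mask=[False, True])
masked_sum = masked.sum()

# A matrix-vector product written out as sums of products, so it inherits
# whatever the sum does with an unknown value.
def dot_by_rows(A, x):
    return np.array([np.sum(row * x) for row in A])

A = np.array([[1.0, np.nan],
              [2.0, 3.0]])
x = np.array([1.0, 1.0])
result = dot_by_rows(A, x)
```

Under the propagating model, only the first output element of the product
is unknown, because only the first row touches the unknown value.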
> -- Nathaniel
> On Thu, Jun 23, 2011 at 1:53 PM, Mark Wiebe <email@example.com> wrote:
> > Enthought has asked me to look into the "missing data" problem and how NumPy
> > could treat it better. I've considered the different ideas of adding
> > variants with a special signal value and masked arrays, and concluded that
> > adding masks to the core ndarray appears to be the best way to deal with the
> > problem in general.
> > I've written a NEP that proposes a particular design, viewable here:
> > There are some questions at the bottom of the NEP which definitely need
> > discussion to find the best design choices. Please read, and let me know
> > all the errors and gaps you find in the document.
> > Thanks,
> > Mark
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion