[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Mark Wiebe mwwiebe@gmail....
Thu Jun 23 16:37:20 CDT 2011


On Thu, Jun 23, 2011 at 4:19 PM, Nathaniel Smith <njs@pobox.com> wrote:

> I'd like to see a statement of what the "missing data problem" is, and
> how this solves it? Because I don't think this is entirely intuitive,
> or that everyone necessarily has the same idea.
>

I agree it represents different problems in different contexts. For NumPy, I
think the mechanism for dealing with it needs to be intuitive to work with
in as many contexts as possible, without surprises. Getting feedback from a
broad range of people is the only way a general solution can be designed
with any level of confidence.

> Reduction operations like 'sum', 'prod', 'min', and 'max' will operate as
> if the values weren't there
>
> For context: My experience with missing data is in statistical
> analysis; I find R's NA support to be pretty awesome for those
> purposes. The conceptual model it's based on is that an NA value is
> some number that we just happen not to know. So from this perspective,
> I find it pretty confusing that adding an unknown quantity to 3 should
> result in 3, rather than another unknown quantity. (Obviously it
> should be possible to compute the sum of the known values, but IME
> it's important for the default behavior to be to fail loudly when
> things are wonky, not to silently patch them up, possibly
> incorrectly!)
>

The conceptual model you describe sounds reasonable to me, and I definitely
like the idea of consistently following one such model for all default
behaviors.
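
To make the two behaviors concrete, here is a minimal sketch in plain
Python. It is only an illustration, not the design in the NEP: the name
na_sum and the skipna flag are hypothetical, and a float NaN stands in for
a missing value.

```python
import math

NA = math.nan  # assumption: NaN stands in for a missing value


def na_sum(values, skipna=False):
    """Sum values; an NA makes the result NA unless skipna is True."""
    total = 0.0
    for v in values:
        if math.isnan(v):
            if skipna:
                continue  # explicitly requested: ignore the missing value
            return NA  # default: an unknown input gives an unknown result
        total += v
    return total


data = [3.0, NA]
print(na_sum(data))               # propagating default
print(na_sum(data, skipna=True))  # sum of the known values only
```

Under this model the default call returns NA, and the sum of the known
values is only produced when it is asked for explicitly.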


> Also, what should 'dot' do with missing values?
>

A matrix multiplication is defined in terms of sums of products, so it can
be implemented to behave consistently with your conceptual model.
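
As a rough sketch of what that would mean, the plain-Python code below
writes a matrix product as sums of products. The name na_dot is
hypothetical, and a float NaN again stands in for a missing value, so this
shows only the propagating model, not any actual NumPy behavior.

```python
import math

NA = math.nan  # assumption: NaN stands in for a missing value


def na_dot(a, b):
    """Matrix product of two nested lists, written as sums of products."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


a = [[1.0, NA],
     [2.0, 3.0]]
identity = [[1.0, 0.0],
            [0.0, 1.0]]

c = na_dot(a, identity)
print(c)
```

Every output element whose sum includes the missing input comes out
missing (the whole first row here), while the second row is unaffected.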


>
> -- Nathaniel
>
> On Thu, Jun 23, 2011 at 1:53 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
> > Enthought has asked me to look into the "missing data" problem and how
> > NumPy could treat it better. I've considered the different ideas of
> > adding dtype variants with a special signal value and masked arrays, and
> > concluded that adding masks to the core ndarray appears to be the best
> > way to deal with the problem in general.
> > I've written a NEP that proposes a particular design, viewable here:
> >
> > https://github.com/m-paradox/numpy/blob/cmaskedarray/doc/neps/c-masked-array.rst
> > There are some questions at the bottom of the NEP which definitely need
> > discussion to find the best design choices. Please read, and let me know
> > of all the errors and gaps you find in the document.
> > Thanks,
> > Mark
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >