[Numpy-discussion] feedback request: proposal to add masks to the core ndarray
Thu Jun 23 16:19:19 CDT 2011
I'd like to see a statement of what the "missing data problem" is, and
how this proposal solves it. I don't think that is entirely intuitive,
or that everyone necessarily has the same problem in mind.
> Reduction operations like 'sum', 'prod', 'min', and 'max' will operate as if the values weren't there
For context: My experience with missing data is in statistical
analysis; I find R's NA support to be pretty awesome for those
purposes. The conceptual model it's based on is that an NA value is
some number that we just happen not to know. So from this perspective,
I find it pretty confusing that adding an unknown quantity to 3 should
result in 3, rather than another unknown quantity. (Obviously it
should be possible to compute the sum of the known values, but IME
it's important for the default behavior to be to fail loudly when
things are wonky, not to silently patch them up, possibly
Also, what should 'dot' do with missing values?
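To make the two models concrete, here is a minimal sketch using tools NumPy already has. It assumes NaN as a stand-in for R's NA (it is not a true NA, just the nearest existing thing) and uses the existing np.ma masked arrays to stand in for the proposed masks; the variable names are made up for illustration, and the zero-fill 'dot' is only one possible answer, not what the NEP specifies.

```python
import numpy as np

# Model 1: "skip" semantics, as in the quoted NEP sentence.
# The second entry is masked out, so the reduction ignores it.
masked = np.ma.masked_array([3.0, 0.0], mask=[False, True])
skip_sum = masked.sum()                  # sum of the known values only

# Model 2: "unknown" semantics, as in R's default handling of NA.
# NaN is an assumed stand-in for NA: unknown + 3 stays unknown.
unknown = np.array([3.0, np.nan])
propagated_sum = unknown.sum()           # the result is itself unknown (nan)

# Summing only the known values is still available, but as an explicit opt-in.
opt_in_sum = np.nansum(unknown)

# The same question for 'dot': propagate the unknown, or drop that term?
a = np.array([1.0, np.nan, 2.0])
b = np.array([4.0, 5.0, 6.0])
dot_propagated = np.dot(a, b)            # nan: one unknown term poisons the result

# One possible "skip" answer: treat the missing entry as contributing zero.
a_masked = np.ma.masked_invalid(a)
dot_skipped = np.dot(a_masked.filled(0.0), b)   # 1*4 + 0*5 + 2*6
```

Under the "unknown" model the skip behavior is still reachable, it just has to be asked for by name; under the "skip" model it is the loud failure that has to be asked for.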
On Thu, Jun 23, 2011 at 1:53 PM, Mark Wiebe <firstname.lastname@example.org> wrote:
> Enthought has asked me to look into the "missing data" problem and how NumPy
> could treat it better. I've considered the different ideas of adding dtype
> variants with a special signal value and masked arrays, and concluded that
> adding masks to the core ndarray appears to be the best way to deal with the
> problem in general.
> I've written a NEP that proposes a particular design, viewable here:
> There are some questions at the bottom of the NEP which definitely need
> discussion to find the best design choices. Please read, and let me know of
> all the errors and gaps you find in the document.