[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Mark Wiebe mwwiebe@gmail....
Thu Jun 23 16:55:00 CDT 2011


On Thu, Jun 23, 2011 at 4:46 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:

> On Thu, Jun 23, 2011 at 2:53 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
>
>> Enthought has asked me to look into the "missing data" problem and how
>> NumPy could treat it better. I've considered the different ideas of adding
>> dtype variants with a special signal value and masked arrays, and concluded
>> that adding masks to the core ndarray appears to be the best way to deal with
>> the problem in general.
>>
>> I've written a NEP that proposes a particular design, viewable here:
>>
>>
>> https://github.com/m-paradox/numpy/blob/cmaskedarray/doc/neps/c-masked-array.rst
>>
>> There are some questions at the bottom of the NEP which definitely need
>> discussion to find the best design choices. Please read, and let me know of
>> all the errors and gaps you find in the document.
>>
>>
> I agree that low level support for masks is the way to go.
>
> > If all the input values are masked, 'sum' and 'prod' will produce the
> additive and multiplicative identities respectively
>
> A masked zero dimensional array might be another option, depending on how
> you handle scalars. This would also work when arrays were summed down an
> axis if a masked array was returned.
>

I think there has to be a distinction here, like the one between "sum" and
"nansum". Perhaps this could be controlled by a parameter to the sum
function that indicates how to interpret masked values.
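
To make the two behaviours concrete, here is a rough sketch. The helper
"masked_sum" and its "skip_masked" parameter are hypothetical names made up
for illustration; they are not part of NumPy or of the NEP. The existing
numpy.ma module stands in for Chuck's "return a masked array" option, and
None stands in for a masked scalar.

```python
import numpy as np


def masked_sum(values, valid, skip_masked=True):
    """Hypothetical helper (not a NumPy API).

    `valid` is True where data is present, False where it is missing.
    """
    values = np.asarray(values, dtype=float)
    valid = np.asarray(valid, dtype=bool)
    if skip_masked:
        # nansum-like: ignore missing values; if everything is missing,
        # the result is the additive identity, 0.0.
        return values[valid].sum()
    # sum-like: a single missing input makes the result missing.
    # None is only a stand-in for a masked scalar here.
    if not valid.all():
        return None
    return values.sum()


# The existing NaN-based analogue of the two behaviours.
data = np.array([1.0, np.nan, 3.0])
propagated = np.sum(data)    # nan: the missing value propagates
skipped = np.nansum(data)    # 4.0: the missing value is ignored

# Chuck's option, illustrated with today's numpy.ma: summing down an axis
# returns a masked array, and a fully masked column stays masked.
m = np.ma.array([[1.0, 2.0], [3.0, 4.0]],
                mask=[[True, False], [True, False]])
col = m.sum(axis=0)
```

With a parameter like this, the caller chooses between "skip" and
"propagate" at the call site instead of needing two separate functions.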


> I suppose the problem with using the word 'mask' is the implication that it
> hides something. Maybe 'window' would be an alternate choice, although in
> this context I tend to think of 'mask' as having the meaning you assign to
> it.
>

Some copy/paste from the NEP:

There is some consternation about the conventional True/False
interpretation of the mask, centered around the name "mask". One
possibility to deal with this is to call it a "validity mask" in
all documentation, which more clearly indicates that True means
valid data. If this isn't sufficient, an alternate name for the
attribute could be found, like "a.validitymask", "a.validmask",
or "a.validity".
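
As a small illustration of why the naming matters: the existing numpy.ma
module uses the opposite sense (mask=True means the element is hidden), so
a "validity mask" in the NEP's sense is the logical inverse of a numpy.ma
mask. The variable names below are made up for the example, and the
attribute names suggested above are proposals only, not an existing API.

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0])

# Validity convention from the NEP text: True means the element is valid.
validity = np.array([True, False, True])

# numpy.ma convention: mask=True means the element is masked out,
# so the validity mask has to be inverted when building the array.
ma = np.ma.array(values, mask=~validity)

total = ma.sum()  # only the valid elements, 10.0 and 30.0, contribute

# Going back from a numpy.ma mask to a validity mask is another inversion.
recovered_validity = ~np.ma.getmaskarray(ma)
```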


-Mark


> Chuck
>
>
>> Thanks,
>> Mark
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>