[Numpy-discussion] NA masks in the next numpy release?
Fri Oct 28 12:39:07 CDT 2011
On Thu, Oct 27, 2011 at 10:56 PM, Benjamin Root <email@example.com> wrote:
> On Thursday, October 27, 2011, Charles R Harris <firstname.lastname@example.org>
>> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant <email@example.com>
>>> That is a pretty good explanation. I find myself convinced by Matthew's
>>> arguments. I think that being able to separate ABSENT from IGNORED is a
>>> good idea. I also like being able to control SKIP and PROPAGATE (but I
>>> think the current implementation allows this already).
>>> What is the counter-argument to this proposal?
>> What exactly do you find convincing? The current masks propagate by
>> In : a = ones(5, maskna=1)
>> In : a[2] = NA
>> In : a
>> Out: array([ 1., 1., NA, 1., 1.])
>> In : a + 1
>> Out: array([ 2., 2., NA, 2., 2.])
>> In : a[2] = 10
>> In : a
>> Out: array([ 1., 1., 10., 1., 1.], maskna=True)
>> I don't see an essential difference between the implementation using masks
>> and one using bit patterns, the mask when attached to the original array
>> just adds a bit pattern by extending all the types by one byte, an approach
>> that easily extends to all existing and future types, which is why Mark went
>> that way for the first implementation given the time available. The masks
>> are hidden because folks wanted something that behaved more like R and also
>> because of the desire to combine the missing, ignore, and later possibly bit
>> patterns in a unified manner. Note that the pseudo assignment was also meant
>> to look like R. Adding true bit patterns to numpy isn't trivial and I
>> believe Mark was thinking of parametrized types for that.
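
A rough illustration of the "one extra byte per element" point, using the released numpy.ma module as a stand-in. This is an assumption made for the sketch: numpy.ma is not the maskna implementation discussed in this thread, and its mask is visible rather than hidden.

```python
import numpy as np

# Stand-in for an NA-masked array: numpy.ma pairs the data buffer with a
# parallel boolean mask that costs one byte per element.
a = np.ma.masked_array(np.ones(5), mask=[False, False, True, False, False])

data_bytes = a.data.nbytes                  # five float64 values
mask_bytes = np.ma.getmaskarray(a).nbytes   # five one-byte flags

# Arithmetic carries the masked slot through to the result.
b = a + 1
slot_still_masked = bool(np.ma.getmaskarray(b)[2])

# Assigning a value to the masked slot clears its mask entry, similar in
# spirit to the unmasking assignment in the session quoted above.
a[2] = 10.0
any_masked_left = bool(np.ma.getmaskarray(a).any())
```

The same layout works for any dtype because the mask lives beside the data rather than inside it, which is the extensibility argument made in the quoted text.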
>> The main problems I see with masks are unified storage and possibly memory
>> use. The rest is just behavior and desired API and that can be adjusted
>> within the current implementation. There is nothing essentially masky about
> I think Chuck sums it up quite nicely. The implementation detail about
> using mask versus bit patterns can still be discussed and addressed.
> Personally, I just don't see how parameterized dtypes would be easier to use
> than the pseudo assignment.
> The elegance of Mark's solution was to consider the treatment of missing
> data in a unified manner. This puts missing data in a more prominent spot
> for extension builders, which should greatly improve support throughout the
Are extension builders then required to use the numpy C API to get
their data? Speaking as an extension builder, I would rather you gave
me the mask and the bitpattern information and let me do that myself.
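
A minimal sketch of what "give me the mask and let me do that myself" could look like from the consumer's side. It assumes numpy.ma as a stand-in for the NA-mask arrays under discussion, and `mean_of_valid` is a hypothetical routine invented for the example, not part of any NumPy API.

```python
import numpy as np

def mean_of_valid(data, mask):
    """Hypothetical extension routine: average the entries whose mask flag is False."""
    valid = ~mask
    return data[valid].sum() / valid.sum()

arr = np.ma.masked_array([1.0, 2.0, 3.0, 4.0],
                         mask=[False, False, True, False])

# Pull out the raw pieces and hand them to the extension code directly.
raw = np.ma.getdata(arr)          # plain ndarray of values
flags = np.ma.getmaskarray(arr)   # plain boolean ndarray, True = missing

result = mean_of_valid(raw, flags)
```

With both arrays in hand, the extension decides for itself whether a missing entry is skipped or propagated.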
> By letting there be a single missing data framework (instead of
> two) all that users need to figure out is when they want nan-like behavior
> (propagate) or to be more like masks (skip). Numpy takes care of the rest.
> There is a reason why I like using masked arrays because I don't have to
> use nansum in my library functions to guard against the possibility of
> receiving nans. Duck-typing is a good thing.
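
The propagate versus skip distinction in the quoted paragraph can be sketched with features that exist in released NumPy. This assumes np.nan and numpy.ma as stand-ins, not the NA-mask API from either NEP.

```python
import numpy as np

values = np.array([1.0, 2.0, np.nan, 4.0])

# Propagate: a single NaN poisons the plain reduction.
propagated = np.sum(values)

# Skip, done by hand: the caller has to remember to use nansum.
skipped = np.nansum(values)

# Skip, done by the container: a masked array ignores the masked entry,
# so library code can call .sum() without guarding against NaN.
masked = np.ma.masked_invalid(values)
masked_total = masked.sum()
```

Here `propagated` is NaN, while `skipped` and `masked_total` both come out as 7.0.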
> My argument against separating IGNORE and PROPAGATE is that it becomes too
> tempting to want to mix these in an array, but the desired behavior would
> likely become ambiguous.
> There is one other problem that I just thought of that I don't think has
> been outlined in either NEP. What if I perform an operation between an
> array set up with propagate NAs and an array with skip NAs?
These are explicitly covered in the alterNEP: