[Numpy-discussion] NA masks in the next numpy release?
Thu Oct 27 21:51:20 CDT 2011
As I mentioned, I find the ability to separate an ABSENT idea from an IGNORED idea convincing. In other words, I think distinguishing between masks and bit-patterns is not just an implementation detail but a useful concept for multiple use cases.
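NumPy's existing tools give a rough analogy for the two ideas. In this sketch a NaN stands in for an ABSENT bit pattern, because the stored value is overwritten, and a `numpy.ma` mask stands in for IGNORED, because the value is hidden but kept. Neither is the NA design being discussed in this thread; the variable names are illustrative only.

```python
import numpy as np

# ABSENT via a bit pattern: writing NaN overwrites the stored value,
# so the original number (3.0) cannot be recovered afterwards.
absent = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
absent[2] = np.nan

# IGNORED via a mask: element 2 is hidden, but its value stays in .data.
ignored = np.ma.array([1.0, 2.0, 3.0, 4.0, 5.0], mask=[0, 0, 1, 0, 0])
hidden_value = ignored.data[2]   # still 3.0

# "Un-ignoring" just clears the mask on a copy; the data was never lost.
restored = ignored.copy()
restored.mask = np.ma.nomask
```

The asymmetry is the point: once the NaN pattern is written there is nothing to restore, while the masked copy gets its original value back simply by dropping the mask.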
I understand exactly what it would take to add bit-patterns to NumPy. I also understand what Mark did and agree that it is possible to add Matthew's idea to the current code-base. I think it is worth exploring.
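As a rough illustration of what a bit-pattern (ABSENT) approach involves, here is a minimal sketch assuming one reserved sentinel per dtype, in the spirit of R reserving the most negative 32-bit integer for `NA_integer_`. `NA_INT32`, `is_na`, and `add_propagate` are hypothetical names, not NumPy API. The missing value costs no extra storage, but every operation has to know about the sentinel.

```python
import numpy as np

# Hypothetical sentinel: the most negative int32 is reserved to mean "missing".
NA_INT32 = np.iinfo(np.int32).min

def is_na(arr):
    """Boolean array marking slots that hold the NA bit pattern."""
    return arr == NA_INT32

def add_propagate(arr, value):
    """Add `value` elementwise while keeping NA slots NA (PROPAGATE semantics)."""
    out = arr + np.int32(value)
    # Plain arithmetic would turn the sentinel into an ordinary number,
    # so the NA pattern has to be stamped back in explicitly.
    out[is_na(arr)] = NA_INT32
    return out

a = np.ones(5, dtype=np.int32)
a[2] = NA_INT32          # the value 1 is overwritten, not hidden
b = add_propagate(a, 1)
```

This is the trade-off in miniature: no extra memory, but each dtype needs its own reserved pattern and each ufunc must handle it. A real implementation would also have to keep legitimate results (for example an overflowing subtraction) from landing on the sentinel.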
On Oct 27, 2011, at 9:08 PM, Charles R Harris wrote:
> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant <firstname.lastname@example.org> wrote:
> That is a pretty good explanation. I find myself convinced by Matthew's arguments. I think that being able to separate ABSENT from IGNORED is a good idea. I also like being able to control SKIP and PROPAGATE (but I think the current implementation allows this already).
> What is the counter-argument to this proposal?
> What exactly do you find convincing? The current masks propagate by default:
> In : a = ones(5, maskna=1)
> In : a[2] = NA
> In : a
> Out: array([ 1., 1., NA, 1., 1.])
> In : a + 1
> Out: array([ 2., 2., NA, 2., 2.])
> In : a[2] = 10
> In : a
> Out: array([ 1., 1., 10., 1., 1.], maskna=True)
> I don't see an essential difference between the implementation using masks and one using bit patterns. The mask, when attached to the original array, just adds a bit pattern by extending every type by one byte, an approach that easily extends to all existing and future types; that is why Mark went that way for the first implementation, given the time available. The masks are hidden because folks wanted something that behaved more like R, and also because of the desire to combine missing, ignored, and later possibly bit-pattern values in a unified manner. Note that the pseudo-assignment was also meant to look like R. Adding true bit patterns to NumPy isn't trivial, and I believe Mark was thinking of parametrized types for that.
> The main problems I see with masks are unified storage and possibly memory use. The rest is just behavior and desired API, and that can be adjusted within the current implementation. There is nothing essentially masky about masks.
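The `maskna` keyword in the session above is not available in current NumPy releases, so here is a rough analogue using the long-standing `numpy.ma` module. It is an analogy, not Mark's implementation: `numpy.ma` keeps a separate boolean mask and skips masked values in reductions by default, whereas a NaN pattern propagates unless `np.nansum` is used. The last lines measure the storage a materialized mask adds.

```python
import numpy as np

# Analogue of: a = ones(5, maskna=1); a[2] = NA; a + 1
a = np.ma.ones(5)
a[2] = np.ma.masked          # hide element 2
b = a + 1                    # the mask propagates through arithmetic

# Analogue of: a[2] = 10. With a soft mask (the default),
# assigning a real value clears the mask for that element.
a[2] = 10

# SKIP vs PROPAGATE in reductions.
c = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
skip_sum = c.sum()                  # numpy.ma skips masked values
nan_data = np.array([1.0, np.nan, 3.0])
propagate_sum = nan_data.sum()      # NaN propagates through the sum
nan_skip_sum = np.nansum(nan_data)  # explicit SKIP for the NaN pattern

# Storage: a materialized boolean mask costs one byte per element.
big = np.ma.array(np.zeros(1000))
data_bytes = big.data.nbytes                  # 1000 float64 values
mask_bytes = np.ma.getmaskarray(big).nbytes   # 1000 bool values
```

For float64 the mask adds 1 byte per 8-byte element (12.5%); for a 1-byte dtype such as int8 it doubles the storage, which is the memory concern mentioned above.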