[Numpy-discussion] NA masks in the next numpy release?
Charles R Harris
Fri Oct 28 18:19:23 CDT 2011
On Fri, Oct 28, 2011 at 5:05 PM, Chris.Barker <Chris.Barker@noaa.gov> wrote:
> On 10/28/11 11:37 AM, Matthew Brett wrote:
> > The main motivation for the alterNEP was our strong feeling that
> > separating ABSENT and IGNORE was easier to comprehend and cleaner.
> I don't know about easier to comprehend, or cleaner, but it is more
> I see two issues here:
> 1) being able to distinguish between "ignore" and "not valid"
> -- and being able to stop ignoring an ignored value.
> This could quite easily be accomplished with a mask approach -- indeed
> with 8 bits, you could have 8 different possible masked states (not that
> I'm suggesting that, at least not in core numpy.)
> However, with a bit-pattern approach, you simply can't implement
> "ignore". Once it's been set, the previous value is lost.
> 2) data size: A full mask takes extra space, sometimes a substantial
> amount -- so a bit-pattern approach would be nice.
> I like the idea (that I think Mark attempted to implement) that the
> implementation should be hidden from the user - not necessarily entirely
> hidden, but subtle enough that the casual user wouldn't need to care
> about it.
I believe the main reason it is hidden from the user is so that the
implementation can be changed without impacting existing applications.
What I would like to see at this point is folks trying out the software and
asking questions on the list like: "I want to do A and tried B, which didn't
work. Any suggestions?" In short, I want people to actually use the software
to see what issues arise so that we can fix things up.
Memory use is a known problem. One way to start addressing it might be to
implement a "bit" array type. It might even be possible to prototype that on
top of the existing types. Views make bit arrays a bit more interesting ;)
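
As a rough illustration of prototyping bit storage on top of the existing
types, here is a minimal sketch using np.packbits and np.unpackbits on a
uint8 array. The variable names are made up for the example, and this is not
how the NA mask is actually stored; it only shows the potential memory saving
for a one-bit-per-element mask.

```python
import numpy as np

# Hypothetical example: a boolean mask stored one byte per element.
mask = np.zeros(1000, dtype=bool)
mask[::7] = True  # flag every 7th element

# Pack 8 flags into each byte of a uint8 array.
packed = np.packbits(mask)

# Unpack and trim back to the original length to recover the mask.
restored = np.unpackbits(packed)[:mask.size].astype(bool)

print(mask.nbytes, packed.nbytes)
```

Note that a slice such as mask[3:10] does not start on a byte boundary in the
packed form, so it cannot be expressed as an ordinary strided view of the
uint8 buffer. That is the part that makes views interesting.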
> In that case, I think if we could decide that we want both "ignore" and
> "not valid" (and it seems there is a fair bit of interest in that), then
> we can proceed with a mask-based approach, and develop an API that makes
> as little reference to the mask as possible.
> Then a bit-pattern approach could be developed that uses the same API --
> it would not have the "ignore" option at all, but would be the same for
> the "not valid" option.
> When I write this it seems entirely too complicated for both the
> developers and users, but maybe it's not -- it could be analogous to
> what we have now: arrays can be Fortran or C ordered, contiguous or not,
> be views on other arrays or not. To really make numpy dance, you need to
> understand all that, but you can also do a whole lot, and write a lot of
> generic code, without digging into that.
> If we do all that, maybe there could be a sparse mask implementation,
> etc. as well.
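
To make the "ignore" versus "not valid" distinction discussed above concrete,
here is a small sketch. It uses the existing numpy.ma module to stand in for
a mask-based approach and NaN to stand in for a bit pattern. Both are
stand-ins chosen for illustration, not the NA implementation under
discussion, and the sample values are invented.

```python
import numpy as np
import numpy.ma as ma

data = np.array([1.0, 2.0, 3.0, 4.0])

# Mask approach: element 1 is ignored, but its value is still stored.
masked = ma.masked_array(data.copy(), mask=[False, True, False, False])
sum_ignoring = masked.sum()   # element 1 is skipped
stored_value = masked.data[1] # the underlying value survives

# Stop ignoring: clear the mask and the original value is back.
masked.mask = False
sum_restored = masked.sum()

# Bit-pattern approach (NaN as the stand-in pattern): the assignment
# overwrites the value, so there is nothing to "un-ignore" later.
pattern = data.copy()
pattern[1] = np.nan
sum_pattern = np.nansum(pattern)

print(sum_ignoring, sum_restored, sum_pattern)
```

With the mask, clearing the flag recovers the original element. With the bit
pattern, the only way back is to keep a separate copy of the data.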