[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Wes McKinney wesmckinn@gmail....
Fri Jun 24 18:22:11 CDT 2011

On Fri, Jun 24, 2011 at 7:10 PM, Charles R Harris
<charlesr.harris@gmail.com> wrote:
> On Fri, Jun 24, 2011 at 4:21 PM, Matthew Brett <matthew.brett@gmail.com>
> wrote:
>> Hi,
>> On Fri, Jun 24, 2011 at 10:09 PM, Benjamin Root <ben.root@ou.edu> wrote:
>> ...
>> > Again, there are pros and cons either way and I see them as very
>> > orthogonal and complementary.
>> That may be true, but I imagine only one of them will be implemented.
>> @Mark - I don't have a clear idea whether you consider the nafloat64
>> option to be still in play as the first thing to be implemented
>> (before array.mask).   If it is, what kind of thing would persuade you
>> either way?
> Mark can speak for himself, but I think things are tending towards masks.
> They have the advantage of one implementation for all data types, current
> and future, and they are more flexible since the masked data can be actual
> valid data that you just choose to ignore for experimental reasons.
> What might be helpful is a routine to import/export R files, but that
> shouldn't be too difficult to implement.
> Chuck
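Chuck's point that a mask can hide values that are perfectly valid, and that it works for any dtype, can be illustrated with the existing `numpy.ma` module. This is a minimal sketch that uses `numpy.ma` as a stand-in; it is not the core-ndarray mask API under discussion, and the array values are made up for illustration.

```python
import numpy as np

# Integer data: int64 has no NaN, but a mask works for any dtype.
readings = np.array([12, 15, 999, 14], dtype=np.int64)

# Hide the third reading for this analysis without changing the stored value.
masked = np.ma.masked_array(readings, mask=[False, False, True, False])

print(masked.mean())       # mean of 12, 15 and 14 only

# The underlying data is untouched, so the "ignored" value is still there.
print(masked.data)
print(masked.data.mean())  # 260.0, the hidden 999 counts again
```

Because the mask lives beside the data rather than overwriting it, ignoring a value is reversible, which a sentinel NA value stored in place would not be.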

Perhaps we should make a wiki page someplace summarizing pros and cons
of the various implementation approaches? I worry very seriously about
adding API functions relating to masks rather than having special NA
values which propagate in algorithms. The question is: will Joe Blow,
former R user, have to understand what the mask is and how to work with
it? If the answer is yes, we have a problem. If it can be completely
hidden as an implementation detail, that's great. In R, NAs are just
sort of inherent: they propagate, and you deal with them when you have
to, via the na.rm flag in functions or is.na.
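For comparison, here is a rough sketch of the R-style behaviour described above, using NaN as a stand-in for NA. That is an assumption: NaN exists only for floating-point dtypes, so it is not a general NA. `np.nansum` plays the role of `na.rm = TRUE` and `np.isnan` the role of `is.na`; the last lines show that the existing `numpy.ma` module takes the opposite default and skips masked entries silently.

```python
import numpy as np

# NaN as a stand-in for NA; this only works for floating-point dtypes.
x = np.array([1.0, 2.0, np.nan, 4.0])

total = np.sum(x)        # nan: the missing value propagates, like sum(x) in R
total_rm = np.nansum(x)  # 7.0: missing value skipped, like sum(x, na.rm = TRUE)
missing = np.isnan(x)    # [False False True False], like is.na(x)

# numpy.ma does not propagate: masked entries are dropped from reductions.
m = np.ma.masked_invalid(x)
total_ma = m.sum()       # 7.0, with no indication that a value was missing
```

The difference in defaults is the crux of the concern: with propagation, a user who ignores missing data sees `nan` and notices; with masking, the reduction quietly succeeds.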

The other problem I can think of with masks is the extra memory
footprint, though maybe this is no cause for concern.
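A back-of-the-envelope sketch of that footprint, assuming one byte per element for the mask, which is how `numpy.ma` stores its boolean mask today; a bit-packed mask, if the proposal used one, would be eight times smaller. `mask_overhead` is a hypothetical helper, not part of NumPy.

```python
import numpy as np


def mask_overhead(dtype) -> float:
    """Hypothetical helper: mask bytes per data byte, assuming a
    one-byte-per-element boolean mask."""
    return np.dtype(np.bool_).itemsize / np.dtype(dtype).itemsize


n = 1_000_000
data = np.zeros(n, dtype=np.float64)
mask = np.zeros(n, dtype=np.bool_)  # one byte per element

print(data.nbytes, mask.nbytes)     # 8000000 1000000
print(mask_overhead(np.float64))    # 0.125
print(mask_overhead(np.int8))       # 1.0
```

Under this assumption the mask adds 12.5% to float64 data but doubles the memory used by int8 data, so whether the overhead matters depends heavily on the dtype.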


More information about the NumPy-Discussion mailing list