[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Mark Wiebe mwwiebe@gmail....
Fri Jun 24 20:00:47 CDT 2011


On Fri, Jun 24, 2011 at 6:22 PM, Wes McKinney <wesmckinn@gmail.com> wrote:

> On Fri, Jun 24, 2011 at 7:10 PM, Charles R Harris
> <charlesr.harris@gmail.com> wrote:
> >
> >
> > On Fri, Jun 24, 2011 at 4:21 PM, Matthew Brett <matthew.brett@gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> On Fri, Jun 24, 2011 at 10:09 PM, Benjamin Root <ben.root@ou.edu> wrote:
> >> ...
> >> > Again, there are pros and cons either way and I see them very orthogonal
> >> > and complementary.
> >>
> >> That may be true, but I imagine only one of them will be implemented.
> >>
> >> @Mark - I don't have a clear idea whether you consider the nafloat64
> >> option to be still in play as the first thing to be implemented
> >> (before array.mask).   If it is, what kind of thing would persuade you
> >> either way?
> >>
> >
> > Mark can speak for himself, but I think things are tending towards masks.
> > They have the advantage of one implementation for all data types, current
> > and future, and they are more flexible since the masked data can be actual
> > valid data that you just choose to ignore for experimental reasons.
> >
> > What might be helpful is a routine to import/export R files, but that
> > shouldn't be too difficult to implement.
> >
> > Chuck
> >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
>
> Perhaps we should make a wiki page someplace summarizing pros and cons
> of the various implementation approaches? I worry very seriously about
> adding API functions relating to masks rather than having special NA
> values which propagate in algorithms. The question is: will Joe Blow
> Former R user have to understand what is the mask and how to work with
> it? If the answer is yes we have a problem. If it can be completely
> hidden as an implementation detail, that's great. In R, NAs are just
> sort of inherent: they propagate, and you deal with them when you have to
> via the na.rm flag in functions or is.na.
>

I think the NumPy interface can be made to look nearly the same with either
design approach. I've updated the NEP to add and emphasize using masked
values with an np.NA singleton, with the validitymask as the implementation
mechanism, which remains accessible for those who want to deal with the
mask directly.
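
A minimal sketch of that idea, not the NEP's actual design: the user works
with an NA singleton and a skip flag, while a validity mask does the
bookkeeping underneath. The names NA, NAArray and skipna are hypothetical
stand-ins made up for illustration (skipna plays the role of R's na.rm),
and the mask convention (True means valid) is an assumption.

```python
import numpy as np


class _NAType:
    """Hypothetical stand-in for the np.NA singleton discussed in the thread."""
    _instance = None

    def __new__(cls):
        # Always hand back the same object so `x is NA` works.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "NA"


NA = _NAType()


class NAArray:
    """Hypothetical wrapper: float64 data plus a validity mask (True = valid)."""

    def __init__(self, values):
        values = list(values)
        # The mask is an implementation detail, but stays accessible.
        self.validitymask = np.array([v is not NA for v in values], dtype=bool)
        # Masked slots hold a placeholder; the mask says to ignore it.
        self.data = np.array([0.0 if v is NA else v for v in values],
                             dtype=np.float64)

    def __getitem__(self, i):
        return self.data[i] if self.validitymask[i] else NA

    def sum(self, skipna=False):
        if skipna:
            # Analogous to na.rm=TRUE in R: drop masked elements.
            return self.data[self.validitymask].sum()
        # Default: NA propagates through the reduction.
        return self.data.sum() if self.validitymask.all() else NA


a = NAArray([1.0, NA, 3.0])
print(a[1], a.sum(), a.sum(skipna=True))
```

Someone coming from R would only touch NA and skipna here; the
validitymask attribute is there for those who want it.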


> The other problem I can think of with masks is the extra memory
> footprint, though maybe this is no cause for concern.
>

The overhead is definitely worth considering, along with the extra memory
traffic it generates, and I've basically concluded that the increased
generality and flexibility are worth the added cost.
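
For a rough sense of scale, here is a back-of-the-envelope sketch that
assumes the mask costs one byte per element (an assumption for
illustration; a bit-packed mask would be smaller). The helper name
mask_overhead is made up for this example.

```python
import numpy as np


def mask_overhead(dtype, n=1_000_000):
    """Ratio of mask bytes to data bytes, assuming a one-byte-per-element mask."""
    data = np.zeros(n, dtype=dtype)
    mask = np.ones(n, dtype=np.bool_)  # assumption: one byte per element
    return mask.nbytes / data.nbytes


for dt in (np.float64, np.float32, np.int8):
    print(np.dtype(dt).name, mask_overhead(dt))
```

Under that assumption the relative cost depends on the item size: small
for 8-byte elements, but as large as the data itself for 1-byte elements.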

-Mark


>
> -W