[Numpy-discussion] Missing data again

Charles R Harris charlesr.harris@gmail....
Wed Mar 7 13:37:15 CST 2012


On Wed, Mar 7, 2012 at 12:26 PM, Nathaniel Smith <njs@pobox.com> wrote:

> On Wed, Mar 7, 2012 at 5:17 PM, Charles R Harris
> <charlesr.harris@gmail.com> wrote:
> > On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessig <pierre.haessig@crans.org> wrote:
> >
> >> Coming back to Travis's proposition "bit-pattern approaches to missing
> >> data (*at least* for float64 and int32) need to be implemented.", I
> >> wonder how much extra work it takes to go from nafloat64 to
> >> nafloat32/16? Is there hardware support for NaN payloads with these
> >> smaller floats? If not, or if it is too complicated, I feel it is
> >> acceptable to say "it's too complicated" and fall back to masks. One may
> >> have to choose between fancy types and fancy NAs...
> >
> > I'm in agreement here, and that was a major consideration in making a
> > 'masked' implementation first.
>
> When it comes to "missing data", bitpatterns can do everything that
> masks can do, are no more complicated to implement, and have better
> performance characteristics.
>
>
Maybe for float; for other things, no. And we have lots of other things. The
performance argument is a strawman, and it *isn't* easier to implement.
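
To make the float-versus-other-types point concrete, here is a minimal
sketch, not NumPy's implementation. It assumes IEEE 754 float64, where a NaN
has spare mantissa bits that can carry a tag, and contrasts that with int32,
where every bit pattern is already a valid number. The names make_na,
na_payload, and INT32_NA are hypothetical, as is the payload value 1954.

```python
import math
import struct

QNAN_BITS = 0x7FF8000000000000     # exponent all ones + quiet bit: a quiet NaN
PAYLOAD_MASK = 0x0007FFFFFFFFFFFF  # the 51 mantissa bits below the quiet bit


def make_na(payload):
    """Return a float64 NaN whose low mantissa bits carry `payload` (hypothetical NA tag)."""
    bits = QNAN_BITS | (payload & PAYLOAD_MASK)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def na_payload(x):
    """Return the payload bits of a NaN, or None if x is an ordinary number."""
    if not math.isnan(x):
        return None
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & PAYLOAD_MASK


# Integers have no spare pattern: a bit-pattern NA must give up a real value.
# Using the most negative int32 as the sentinel is one convention (assumed here).
INT32_NA = -2**31

na = make_na(1954)
```

For floats the tag costs no representable value, because NaNs are not numbers
to begin with. For int32 the sentinel shrinks the usable range by one value,
and types such as bool or small unsigned integers have even less room, which
is where a separate mask becomes the fallback.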


> > Also, different folks adopt different values
> > for 'missing' data, and distributing one or several masks along with the
> > data is another common practice.
>
> True, but not really relevant to the current debate, because you have
> to handle such issues as part of your general data import workflow
> anyway, and none of these is any more complicated no matter which
> implementations are available.
>
> > One inconvenience I have run into with the current API is that it should
> > be easier to clear the mask from an "ignored" value without taking a new
> > view or assigning known data. So maybe two types of masks (different
> > payloads), or an additional flag could be helpful. The process of assigning
> > masks could also be made a bit easier than using fancy indexing.
>
> So this, uh... this was actually the whole goal of the "alterNEP"
> design for masks -- making all this stuff easy for people (like you,
> apparently?) that want support for ignored values, separately from
> missing data, and want a nice clean API for it. Basically having a
> separate .mask attribute which was an ordinary, assignable array
> broadcastable to the attached array's shape. Nobody seemed interested
> in talking about it much then but maybe there's interest now?
>
>
Come off it, Nathaniel, the problem is minor and fixable. The intent of the
initial implementation was to discover such things. These things are less
accessible with the current API *precisely* because of the feedback from R
users. It didn't start that way.

We now have something to evolve into what we want. That is a heck of a lot
more useful than endless discussion.

Chuck