[Numpy-discussion] Missing data again
Charles R Harris
Wed Mar 7 13:37:15 CST 2012
On Wed, Mar 7, 2012 at 12:26 PM, Nathaniel Smith <firstname.lastname@example.org> wrote:
> On Wed, Mar 7, 2012 at 5:17 PM, Charles R Harris
> <email@example.com> wrote:
> > On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessig <firstname.lastname@example.org> wrote:
> >> Coming back to Travis's proposition "bit-pattern approaches to missing
> >> data (*at least* for float64 and int32) need to be implemented.", I
> >> wonder how much extra work it would take to go from nafloat64 to
> >> nafloat32/16? Is there hardware support for NaN payloads with these
> >> smaller floats? If not, or if it is too complicated, I feel it is
> >> acceptable to say "it's too complicated" and fall back to masks. One may
> >> have to choose between fancy types and fancy NAs...
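To make the bit-pattern idea concrete for the smaller float types asked about above, here is a minimal sketch. It relies only on the fact that the IEEE 754 binary16, binary32 and binary64 formats all leave mantissa bits free inside a NaN. The names FORMATS, NA_PAYLOAD, set_na and is_na are hypothetical and are not part of NumPy; the payload value is arbitrary. Whether arithmetic preserves a payload depends on the platform, and this sketch does not test that.

```python
import numpy as np

# Per format: (unsigned view type, quiet-NaN bit pattern, payload bit mask).
# The payload mask covers the mantissa bits below the quiet bit.
FORMATS = {
    np.float64: (np.uint64, 0x7FF8000000000000, 0x0007FFFFFFFFFFFF),
    np.float32: (np.uint32, 0x7FC00000, 0x003FFFFF),
    np.float16: (np.uint16, 0x7E00, 0x01FF),
}

# Hypothetical marker; chosen small enough to fit float16's 9 payload bits.
NA_PAYLOAD = 0x1A5


def set_na(arr, index):
    """Write the NA bit pattern into arr[index] through an integer view."""
    utype, qnan, _ = FORMATS[arr.dtype.type]
    arr.view(utype)[index] = qnan | NA_PAYLOAD


def is_na(arr):
    """True where an element is a NaN carrying NA_PAYLOAD (sign bit ignored)."""
    utype, _, payload_mask = FORMATS[arr.dtype.type]
    bits = arr.view(utype)
    return np.isnan(arr) & ((bits & utype(payload_mask)) == utype(NA_PAYLOAD))


for ftype in (np.float64, np.float32, np.float16):
    a = np.array([1.0, np.nan, 0.0], dtype=ftype)
    set_na(a, 2)
    # An ordinary NaN and the NA marker are both NaN, but only one is NA.
    print(ftype.__name__, np.isnan(a), is_na(a))
```

Storing and recognising the marker works the same way at every width; the open question in the thread is what happens to the payload once values pass through arithmetic and casts.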
> > I'm in agreement here, and that was a major consideration in making a
> > 'masked' implementation first.
> When it comes to "missing data", bitpatterns can do everything that
> masks can do, are no more complicated to implement, and have better
> performance characteristics.
Maybe for floats; for other things, no. And we have lots of other things. The
performance argument is a strawman, and it *isn't* easier to implement.
> > Also, different folks adopt different values
> > for 'missing' data, and distributing one or several masks along with the
> > data is another common practice.
> True, but not really relevant to the current debate, because you have
> to handle such issues as part of your general data import workflow
> anyway, and none of these is any more complicated no matter which
> implementations are available.
> > One inconvenience I have run into with the current API is that it should be
> > easier to clear the mask from an "ignored" value without taking a new
> > or assigning known data. So maybe two types of masks (different
> > or an additional flag could be helpful. The process of assigning masks
> > could also be made a bit easier than using fancy indexing.
> So this, uh... this was actually the whole goal of the "alterNEP"
> design for masks -- making all this stuff easy for people (like you,
> apparently?) that want support for ignored values, separately from
> missing data, and want a nice clean API for it. Basically having a
> separate .mask attribute which was an ordinary, assignable array
> broadcastable to the attached array's shape. Nobody seemed interested
> in talking about it much then but maybe there's interest now?
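The idea in the paragraph above, a separate .mask attribute that is an ordinary, assignable array broadcastable to the data's shape, can be sketched roughly as follows. This is only an illustration under assumptions: the class MaskedSketch, its sum method, and the convention that True means "ignored" are all hypothetical, and none of it is the alterNEP's or NumPy's actual API.

```python
import numpy as np


class MaskedSketch:
    """Hypothetical container: data plus a plain boolean mask (True = ignored)."""

    def __init__(self, data, mask=False):
        self.data = np.asarray(data, dtype=float)
        self.mask = mask  # routed through the property setter below

    @property
    def mask(self):
        return self._mask

    @mask.setter
    def mask(self, value):
        # Accept anything broadcastable to data.shape and keep a writable copy,
        # so individual entries can be flipped later with plain indexing.
        value = np.asarray(value, dtype=bool)
        self._mask = np.broadcast_to(value, self.data.shape).copy()

    def sum(self):
        # Ignored elements are skipped; the underlying data is never modified.
        return self.data[~self._mask].sum()


m = MaskedSketch([[1, 2, 3], [4, 5, 6]])
m.mask = [False, True, False]   # one row, broadcast over both rows
total_ignoring = m.sum()        # 1 + 3 + 4 + 6
m.mask[0, 1] = False            # un-ignore one element; the data is untouched
total_after = m.sum()           # 1 + 2 + 3 + 4 + 6
```

Because the mask is just an array, clearing an "ignored" flag needs neither a new view nor an assignment of known data, which is the inconvenience raised earlier in the thread.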
Come off it, Nathaniel, the problem is minor and fixable. The intent of the
initial implementation was to discover such things. These things are less
accessible with the current API *precisely* because of the feedback from R
users. It didn't start that way.
We now have something to evolve into what we want. That is a heck of a lot
more useful than endless discussion.