[Numpy-discussion] Missing data again
Wed Mar 7 13:57:35 CST 2012
On 03/07/2012 09:26 AM, Nathaniel Smith wrote:
> On Wed, Mar 7, 2012 at 5:17 PM, Charles R Harris
> <email@example.com> wrote:
>> On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessig <firstname.lastname@example.org> wrote:
>>> Coming back to Travis proposition "bit-pattern approaches to missing
>>> data (*at least* for float64 and int32) need to be implemented.", I
>>> wonder what the amount of extra work is to go from nafloat64 to
>>> nafloat32/16? Is there hardware support for NaN payloads with these
>>> smaller floats? If not, or if it is too complicated, I feel it is
>>> acceptable to say "it's too complicated" and fall back to mask. One may
>>> have to choose between fancy types and fancy NAs...
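For a sense of what the payload question involves: IEEE 754 gives every binary format (binary16, binary32, binary64) an all-ones exponent for NaN plus a significand field, so a bit-pattern NA is at least representable in the smaller types. The sketch below is a minimal illustration, assuming a quiet NaN with a nonzero payload as the NA marker; the helper names (`na_bits`, `payload_of`, `as_float`) and the payload value are hypothetical, not part of NumPy. It only inspects stored bits; whether arithmetic or type conversion preserves a given payload is hardware-dependent and is not tested here.

```python
import math
import struct

# IEEE 754 binary formats: struct code, width in bits, exponent bits, stored mantissa bits
FORMATS = {
    "float16": ("<e", 16, 5, 10),
    "float32": ("<f", 32, 8, 23),
    "float64": ("<d", 64, 11, 52),
}


def na_bits(fmt, payload):
    """Return a quiet-NaN bit pattern carrying `payload` (hypothetical NA encoding)."""
    _, _, exp_bits, man_bits = FORMATS[fmt]
    payload_width = man_bits - 1  # top mantissa bit is reserved as the quiet bit
    if not 0 < payload < (1 << payload_width):
        raise ValueError(f"payload must be nonzero and fit in {payload_width} bits")
    exponent = ((1 << exp_bits) - 1) << man_bits  # all-ones exponent marks NaN/inf
    quiet = 1 << (man_bits - 1)
    return exponent | quiet | payload


def payload_of(fmt, bits):
    """Return the payload of a NaN bit pattern, or None if `bits` is not a NaN."""
    _, _, exp_bits, man_bits = FORMATS[fmt]
    exp_mask = ((1 << exp_bits) - 1) << man_bits
    man_mask = (1 << man_bits) - 1
    if (bits & exp_mask) != exp_mask or (bits & man_mask) == 0:
        return None  # finite number or infinity
    return bits & (man_mask >> 1)  # drop the quiet bit, keep the payload bits


def as_float(fmt, bits):
    """Reinterpret raw bits as a Python float via struct."""
    code, width, _, _ = FORMATS[fmt]
    return struct.unpack(code, bits.to_bytes(width // 8, "little"))[0]


for fmt, (_, _, _, man_bits) in FORMATS.items():
    bits = na_bits(fmt, 0x1A2)  # arbitrary example payload
    print(fmt, "payload bits:", man_bits - 1, hex(bits), math.isnan(as_float(fmt, bits)))
```

Binary16 leaves 9 payload bits, binary32 leaves 22 and binary64 leaves 51, so the room for NA codes shrinks but does not vanish; whether hardware keeps those bits intact through operations is the harder part of the question and is not answered by this check.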
>> I'm in agreement here, and that was a major consideration in making a
>> 'masked' implementation first.
> When it comes to "missing data", bitpatterns can do everything that
> masks can do, are no more complicated to implement, and have better
> performance characteristics.
>> Also, different folks adopt different values
>> for 'missing' data, and distributing one or several masks along with the
>> data is another common practice.
> True, but not really relevant to the current debate, because you have
> to handle such issues as part of your general data import workflow
> anyway, and none of these is any more complicated no matter which
> implementations are available.
>> One inconvenience I have run into with the current API is that it should be
>> easier to clear the mask from an "ignored" value without taking a new view
>> or assigning known data. So maybe two types of masks (different payloads),
>> or an additional flag could be helpful. The process of assigning masks could
>> also be made a bit easier than using fancy indexing.
> So this, uh... this was actually the whole goal of the "alterNEP"
> design for masks -- making all this stuff easy for people (like you,
> apparently?) that want support for ignored values, separately from
> missing data, and want a nice clean API for it. Basically having a
> separate .mask attribute which was an ordinary, assignable array
> broadcastable to the attached array's shape. Nobody seemed interested
> in talking about it much then, but maybe there's interest now?
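To make the "ignored" versus "missing" distinction concrete: with a separate, assignable mask, ignoring a value and un-ignoring it never touches the data, whereas writing a missing-value bit pattern replaces the stored value. The class below is a hypothetical, one-dimensional pure-Python sketch of the separate-mask idea (no broadcasting, no NumPy); `MaskedSeq` and `visible` are illustrative names only, and the convention that True means "ignored" is borrowed from numpy.ma.

```python
import math


class MaskedSeq:
    """Hypothetical 1-D container: data plus a separate, assignable boolean mask.

    mask[i] is True when element i is ignored (the numpy.ma convention).
    """

    def __init__(self, data):
        self.data = list(data)
        self.mask = [False] * len(self.data)

    def visible(self):
        """Return the values that are not ignored."""
        return [x for x, m in zip(self.data, self.mask) if not m]


seq = MaskedSeq([1.0, 2.0, 3.0])
seq.mask[1] = True    # ignore the second value; the data is untouched
print(seq.visible())  # [1.0, 3.0]
seq.mask[1] = False   # un-ignore it: the original 2.0 comes back
print(seq.visible())  # [1.0, 2.0, 3.0]

# A bit-pattern NA, by contrast, is written into the data itself,
# so the value it replaces is gone (NaN stands in for an NA pattern here).
values = [1.0, 2.0, 3.0]
values[1] = math.nan
print(values[1] == 2.0)  # False: nothing left to restore
```

Keeping the mask as a plain, assignable sequence means ignoring and restoring values is ordinary assignment, with no new view and no need to re-supply the known data.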
In other words, good low-level support for numpy.ma functionality? With
a migration path so that a separate numpy.ma might wither away? Yes,
there is interest; this is exactly what I think is needed for my own
style of applications (which I think are common at least in geoscience),
and for matplotlib. The question is how to achieve it as simply and
cleanly as possible while also satisfying the needs of the R users, and
while making it easy for matplotlib, for example, to handle *any*
reasonable input: ma, other masking, nan, or NA-bitpattern.
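One way to read "handle *any* reasonable input" is that a consumer such as a plotting routine first reduces every representation to a single (values, mask) pair. The function below is a minimal pure-Python sketch under that assumption; `to_values_and_mask`, its `mask` and `sentinel` parameters, and the choice to treat NaN and `None` as missing are hypothetical, not matplotlib's or NumPy's behavior. Because a floating-point NA bit pattern is itself a NaN, the NaN test also catches it, at the cost of not distinguishing NA from an ordinary NaN.

```python
import math


def to_values_and_mask(values, mask=None, sentinel=None):
    """Merge an explicit mask, a sentinel value, None, and NaN/NA into one mask.

    Returns (values, mask) where mask[i] is True if element i should be skipped.
    """
    values = list(values)
    combined = [bool(m) for m in mask] if mask is not None else [False] * len(values)
    if len(combined) != len(values):
        raise ValueError("mask length must match values")
    for i, v in enumerate(values):
        if v is None:                                  # object-style missing value
            combined[i] = True
        elif isinstance(v, float) and math.isnan(v):   # NaN, including NaN-payload NAs
            combined[i] = True
        elif sentinel is not None and v == sentinel:   # e.g. a -9999 "missing" convention
            combined[i] = True
    return values, combined


vals, m = to_values_and_mask(
    [1.0, float("nan"), -9999.0, 4.0],
    mask=[True, False, False, False],
    sentinel=-9999.0,
)
print(m)  # [True, True, True, False]
```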
It may be that a rather pragmatic approach to implementation will prove
better than a highly idealized set of data models. Or, it may be that a
dual approach is best, in which the flag-value missing-data
implementation is tightly bound to the R model and the mask
implementation is explicitly designed for the numpy.ma model. In any
case, a reasonable level of agreement on the goals is needed. I presume
Travis's involvement will facilitate a clarification of the goals and of
the implementation; and I expect that much of Mark's work will end up
serving well, even if much needs to be added and the API evolves.
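The practical difference between the two models shows up in reductions. In the R model a missing value propagates unless the caller asks to drop it (as with R's `sum(x, na.rm = TRUE)`); in the numpy.ma model, masked values are simply skipped. The two functions below are hypothetical pure-Python sketches of those semantics, with `None` standing in for NA; `na_sum` and `masked_sum` are illustrative names, not NumPy APIs.

```python
def na_sum(values, skipna=False):
    """R-style sum: any NA (None here) makes the result NA unless skipna=True."""
    total = 0
    for v in values:
        if v is None:
            if not skipna:
                return None  # missing data propagates
            continue
        total += v
    return total


def masked_sum(values, mask):
    """numpy.ma-style sum: masked (ignored) positions are skipped; data stays intact."""
    return sum(v for v, m in zip(values, mask) if not m)


print(na_sum([1, None, 3]))                          # None
print(na_sum([1, None, 3], skipna=True))             # 4
print(masked_sum([1, 2, 3], [False, True, False]))   # 4
```

Offering both behaviors side by side, rather than emulating one with the other, is one reading of what a dual implementation would provide.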