[Numpy-discussion] Missing data again
Wed Mar 7 11:14:15 CST 2012
On Wed, Mar 7, 2012 at 4:35 PM, Pierre Haessig <email@example.com> wrote:
> Thank you very much for your insights!
> On 06/03/2012 21:59, Nathaniel Smith wrote:
>> Right -- R has a very impoverished type system as compared to numpy.
>> There's basically four types: "numeric" (meaning double precision
>> float), "integer", "logical" (boolean), and "character" (string). And
>> in practice the integer type is essentially unused, because R parses
>> numbers like "1" as being floating point, not integer; the only way to
>> get an integer value is to explicitly cast to it. Each of these types
>> has a specific bit-pattern set aside for representing NA. And...
>> that's it. It's very simple when it works, but also very limited.
> I also suspected R was less powerful in terms of types.
> However, I think the fact that "It's very simple when it works" is
> important to take into account. At the end of the day, when using all
> the fancy features, the question is not only "can I have some NAs in my
> array?" but also "how *easily* can I have some NAs in my array?". It's
> about balancing the "how easy" and the "how powerful".
> Ease of use is the reason for my concern about having separate
> types "nafloatNN" and "floatNN". Of course, I won't argue that "not
> breaking everything" is even more important!!
It's a good point, I just don't see how we can really tell what the
trade-offs are at this point. You should bring this up again once more
of the big picture stuff is hammered out.
> Coming back to Travis proposition "bit-pattern approaches to missing
> data (*at least* for float64 and int32) need to be implemented.", I
> wonder how much extra work it is to go from nafloat64 to
> nafloat32/16. Is there hardware support for NaN payloads with these
> smaller floats? If not, or if it is too complicated, I feel it is
> acceptable to say "it's too complicated" and fall back to masks. One may
> have to choose between fancy types and fancy NAs...
All modern floating point formats can represent NaNs with payloads, so
in principle there's no difficulty in supporting NA the same way for
all of them. If you're using float16 because you want to offload
computation to a GPU then I would test carefully before trusting the
GPU to handle NaNs correctly, and there may need to be a bit of care
to make sure that casts between these types properly map NAs to NAs,
but generally it should be fine.
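To make the payload idea concrete, here is a minimal Python sketch showing how a quiet NaN can carry an arbitrary payload in its low mantissa bits and have it read back later; the specific payload value 0x1972 is just an illustrative marker, not any actual NA encoding used by NumPy or R:

```python
import math
import struct

def nan_with_payload(payload):
    # IEEE 754 binary64 quiet NaN: sign 0, exponent all ones,
    # quiet bit (bit 51) set, payload in the remaining 51 mantissa bits.
    bits = 0x7FF8000000000000 | (payload & 0x0007FFFFFFFFFFFF)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def payload_of(x):
    # Recover the low 51 mantissa bits of a float64.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & 0x0007FFFFFFFFFFFF

# Hypothetical payload chosen to mark "NA"; any 51-bit value would do.
na = nan_with_payload(0x1972)
assert math.isnan(na)            # it is still an ordinary NaN to the FPU
assert payload_of(na) == 0x1972  # but the marker survives in the bits
```

The float32 and float16 formats work the same way with 22-bit and 9-bit payloads respectively; the care needed at cast time is exactly because a float64 payload must be re-encoded, not bit-truncated, to survive a conversion to a narrower type.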