[Numpy-discussion] Missing data again
Wed Mar 7 10:35:44 CST 2012
Thank you very much for the insights!
On 06/03/2012 21:59, Nathaniel Smith wrote:
> Right -- R has a very impoverished type system as compared to numpy.
> There's basically four types: "numeric" (meaning double precision
> float), "integer", "logical" (boolean), and "character" (string). And
> in practice the integer type is essentially unused, because R parses
> numbers like "1" as being floating point, not integer; the only way to
> get an integer value is to explicitly cast to it. Each of these types
> has a specific bit-pattern set aside for representing NA. And...
> that's it. It's very simple when it works, but also very limited.
I also suspected that R was less powerful in terms of types.
However, I think the fact that "It's very simple when it works" is
important to take into account. At the end of the day, using all this
fanciness is not only about "can I have some NAs in my array?" but
also "how *easily* can I have some NAs in my array?". It's about
balancing "how easy" against "how powerful".
Ease of use is the reason for my concern about having separate
types "nafloatNN" and "floatNN". Of course, I won't dispute that "not
breaking everything" is even more important!
Coming back to Travis's proposal that "bit-pattern approaches to missing
data (*at least* for float64 and int32) need to be implemented.", I
wonder how much extra work it would take to go from nafloat64 to
nafloat32/16? Is there hardware support for NaN payloads with these
smaller floats? If not, or if it is too complicated, I feel it is
acceptable to say "it's too complicated" and fall back to masks. One may
have to choose between fancy types and fancy NAs...
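
To make the nafloat64 vs. nafloat32/16 question a bit more concrete, here is
a rough sketch of how a bit-pattern NA could be stored as a NaN payload at
each width. It is only an illustration, not the proposed NumPy design: the
names LAYOUT, na_pattern, set_na and is_na are hypothetical, and so is the
payload value 1954 in the usage note. The only facts it relies on are the
IEEE 754 fraction widths (52, 23 and 10 bits), which leave 51, 22 and 9
payload bits below the quiet-NaN bit.

```python
import numpy as np

# Hypothetical table: float type -> (same-width unsigned integer type,
# bit pattern of a positive quiet NaN, number of payload bits below the quiet bit).
LAYOUT = {
    np.float64: (np.uint64, 0x7FF8000000000000, 51),
    np.float32: (np.uint32, 0x7FC00000, 22),
    np.float16: (np.uint16, 0x7E00, 9),
}

def na_pattern(ftype, payload):
    """Return the integer bit pattern of a quiet NaN carrying `payload`."""
    utype, quiet_nan, payload_bits = LAYOUT[ftype]
    if not 0 < payload < (1 << payload_bits):
        raise ValueError(f"payload {payload} does not fit in {payload_bits} bits")
    return utype(quiet_nan | payload)

def set_na(arr, index, payload):
    """Write the NA pattern through an integer view, so the bits are stored as-is."""
    utype = LAYOUT[arr.dtype.type][0]
    arr.view(utype)[index] = na_pattern(arr.dtype.type, payload)

def is_na(arr, payload):
    """Flag elements whose bits match the NA pattern exactly (positive sign only)."""
    utype = LAYOUT[arr.dtype.type][0]
    return arr.view(utype) == na_pattern(arr.dtype.type, payload)
```

For example, after a = np.array([1.0, np.nan, 3.0]) and set_na(a, 2, 1954),
np.isnan(a) flags both index 1 and index 2, while is_na(a, 1954) flags only
index 2. The same payload is rejected for float16, because 1954 needs 11 bits
and only 9 are available. Whether the payload survives arithmetic or casts
between widths is a separate question that this sketch does not address.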