[Numpy-discussion] Concepts for masked/missing data
Nathaniel Smith
njs@pobox....
Sat Jun 25 13:57:59 CDT 2011
On Sat, Jun 25, 2011 at 11:50 AM, Eric Firing <efiring@hawaii.edu> wrote:
> On 06/25/2011 07:05 AM, Nathaniel Smith wrote:
>> On Sat, Jun 25, 2011 at 9:26 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
>>> To clarify, you're proposing for:
>>>
>>> a = np.sum(np.array([np.NA, np.NA]))
>>>
>>> 1) -> np.NA
>>> 2) -> 0.0
>>
>> Yes -- and in R you actually do get NA, while in numpy.ma you
>> actually do get 0. I don't think this is a coincidence; I think it's
>
> No, you don't:
>
> In [2]: np.ma.array([2, 4], mask=[True, True]).sum()
> Out[2]: masked
>
> In [4]: np.sum(np.ma.array([2, 4], mask=[True, True]))
> Out[4]: masked
Huh. So in numpy.ma, sum([10, NA]) and sum([10]) are the same, but
sum([NA]) and sum([]) are different? Sounds to me like you should file
a bug on numpy.ma...
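
A small check of that asymmetry, assuming a current NumPy release. `np.NA` was only a proposal at the time, so a masked entry stands in for NA here, and the values hidden under the mask (99, 2, 4) are arbitrary placeholders:

```python
import numpy as np

# sum([10, NA]): the masked entry is skipped, leaving 10
one_masked = np.ma.array([10, 99], mask=[False, True]).sum()

# sum([10]): an ordinary one-element array, also 10
plain = np.ma.array([10]).sum()

# sum([NA, NA]): every entry is masked, so the result is the masked constant
all_masked = np.ma.array([2, 4], mask=[True, True]).sum()

# sum([]): an empty array with nothing masked sums to 0.0
empty = np.ma.array([], dtype=float).sum()

print(one_masked, plain, all_masked, empty)
```

If masked values were consistently "skipped", the third and fourth results would agree; instead the all-masked case returns `np.ma.masked`.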

Anyway, the general point is that in R, NAs propagate, and in
numpy.ma, masked values are ignored (except, apparently, if all values
are masked). Here, I actually checked these:
Python: np.ma.array([2, 4], mask=[True, False]).sum() -> 4
R: sum(c(NA, 4)) -> NA
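
The same two behaviours can be reproduced in plain NumPy using floating-point NaN as a rough stand-in for NA (an assumption: NaN is not a real missing-data marker, it simply happens to propagate). `np.sum` propagates like R's `sum`, while `np.nansum` skips like numpy.ma, but, unlike numpy.ma, it returns 0.0 when every value is missing (NumPy 1.9 and later), i.e. option 2) above:

```python
import numpy as np

values = np.array([np.nan, 4.0])
all_missing = np.array([np.nan, np.nan])

# Propagating semantics (R-style): one missing value poisons the result
propagated = np.sum(values)

# Skipping semantics (numpy.ma-style): missing values are dropped
skipped = np.nansum(values)

# Skipping with nothing left over: the sum of an empty set, 0.0
skipped_all = np.nansum(all_missing)

print(propagated, skipped, skipped_all)
```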
-- Nathaniel