[Numpy-discussion] Re: ndarray.fill and ma.array.filled

Sasha ndarray at mac.com
Mon Apr 10 11:37:00 CDT 2006


On 4/10/06, Pierre GM <pgmdevlist at mailcan.com> wrote:
> > If you sum along a particular dimension and encounter a masked value,
> > the result is masked.
>
> That's not how it currently works (still on 0.9.6):
>
> [... longish example snipped ...]

>>> ma.array([1,1], mask=[0,1]).sum()
1

> and frankly, I'd be quite frustrated if it had to change:
> - `filled` is not a ndarray method, which means that a.filled(0).sum() fails
> if a is not MA. Right now, I can use a.sum() without having to check the
> nature of a first.

This is exactly the point of the current discussion: make fill a
method of ndarray.
With the current behavior, how would you get a masked (unfilled) result from a.sum()?
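
To make the question concrete, here is a minimal sketch of a sum that propagates the mask instead of skipping masked values. It assumes a present-day numpy.ma; the helper name propagating_sum is hypothetical and is not part of numpy.

```python
import numpy as np

def propagating_sum(a):
    """Hypothetical helper: return np.ma.masked if any element is masked."""
    # getmaskarray gives an all-False mask for plain ndarrays, so the same
    # code path serves both ndarray and masked-array inputs.
    if np.ma.getmaskarray(a).any():
        return np.ma.masked
    # Nothing is masked, so summing the underlying data is safe.
    return np.ma.getdata(a).sum()

partly_masked = np.ma.array([1, 1], mask=[0, 1])
plain = np.array([1, 1])

print(propagating_sum(partly_masked))
print(propagating_sum(plain))
```

With the current a.sum() there is no single call that gives this result, which is why a helper of this kind would be needed.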

> - this behavior was already in Numeric

That's true, but it makes the result of sum(a) different from
__builtins__.sum(a).  I believe consistency with the Python
conventions is more important than with legacy Numeric in the long
run.
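
The difference can be checked with a short sketch. It assumes a present-day numpy.ma, where iterating over a masked array yields the masked constant for masked elements; behavior on 0.9.6 may differ in detail.

```python
import builtins
import numpy as np

a = np.ma.array([1, 1], mask=[0, 1])

# The array method skips the masked element.
method_total = a.sum()

# The built-in adds element by element, so a masked element
# takes part in the addition.
builtin_total = builtins.sum(a)

print(method_total)
print(np.ma.is_masked(builtin_total))
```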

> [...]

> - The current way reflects how masks are used in GIS or image processing.
>
Can you elaborate on this? Note that in R na.rm is false by default in sum:

> sum(c(1,NA))
[1] NA

So it looks like the convention is different in the field of statistics.

> > If you would like to ignore masked values, you write
> > a.filled(0).sum() instead of a.sum(). In 1d case, you can also use
> > a.compress().sum().
>
> Once again, Sasha, I'd agree with you if it wasn't a major difference

Array methods are a very recent addition to ma.  We can still use this
window of opportunity to get things right before too many people get
used to the wrong behavior.  (Note that I changed your implementation
of cumsum and cumprod.)

>
> > In other words, what in R you achieve with a
> > flag, such as in sum(a, na.rm=TRUE), in numpy you achieve by an
> > explicit call to "fill".  This is not quite the same as na.actions in
> > R, but that is what I had in mind.
>
> I kinda like the idea of a flag, though

With the flag approach, making the ndarray and ma.array interfaces
consistent would require adding an extra argument to many methods.
Instead, I propose to add one method: fill to ndarray.
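
A rough sketch of what a single fill step buys, written as a function because the proposed method does not exist. It assumes a present-day numpy and relies on the module-level np.ma.filled, which accepts plain ndarrays as well as masked arrays; the name fill_masked is hypothetical.

```python
import numpy as np

def fill_masked(a, value):
    """Hypothetical stand-in for the proposed method, written as a function."""
    # np.ma.filled replaces masked entries with `value` and returns a plain
    # ndarray; an input with nothing masked comes back with its data unchanged.
    return np.ma.filled(a, value)

plain = np.array([1, 2, 3])
partly_masked = np.ma.array([1, 2, 3], mask=[0, 1, 0])

# The same expression works without checking the type of the input first.
plain_total = fill_masked(plain, 0).sum()
masked_total = fill_masked(partly_masked, 0).sum()

# Other reductions only need a different fill value, not a new flag.
masked_product = fill_masked(partly_masked, 1).prod()

print(plain_total, masked_total, masked_product)
```

The fill value is chosen per reduction (0 for sum, 1 for prod), so no method needs an extra argument.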



