[Numpy-discussion] Re: ndarray.fill and ma.array.filled
pierregm at engr.uga.edu
Mon Apr 10 16:24:01 CDT 2006
> > So ? The result is not `masked`, the missing value has been omitted.
> I am just making your point with a shorter example.
OK, now I get it :)
> >Er, why would I want to get MA.masked along one axis if one value is
> > masked ?
> Any number of reasons I would think.
I understand that, and I eventually agree it should be the default.
> Because if you don't know one of the addends you don't know the sum.
Unless you want to discard some data on purpose.
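
To make the two positions concrete, here is a minimal sketch using the present-day numpy.ma module (an assumption: the thread is about the older MA package, whose output format differs). The names `lenient` and `strict` are illustrative only, and `strict` is just one possible way to express the "an unknown addend means an unknown sum" rule.

```python
import numpy.ma as ma

x = ma.array([[1, 2, 3],
              [4, 5, 6]],
             mask=[[0, 1, 1],
                   [0, 0, 1]])

# Behaviour defended in this message: masked entries are skipped, and a
# column is masked in the result only when every entry in it is masked.
lenient = x.sum(axis=0)

# Hypothetical strict variant: a single masked addend masks the whole sum.
strict = ma.array(x.filled(0).sum(axis=0),
                  mask=ma.getmaskarray(x).any(axis=0))

print(lenient)  # second column is summed from its one unmasked entry
print(strict)   # second column is masked because one addend is unknown
```

Both results agree on the first column (no masked entries) and on the last one (all entries masked); they differ only where a column is partially masked.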
> Replacing missing values with zeros is not always the right strategy.
> If you know that your data has non-zero mean, for example, you might
> want to replace missing values with the mean instead of zero.
Hence the need to get rid of filled_values
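
As an illustration of the point about fill strategies, a small sketch with the present-day numpy.ma module (assumed here; the data values are made up). It fills the missing entry once with zero and once with the mean of the unmasked values.

```python
import numpy.ma as ma

x = ma.array([2.0, 4.0, 9.0], mask=[0, 1, 0])

# Zero fill: the missing entry pulls the data towards zero.
zero_filled = x.filled(0)

# Mean fill: x.mean() skips the masked entry, so the fill is (2 + 9) / 2.
mean_filled = x.filled(x.mean())

print(zero_filled)
print(mean_filled)
```

Which fill is appropriate depends on the data, which is why the choice is left to the caller in this sketch rather than hidden inside the reduction.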
> Actually I'm going to ask you the same question. Why would you care if
> all of the values are masked?
> > MA.array([[1,1],[1,1]],mask=[[0,1],[1,1]]).sum()
> > array(data = [1 999999], mask = [False True], fill_value=999999)
> I did not realize that, but it is really bad. What is the
> justification for this?
Masked values are not necessarily nans or missing. I quite regularly mask
values that do not satisfy a given condition. For various reasons, I can't
compress the array, I need to preserve its shape.
With the current behavior, a.sum() gives me the sum of the values that satisfy
the condition. If there's no such value, the result is masked, and that way I
know that the condition was never met. Here, I could use Sasha's method
combined with a._mask.all, no problem.
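
The use case above could look roughly like this with the present-day numpy.ma module (an assumption; the condition and the data are invented for illustration). The public `ma.getmaskarray` helper is used in place of the private `_mask` attribute.

```python
import numpy as np
import numpy.ma as ma

data = np.array([[1, 8],
                 [2, 9]])

# Mask every value that does NOT satisfy the condition "value > 5".
# The array keeps its shape, unlike a compressed copy.
a = ma.masked_where(~(data > 5), data)

# Sum of the values that satisfy the condition, column by column.
col_sums = a.sum(axis=0)

# Columns where the condition was never met are fully masked.
never_met = ma.getmaskarray(a).all(axis=0)

# For a fully masked slice the scalar result is the masked constant.
first_col_total = a[:, 0].sum()

print(col_sums)
print(never_met)
```

Here the masked entry in `col_sums` and the `True` entry in `never_met` carry the same information: no value in that column passed the test.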
Another example: let x be a 2D array with missing values, to be normalized
along one axis. Currently, x/x.sum() gives the result I want (provided it's
true division). Sasha's method would give me a completely masked array.
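
A short sketch of the normalization case, again assuming the present-day numpy.ma module and made-up data. The sum is taken along axis 0 so that each column is normalized on its own; in Python 3 the `/` operator is already true division.

```python
import numpy.ma as ma

x = ma.array([[1.0, 2.0],
              [3.0, 4.0]],
             mask=[[0, 1],
                   [0, 0]])

# Column sums skip the masked entry: 1 + 3 and 4 alone.
col_sums = x.sum(axis=0)

# Each unmasked value is divided by the sum of the unmasked values in
# its column; the masked entry stays masked in the result.
normalized = x / col_sums

print(normalized)
```

Only the originally missing entry is masked in `normalized`; the other three values are usable fractions of their column totals.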
> > Good points... We'll just have to put strong warnings everywhere.
> Do you agree with my proposal as long as we have explicit warnings in
> the documentation that methods behave differently from legacy
Your points are quite valid. I'm just worried it's gonna break a lot of things
in the near future. And where do we stop ? So, if we follow Sasha's way:
x.prod() should be the same, right ? What about a.min(), a.max() ? a.mean() ?
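
For reference, this is how the reductions named above behave in the present-day numpy.ma module (an assumption about the reader's environment, not a statement about the MA package under discussion): each one skips masked entries.

```python
import numpy.ma as ma

a = ma.array([1.0, 2.0, 3.0], mask=[0, 1, 0])

# Every reduction below ignores the masked middle value.
results = {
    "sum": a.sum(),    # 1 + 3
    "prod": a.prod(),  # 1 * 3
    "min": a.min(),
    "max": a.max(),
    "mean": a.mean(),  # (1 + 3) / 2
}

for name, value in results.items():
    print(name, value)
```

Changing the rule for `sum` alone would leave it inconsistent with the others, which is the concern behind the questions above.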