[Numpy-discussion] Re: ndarray.fill and ma.array.filled

Tim Hochberg tim.hochberg at cox.net
Mon Apr 10 14:14:02 CDT 2006


Pierre GM wrote:

>>>[... longish example snipped ...]
>>>
>>>      
>>>
>>>>>ma.array([1,1], mask=[0,1]).sum()
>>>>>          
>>>>>
>>1
>>    
>>
>So ? The result is not `masked`, the missing value has been omitted.
>
>MA.array([[1,1],[1,1]],mask=[[0,1],[1,0]]).sum()
>array(data = [1 1],   mask = [False False], fill_value=999999)
>
>
>  
>
>>This is exactly the point of the current discussion: make fill a
>>method of ndarray.
>>    
>>
>Mrf. I'm still not convinced, but I have nothing against it. Along with a 
>mask=False_ by default ?
>
>  
>
>>With the current behavior, how would you achieve masking (no fill) a.sum()?
>>    
>>
>Er, why would I want to get MA.masked along one axis if one value is masked  ? 
>  
>
Any number of reasons, I would think. It depends on what you're using the 
data for. If the sum is the total amount that you spent in the month, 
and a masked value means you lost that check stub, then you don't know 
how much you actually spent and that value should be masked. To choose a 
boring example.
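
The check-stub case can be made concrete with a minimal sketch. It assumes
the present-day numpy.ma API rather than the 2006 code under discussion, and
strict_sum is a hypothetical helper written for illustration, not something
that exists in ma. It contrasts the sum that skips masked entries with one
that comes back masked whenever any input is masked:

```python
import numpy.ma as ma

# Monthly spending; the second check stub was lost, so it is masked.
spent = ma.array([120, 45, 30], mask=[False, True, False])

# Behaviour of sum() on a masked array: masked entries are skipped.
skipping_total = spent.sum()


def strict_sum(a, axis=None):
    """Hypothetical helper: a sum that is masked if any input is masked."""
    total = a.filled(0).sum(axis=axis)
    any_masked = ma.getmaskarray(a).any(axis=axis)
    return ma.array(total, mask=any_masked)


strict_total = strict_sum(spent)

print(skipping_total)              # total of the unmasked entries only
print(ma.is_masked(strict_total))  # True: the real total is unknown
```

Which of the two results is the right one depends on whether a partial total
is meaningful for the data at hand.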

>The current behavior is to mask only if all the values along that axis are 
>masked:
>
>MA.array([[1,1],[1,1]],mask=[[0,1],[1,1]]).sum()
>array(data = [1 999999],   mask = [False True], fill_value=999999)
>
>With a.filled(0).sum(), how would you distinguish between the cases (a) at 
>least one value is not masked and (b) all values are masked? (OK, by 
>querying the mask with something along the lines of a._mask.all(axis), but 
>it's longer... Oh well, I'll just have to adapt)
>  
>
Actually I'm going to ask you the same question. Why would you care if all 
of the values are masked? I may be missing something, but either there's 
a sensible default value, in which case it doesn't matter how many 
values are masked, or you can't handle any masked values and the result 
should be masked if there are any masks in the input. Sasha's proposal 
handles those two cases well. It handles your behaviour a little more 
clunkily, but I'd like to understand why you want that behaviour.
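
The two cases, plus the all-masked test from the quoted message, can be
sketched side by side. This assumes the present-day numpy.ma API, where
sum() needs an explicit axis=0 to reduce along the first axis, and it uses
ma.getmaskarray instead of the private _mask attribute:

```python
import numpy.ma as ma

a = ma.array([[1, 1], [1, 1]], mask=[[0, 1], [1, 1]])
mask = ma.getmaskarray(a)

# Case 1: a sensible default exists, so fill and sum.
filled_sum = a.filled(0).sum(axis=0)

# Case 2: no masked value can be tolerated, so mask every column
# that contains at least one masked entry.
strict = ma.array(filled_sum, mask=mask.any(axis=0))

# Behaviour described in the quoted message: the result is masked
# only where the whole column is masked.
all_masked = mask.all(axis=0)
default_sum = a.sum(axis=0)

print(filled_sum)                      # plain ndarray, no mask information
print(strict.mask)                     # masked wherever any input was masked
print(all_masked)                      # True only for fully masked columns
print(ma.getmaskarray(default_sum))    # same pattern as all_masked
```

After filled(0) the mask is gone, so the all-masked case has to be
recovered from the original array's mask, as the quoted message notes.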

Regards,

-tim

>  
>
>>>- this behavior was already in Numeric
>>>      
>>>
>>That's true, but it makes the result of sum(a) different from
>>__builtins__.sum(a).  I believe consistency with the python
>>conventions is more important than with legacy Numeric in the long
>>run.
>>
>>Array methods are a very recent addition to ma.  We can still use this
>>window of opportunity to get things right before too many people get
>>used to the wrong behavior.  (Note that I changed your implementation
>>of cumsum and cumprod.)
>>    
>>
>
>Good points... We'll just have to put strong warnings everywhere.
>
>  
>
>>>- The current way reflects how masks are used in GIS or image processing.
>>>      
>>>
>>Can you elaborate on this? Note that in R na.rm is false by default in sum:
>>    
>>
>>>sum(c(1,NA))
>>>      
>>>
>>[1] NA
>>
>>So it looks like the convention is different in the field of statistics.
>>    
>>
>
>MMh. *digs in his old GRASS scripts* 
>OK, my bad. I had to fill missing values somehow, or at least check whether 
>there were any before processing. I'll double check on that. Please 
>temporarily forget that comment.
>
>  
>
>>With the flag approach making ndarray and ma.array interfaces
>>consistent would require adding an extra argument to many methods.
>>Instead, I propose to add one method: fill to ndarray.
>>    
>>
>OK, good point.
>
>
>On a semantic aspect:
>While digging these GRASS scripts I mentioned, I realized/remembered that 
>masked values are called 'null', when there's no data, a NAN, or just when 
>you want to hide some values. What about 'null' instead of 
>'mask','missing','na' ? 
>
>
>





