[Numpy-discussion] using NaN, INT_MIN etc in ndarray instead of a masked array

Travis Oliphant oliphant.travis at ieee.org
Tue Apr 18 09:39:11 CDT 2006


Sasha wrote:
> On 4/18/06, Travis Oliphant <oliphant.travis at ieee.org> wrote:
>   
>> Michael Sorich wrote:
>> ...
>>     
>>> Is it possible to implement masked values using these special bit
>>> patterns in the ndarray instead of using a separate MA class? If so
>>> has there been any thought as to whether this may be the better
>>> option. I think it would be preferable if the ability to handle masked
>>> data was available in the standard array class (ndarray), as this
>>> would increase the likelihood that functions built for numeric arrays
>>> will handle masked values well. It seems that ndarray already has
>>> decent support for nans (isnan() returns the equivalent of a boolean
>>> mask array), indicating that such an approach may be acceptable. How
>>> difficult is it to generalise the concept to other data types (int,
>>> string, bool)?
>>>
>>>       
>> I don't think the approach can be generalized at all.   It would only
>> work with floating-point values and therefore is not particularly exciting.
>>
>>     
> Not true. R supports "NA" for all its types except raw bytes.
> For example:
>
> > x<-logical(5)
> > x
> [1] FALSE FALSE FALSE FALSE FALSE
> > x[1:2]=NA
> > !x
> [1]   NA   NA TRUE TRUE TRUE
For Boolean values there is "room" for an NA value, but what about
arbitrary integers?  Does R just limit the range of the integer value?
That's what I meant:  "fiddling with special-values" doesn't generalize
to all data-types.
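
A minimal sketch of the trade-off, written against present-day NumPy. The names `INT_NA` and `is_na` are hypothetical, and reserving the smallest int32 as the marker is an assumption made purely for illustration:

```python
import numpy as np

# Floats: NaN is a spare bit pattern that arithmetic already propagates.
f = np.array([1.0, np.nan, 3.0])
float_mask = np.isnan(f)          # boolean mask of the missing entries

# Integers: every bit pattern is a valid number, so a marker has to be
# carved out of the range. INT_NA is a hypothetical choice for illustration.
INT_NA = np.iinfo(np.int32).min

def is_na(a):
    """Return a boolean mask of entries equal to the reserved marker."""
    return a == INT_NA

i = np.array([1, INT_NA, 3], dtype=np.int32)
int_mask = is_na(i)

# Ordinary integer arithmetic knows nothing about the marker:
# adding 1 turns the "missing" entry into an ordinary number.
shifted = i + np.int32(1)
still_missing = is_na(shifted)    # no entry is flagged any more
```

The float case needs no cooperation from the arithmetic code, while the integer case loses one legal value and needs every operation to check for the marker.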


>> arrays through other functions if it is possible.
>>
>>     
> I've voiced my opposition to subclassing before. 
And you haven't been very clear about why you are opposed.    Just 
voicing concern is not enough.   Python sub-classing in C amounts to 
exactly what masked arrays are:  arrays with additional components in 
their structure (i.e. a mask).    Please be more specific about whatever 
your concerns are with sub-classing.

>  Here I believe it is
> more appropriate to have an add-on module that installs alternative
> math functions. 
Sure that will work.   But, we're talking about more than math 
functions.  Ultimately masked array users will want *every* function 
they use to work "right" with masked arrays.  

> Having two classes in the same application that are
> subtly different in the corner cases is already a problem with
> ma.array vs. ndarray, adding the third class will only make things
> worse.
>   
I don't know what you are talking about.  What is the "third class?"  
I'm talking about just making ma.array construct a sub-class.

>> It seems that masked arrays must do things quite differently than other
>> arrays on certain applications, and I'm not altogether clear on how to
>> support them in all the NumPy code.  Because masked arrays are not used
>> by everybody who uses NumPy arrays, it should be a separate sub-class.
>>
>>     
> As far as I understand, people who don't use MA don't deal with
> missing values. For this category of users there will be no visible
> effect no matter how missing values are treated as long as in the
> absence of missing values, normal rules apply. Yes, many functions
> must treat missing values differently, but the same is true for NaNs. 
> NumPy allows floating point arrays to have nans, but there is no real
> support beyond what happened to work at the OS level.
>   

Or we deal with missing values differently (i.e. manage them
ourselves).  Sure, there will be no behavioral effect, but the code
will have to be re-written to "do the right thing" with masked arrays in
such a way as to not slow everything else down (that's at least an "if"
statement sprinkled throughout every sub-routine).
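
As a rough illustration of that extra "if", here is a sketch of what a single routine might look like. The function `total` is hypothetical and is not part of NumPy; the only assumption is that masked input exposes a boolean `mask` attribute, as `numpy.ma` arrays do:

```python
import numpy as np

def total(a):
    """Hypothetical library routine that must special-case masked input."""
    mask = getattr(a, "mask", None)   # plain ndarrays have no such attribute
    if mask is not None and np.any(mask):
        # Masked path: drop the masked entries before summing.
        keep = ~np.asarray(mask, dtype=bool)
        return np.asarray(a)[keep].sum()
    # Ordinary path: unchanged, apart from the check above.
    return np.asarray(a).sum()

plain = total(np.array([1, 2, 3]))
masked = total(np.ma.array([1, 2, 3], mask=[False, True, False]))
```

Every routine that should "do the right thing" would need a branch of this kind, which is the cost being weighed here.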

Many people are not enthused about complicating the basic array object 
any more than necessary.   If it can be shown that masked arrays can be 
integrated into the ndarray object without inordinate complication 
and/or slowness, then I don't think people would mind.  

The best way to prove that is to create a sub-class and change only the 
methods / functions that are necessary.      That's really all I'm saying.
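
A minimal sketch of that experiment, assuming present-day NumPy subclassing hooks. The class `MaskedSketch` is hypothetical and is not how `ma.array` is implemented; it only shows an ndarray with one extra component and a single overridden method:

```python
import numpy as np

class MaskedSketch(np.ndarray):
    """Hypothetical ndarray subclass with one extra component: a boolean mask."""

    def __new__(cls, data, mask=None):
        # Viewing as the subclass triggers __array_finalize__, which
        # installs an all-False mask; replace it if one was supplied.
        obj = np.asarray(data).view(cls)
        if mask is not None:
            obj.mask = np.asarray(mask, dtype=bool)
        return obj

    def __array_finalize__(self, obj):
        # In this sketch, views and derived arrays start with nothing masked.
        self.mask = np.zeros(self.shape, dtype=bool)

    def sum(self, axis=None, **kwargs):
        # The only overridden method: masked entries contribute zero.
        data = self.view(np.ndarray)
        return np.where(self.mask, 0, data).sum(axis=axis, **kwargs)

m = MaskedSketch([1, 2, 3], mask=[False, True, False])
masked_total = m.sum()     # skips the masked entry
largest = m.max()          # inherited unchanged, so it ignores the mask
```

Everything not overridden behaves as it does for a plain ndarray, so the amount of code that has to change is visible method by method.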

>   
>> Ultimately, I hope we will get the basic array object into Python (what
>> Tim was calling the super array) before 2.6
>>     
>
> As far as I understand, that object will not come with arithmetic
> rules or math functions.  Therefore, I don't see how this is relevant
> to the present discussion.
>   

Because it will help all array objects talk more cleanly to each other.  
But, if you are so opposed to sub-classing (and I'm not sure why you are
in this case), then it may not matter.

-Travis






