A reimplementation of MaskedArray

Pierre GM pgmdevlist at gmail.com
Wed Nov 8 04:10:13 CST 2006


Michael,
First of all, thanks for your interest in the new implementation of 
MaskedArray, which is basically nothing but an exercise in style.

On Tuesday 07 November 2006 20:11, Michael Sorich wrote:
> 1. It would be nice if the masked_singleton could be passed into a
> ndarray, as this would allow it to be passed into the MaskedArray e.g.
>
> import numpy as N
> import ma.maskedarray as MA
> test = N.array([1,2,MA.masked])
>
> >> ValueError: setting an array element with a sequence

I like your idea, but not its implementation. If MA.masked_singleton is 
defined as an object, as you suggest, then the dtype of the ndarray it is 
passed to becomes 'object', as you pointed out, and that is not something one 
would naturally expect, as basic numerical functions don't work well with the 
'object' dtype (just try N.sqrt(N.array([1],dtype=object)) to see what I 
mean).
Even if we can construct a mask rather easily at the creation of the masked 
array, following your 'a==masked' suggestion, we still need to get the dtype 
of the non-masked section, and that doesn't seem trivial...
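
To illustrate the dtype problem, here is a minimal sketch. It assumes plain 
numpy, and MISSING is a hypothetical stand-in for an object-valued masked 
singleton (it is not MA.masked itself). Building the mask is the easy part; 
the dtype of the remaining data has to be recovered separately.

```python
import numpy as np

# Hypothetical sentinel standing in for an object-valued masked singleton.
MISSING = object()

# Mixing numbers with an arbitrary object forces the 'object' dtype.
a = np.array([1, 2, MISSING], dtype=object)

# Building the mask from the sentinel is straightforward...
mask = np.array([x is MISSING for x in a])

# ...but numerical functions do not cope with the object array.
try:
    np.sqrt(a)
    sqrt_ok = True
except (TypeError, AttributeError):
    sqrt_ok = False

# The dtype of the non-masked section has to be recovered separately,
# here by rebuilding an array from the valid entries only.
clean = np.array([x for x in a if x is not MISSING])
```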

I guess that a simple solution is to use MA.masked_values. 
Make sure to use a numerical value for the masked data, else you'll end up 
with yet another object array.  
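
A minimal sketch of that workaround, assuming the module is importable as 
numpy.ma and that -999 is a value that never occurs in the real data (both 
are my assumptions for the example):

```python
import numpy as np
import numpy.ma as ma

# Use a numerical placeholder for the missing entry, so the dtype
# of the underlying array stays numerical instead of 'object'.
raw = np.array([1, 2, -999])

# Mask every entry equal to the placeholder.
test = ma.masked_values(raw, -999)
```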


> 2. What happens if a masked array is passed into a ndarray or passed
> into a MaskedArray with another mask?
>>> test_ma1 = MA.array([1,2,3], mask=[False, False, True])
>>> print test_ma1, N.array(test_ma1), 
[1 2 --] [1 2 3]
>>> MA.array(test_ma1, mask=[True, False, False]) 
[-- 2 3]

Let me clarify that my objective was to get an implementation as close to the 
original numpy.core.ma as possible, for 'backward compatibility'. I'm not 
sure it'd be wise to change it at this point, but that could be discussed.

As you've noticed, when creating a new masked array from an existing one, the 
'mask' argument supersedes the initial mask. That's ideal when you want to 
focus on a fraction of the initial data: you just mask what you don't need, 
and are still able to retrieve it when you need it. I agree that this default 
behavior is a bit strange when you have missing data: in that case, one would 
expect the new mask to be a combination of the 'mask' argument and the old 
mask. 
A possibility would then be to add a 'keep_mask' flag: a default of False 
would give the current behavior, a value of True would force the new mask to 
be a combination. I think that feelings are mixed on this list about extra 
flags, but the users of maskedarray are only a minority anyway (hopefully, 
only for the moment).
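
The two behaviors can be sketched as follows. This is only an illustration 
of the idea, not the actual implementation: the helper name remask is 
hypothetical, and I assume the module is importable as numpy.ma.

```python
import numpy as np
import numpy.ma as ma

def remask(a, mask, keep_mask=False):
    """Hypothetical helper: rebuild a masked array with a new mask.

    keep_mask=False: the new mask replaces the old one.
    keep_mask=True:  the new mask is combined (logical OR) with the old one.
    """
    data = ma.getdata(a)                      # underlying data, mask ignored
    new_mask = np.asarray(mask, dtype=bool)
    if keep_mask:
        new_mask = np.logical_or(ma.getmaskarray(a), new_mask)
    return ma.MaskedArray(data, mask=new_mask)

test_ma1 = ma.array([1, 2, 3], mask=[False, False, True])

# The new mask supersedes the old one: only the first element is masked.
replaced = remask(test_ma1, [True, False, False])

# The masks are combined: the first and last elements are masked.
combined = remask(test_ma1, [True, False, False], keep_mask=True)
```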

About the conversion to ndarray: 
By default, the result should have the same dtype as the _data section. 
For this reason, I disagree with your idea of "(returning) an object ndarray 
with the missing value containing the masked singleton". If you really want 
an object ndarray, you can use the filled method or the filled function, with 
your own definition of the filling value (such as your MaskedScalar). 
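
A short sketch of both cases, again assuming numpy.ma as the module. The 
sentinel below is a hypothetical stand-in for something like your 
MaskedScalar, and for the object case I fill the array by explicit indexing 
rather than through filled, just to keep the example simple.

```python
import numpy as np
import numpy.ma as ma

a = ma.array([1, 2, 3], mask=[False, False, True])

# Default case: a plain ndarray with the same dtype as the data,
# the masked entries being replaced by a numerical filling value.
plain = a.filled(-999)

# Object case: cast the data to 'object' first, then put a sentinel
# (hypothetical stand-in for a MaskedScalar) where the mask is set.
sentinel = object()
as_object = ma.getdata(a).astype(object)
as_object[ma.getmaskarray(a)] = sentinel
```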






