[Numpy-discussion] Advice on masked array implementation

Pierre GM pgmdevlist@gmail....
Mon Feb 26 14:42:58 CST 2007


On Monday 26 February 2007 14:51:42 Fred Clare wrote:
> We would like some advice on how to proceed with implementing
> masked array capabilities in a large package of climate-related
> analysis functions. 

Sounds great! I'm working in basically the same field, and I needed 
MaskedArrays to deal with missing values in environmental series. But we can 
chat about that off-list.

> If one is a Numeric masked array and the other
> is a NumPy masked array, then we return a NumPy masked array.
> Similar checking is done for using just Numeric and/or NumPy
> non-masked arrays.  Does this seem like a reasonable approach?

> We have followed the discussion on the development of the new
> maskedarray module, but have not used it.  I went to
>
> http://projects.scipy.org/scipy/numpy/attachment/wiki/MaskedArray/
> maskedarray.py
>
> as referenced in a posting from Pierre GM, but I got "Internal Error."

Yes, I had to take the package off the projects.scipy.org site when I got 
write access to the svn server, as that particular version was really 
outdated. You can find the latest version on the scipy svn server, in the 
sandbox:
http://svn.scipy.org/svn/scipy/trunk/Lib/sandbox/maskedarray/

Note that I made some major updates a couple of weeks ago, without advertising 
them. A description is available at:
http://svn.scipy.org/svn/scipy/trunk/Lib/sandbox/maskedarray/CHANGELOG

> Has there been any decision as to whether maskedarray will be
> in NumPy version 1.1?  

The decision is out of my hands. My understanding is that before the new 
implementation can be taken seriously, more feedback is needed from actual 
users (and I fully agree with that). Moreover, there are some vague plans 
about porting it to C. My naive initial attempts with Pyrex having failed 
dramatically, I will have to learn C, so it probably won't happen in the next 
few weeks... But porting to C should solve some minor issues I'm unhappy 
with for now, and that I can't fix in Python without significantly degrading 
performance.

> Any estimate as to when 1.1 would be out? 
> If we commit to numpy.core.ma now, how much trouble will it be
> to convert to the new maskedarray?  

It shouldn't be much of a problem. Normally, the following should work (even 
if some warnings are raised):
>>> import numpy.core.ma as ma
>>> import maskedarray as MA
>>> x = ma.array([1,2,3,4,5], mask=[1,0,0,1,0])
>>> x
array(data =
 [999999      2      3 999999      5],
      mask =
 [ True False False  True False],
      fill_value=999999)
>>> X = MA.array(x)
>>> X
masked_array(data = [-- 2 3 -- 5],
      mask = [ True False False  True False],
      fill_value=999999)

That is, maskedarray.MaskedArray recognizes numpy.core.ma.MaskedArray 
instances.

I tried to keep as much backward compatibility as I could, but without really 
testing it, so no guarantee.

> Is there any user documentation 
> on maskedarray and details on the differences between it and
> numpy.core.ma?

Not at this point, unfortunately. Note that the "new" implementation follows 
Paul Dubois' initial code very closely. (In fact, a bit too closely for its 
own good. Reggie Dugard suggested some modifications that I tried to take 
into account in the latest version, and they seem to solve that.) Therefore, 
switching from numpy.core.ma to maskedarray should be relatively painless. 
I'd be more than happy to help you with that.

Basically the main differences between the two implementations are:
- MaskedArray is a regular subclass of ndarray, so you can use asanyarray 
without losing your mask.
- Subclassing MaskedArray is far easier with the new implementation than it 
was with numpy.core.ma
- the fill_value attribute is now a property
- the _data attribute is now a view of the MaskedArray, instead of an 
independent object.
- the underlying _data can be any subclass of ndarray (such as matrix)
- some of the MaskedArray methods (ravel, transpose...) are implemented 
through wrappers that must have a __get__ method. That works well w/ 
Python2.4, I'm not sure it would work w/ 2.3.
- some methods that were not available in numpy.core.ma are now in 
maskedarray, either .core or .extras
- there's a prototype of a MaskedRecords object, which makes it possible to 
mask specific fields in a recarray.
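
A few of the points in this list can be checked directly. The sketch below is 
only an illustration: it assumes a recent NumPy, where the successor of this 
implementation is available as numpy.ma, and uses that module as a stand-in 
for the sandbox maskedarray package. The 2007 sandbox version may differ in 
details.

```python
import numpy as np
import numpy.ma as ma  # assumption: stand-in for the sandbox `maskedarray`

x = ma.array([1, 2, 3, 4, 5], mask=[1, 0, 0, 1, 0])

# MaskedArray subclasses ndarray, so asanyarray passes it through
# and the mask survives.
is_subclass = isinstance(x, np.ndarray)
y = np.asanyarray(x)
mask_kept = isinstance(y, ma.MaskedArray) and y.mask.tolist() == x.mask.tolist()

# asarray, by contrast, converts to a plain ndarray and drops the mask.
z = np.asarray(x)
mask_dropped = not isinstance(z, ma.MaskedArray)

# fill_value is a property, set through ordinary attribute assignment.
x.fill_value = -1
filled = x.filled().tolist()

# The underlying data is a view: writing through it changes the
# masked array as well.
x.data[1] = 20
seen_through_view = int(x[1])
```

The asanyarray/asarray contrast is the practical consequence of the first 
point: functions that call asanyarray on their input will keep the mask, 
while those that call asarray will silently lose it.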

All in all, I think that the new implementation gets rid of some of the 
limitations of numpy.core.ma without hurting performance too badly. The 
latest test showed that yes, maskedarray is slightly slower than 
numpy.core.ma (10%), but it provides more functionality: for example, you can 
prevent a mask from being overwritten, it is very easy to subclass, it 
interacts nicely with ndarray... I'm using the new implementation 
systematically for my own projects (which explains why the implementation is 
regularly tweaked), and Matt Knox and I have been using it for our common 
TimeSeries project without any difficulty so far. 
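
Two of the features mentioned above, protecting a mask against overwriting 
and subclassing, can be sketched as follows. This is a minimal illustration, 
again assuming a recent NumPy and using numpy.ma as a stand-in for the 
sandbox maskedarray package; the GappySeries class and its n_missing method 
are hypothetical names invented for the example.

```python
import numpy.ma as ma  # assumption: stand-in for the sandbox `maskedarray`

# A "hard" mask cannot be overwritten by assignment.
x = ma.array([1, 2, 3], mask=[1, 0, 0])
x.harden_mask()
x[0] = 10                      # ignored: the entry stays masked
still_masked = x[0] is ma.masked

# Softening the mask makes assignment unmask the entry again.
x.soften_mask()
x[0] = 10
now_unmasked = x[0] is not ma.masked and int(x[0]) == 10


class GappySeries(ma.MaskedArray):
    """Hypothetical subclass: a masked array with one extra helper."""

    def n_missing(self):
        # Count masked entries; getmaskarray always returns a full
        # boolean array, even when no mask is set.
        return int(ma.getmaskarray(self).sum())


s = GappySeries([1.0, 2.0, 3.0, 4.0], mask=[0, 1, 0, 1])
missing = s.n_missing()
slice_type_kept = isinstance(s[1:], GappySeries)
```

Slices of the subclass stay instances of the subclass, so helper methods 
remain available after indexing.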

Once again, please do not hesitate to contact me on or off-list if you have 
any questions/comments/requests.



