[Numpy-discussion] Advice on masked array implementation
Pierre GM
pgmdevlist@gmail....
Mon Feb 26 14:42:58 CST 2007
On Monday 26 February 2007 14:51:42 Fred Clare wrote:
> We would like some advice on how to proceed with implementing
> masked array capabilities in a large package of climate-related
> analysis functions.
Sounds great! I'm working in basically the same field, and I needed
MaskedArrays to deal with missing values in environmental series. But we can
chat about that off-list.
> If one is a Numeric masked array and the other
> is a NumPy masked array, then we return a NumPy masked array.
> Similar checking is done for using just Numeric and/or NumPy
> non-masked arrays. Does this seem like a reasonable approach?
> We have followed the discussion on the development of the new
> maskedarray module, but have not used it. I went to
>
> http://projects.scipy.org/scipy/numpy/attachment/wiki/MaskedArray/maskedarray.py
>
> as referenced in a posting from Pierre GM, but I got "Internal Error."
Yes, I had to take the package off the projects.scipy.org site when I got
write access to the svn server, as that particular version was really
outdated. You can find the latest version on the scipy svn server, in the
sandbox:
http://svn.scipy.org/svn/scipy/trunk/Lib/sandbox/maskedarray/
Note that I made some major updates a couple of weeks ago, without advertising
them. A description is available at:
http://svn.scipy.org/svn/scipy/trunk/Lib/sandbox/maskedarray/CHANGELOG
> Has there been any decision as to whether maskedarray will be
> in NumPy version 1.1?
The decision is out of my hands. My understanding is that before the new
implementation can be taken seriously, more feedback is needed from actual
users (and I fully agree with that). Moreover, there are some vague plans
about porting it to C. My naive initial attempts with Pyrex failed
dramatically, so I will have to learn C, and the port probably won't happen in
the next few weeks... But porting to C should solve some minor issues I'm
unhappy with for now, which I can't fix in Python without significantly
degrading performance.
> Any estimate as to when 1.1 would be out?
> If we commit to numpy.core.ma now, how much trouble will it be
> to convert to the new maskedarray?
It shouldn't be much of a problem. Normally, the following should work (even if
some warnings are raised):
>>> import numpy.core.ma as ma
>>> import maskedarray as MA
>>> x = ma.array([1,2,3,4,5], mask=[1,0,0,1,0])
>>> x
array(data =
[999999 2 3 999999 5],
mask =
[ True False False True False],
fill_value=999999)
>>> X = MA.array(x)
>>> X
masked_array(data = [-- 2 3 -- 5],
mask = [ True False False True False],
fill_value=999999)
That is, maskedarray.MaskedArray recognizes numpy.core.ma.MaskedArray instances.
I tried to keep as much backward compatibility as I could, but without really
testing it, so no guarantee.
> Is there any user documentation
> on maskedarray and details on the differences between it and
> numpy.core.ma?
Not at this point, unfortunately. Note that the "new" implementation follows
very closely Paul Dubois' initial code. (In fact, a bit too closely for its
own good. Reggie Dugard suggested some modifications I tried to take into
account in the latest version that seem to solve that). Therefore, switching
from numpy.core.ma to maskedarray should be relatively painless. I'd be more
than happy to help you with that.
Basically the main differences between the two implementations are:
- MaskedArrays are regular subclasses of ndarray, so you can use asanyarray
without losing your mask.
- Subclassing MaskedArray is far easier with the new implementation than it
was with numpy.core.ma
- the fill_value attribute is now a property
- the _data attribute is now a view of the MaskedArray, instead of an
independent object.
- the underlying _data can be any subclass of ndarray (such as matrix)
- some of the MaskedArray methods (ravel, transpose...) are implemented
through wrappers that must have a __get__ method. That works well w/
Python2.4, I'm not sure it would work w/ 2.3.
- some methods that were not available in numpy.core.ma are now in
maskedarray, either .core or .extras
- there's a prototype of MaskedRecords objects, which makes it possible to
mask specific fields in a recarray.
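A few of the points above can be checked with a short sketch. Note the
assumptions: it is written against the numpy.ma module of a current NumPy
release, taken here as the descendant of the sandbox maskedarray package, and
it uses the public .data attribute in place of _data. It is an illustration
only, not a test of the sandbox code itself.

```python
import numpy as np
import numpy.ma as ma  # assumption: stands in for the sandbox "maskedarray"

x = ma.array([1, 2, 3, 4, 5], mask=[1, 0, 0, 1, 0])

# MaskedArray is a regular ndarray subclass.
is_subclass = issubclass(ma.MaskedArray, np.ndarray)

# asanyarray passes ndarray subclasses through, so the mask survives.
kept = np.asanyarray(x)

# asarray converts to a plain ndarray, so the mask is lost.
dropped = np.asarray(x)

# The underlying data is a view on the same buffer, not an independent copy.
shares = np.shares_memory(x.data, x)

# fill_value behaves like a property: set it, then use it through filled().
x.fill_value = -1
filled = x.filled()

print(is_subclass, type(kept).__name__, type(dropped).__name__, shares)
print(filled)
```

The practical consequence is that functions written for plain arrays should
call asanyarray rather than asarray whenever the mask has to be kept.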
All in all, I think that the new implementation gets rid of some of the
limitations of numpy.core.ma without hurting performance too badly. The
latest test showed that yes, maskedarray is slightly slower than
numpy.core.ma (about 10%), but it provides more functionality: for example,
you can prevent a mask from being overwritten, it is very easy to subclass, it
interacts nicely with ndarray... I'm using the new implementation
systematically for my own projects (which explains why the implementation
gets tweaked regularly), and Matt Knox and I have been using it for our common
TimeSeries project without any difficulty so far.
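As an illustration of preventing a mask from being overwritten, here is a
minimal sketch. It assumes the numpy.ma module of a current NumPy release and
its hard_mask keyword, which may not match the sandbox version exactly.

```python
import numpy.ma as ma  # assumption: stands in for the sandbox "maskedarray"

# Soft mask (the default): assigning to a masked entry unmasks it.
soft = ma.array([1, 2, 3], mask=[0, 0, 1])
soft[-1] = 5

# Hard mask: assigning to a masked entry leaves it masked.
hard = ma.array([1, 2, 3], mask=[0, 0, 1], hard_mask=True)
hard[-1] = 5

print(soft)
print(hard)
```

A hard mask is handy when the masked entries stand for values that are
genuinely missing and should never be filled in by accident.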
Once again, please do not hesitate to contact me on or off-list if you have
any questions/comments/requests.