A reimplementation of MaskedArray
Pierre GM
pgmdevlist at gmail.com
Wed Nov 8 04:10:13 CST 2006
Michael,
First of all, thanks for your interest in the new implementation of
MaskedArray, which is basically nothing but an exercise in style.
On Tuesday 07 November 2006 20:11, Michael Sorich wrote:
> 1. It would be nice if the masked_singleton could be passed into a
> ndarray, as this would allow it to be passed into the MaskedArray e.g.
>
> import numpy as N
> import ma.maskedarray as MA
> test = N.array([1,2,MA.masked])
>
> >> ValueError: setting an array element with a sequence
I like your idea, but not its implementation. If MA.masked_singleton is
defined as an object, as you suggest, then the dtype of the ndarray it is
passed to becomes 'object', as you pointed out, and that is not something one
would naturally expect, as basic numerical functions don't work well with the
'object' dtype (just try N.sqrt(N.array([1],dtype=N.object)) to see what I
mean).
Even if we can construct a mask rather easily at the creation of the masked
array, following your 'a==masked' suggestion, we still need to get the dtype
of the non-masked section, and that doesn't seem trivial...
I guess that a simple solution is to use MA.masked_values.
Make sure to use a numerical value for the masked data, else you'll end up
with yet another object array.
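
As a minimal sketch of the masked_values route: this assumes the function
behaves as it does in the numpy.ma module of current NumPy (imported here as
`ma`, not the ma.maskedarray package discussed in this thread), and -999.0 is
an arbitrary sentinel chosen only for illustration.

```python
import numpy as np
import numpy.ma as ma

SENTINEL = -999.0  # hypothetical numerical placeholder for missing data

# A numerical sentinel keeps a numerical dtype and is masked on creation.
test = ma.masked_values([1.0, 2.0, SENTINEL], SENTINEL)
print(test.dtype, test.mask)

# A non-numerical placeholder such as None yields an object array instead.
obj = np.array([1, 2, None])
print(obj.dtype)
```

The second half is only there for contrast: it shows why the placeholder
should be a number rather than an arbitrary Python object.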
> 2. What happens if a masked array is passed into a ndarray or passed
> into a MaskedArray with another mask?
>>> test_ma1 = MA.array([1,2,3], mask=[False, False, True])
>>> print test_ma1, N.array(test_ma1),
[1 2 --] [1 2 3]
>>> MA.array(test_ma1, mask=[True, False, False])
[-- 2 3]
Let me clarify that my objective was to get an implementation as close to the
original numpy.core.ma as possible, for 'backward compatibility'. I'm not
sure it'd be wise to change it at this point, but that could be discussed.
As you've noticed, when creating a new masked array from an existing one, the
'mask' argument supersedes the initial mask. That's ideal when you want to
focus on a fraction of the initial data: you just mask what you don't need,
and are still able to retrieve it when you need it. I agree that this default
behavior is a bit strange when you have missing data: in that case, one would
expect the new mask to be a combination of the 'mask' argument and the old
mask.
A possibility would then be to add a 'keep_mask' flag: a default of False
would give the current behavior, a value of True would force the new mask to
be a combination. I think that feelings are mixed on this list about extra
flags, but the users of maskedarray are only a minority anyway (hopefully,
only for the moment).
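
The two behaviors could be sketched as follows. This assumes a 'keep_mask'
argument like the one accepted by numpy.ma.array in current NumPy; both values
are passed explicitly, so the sketch does not rely on any particular default.

```python
import numpy.ma as ma

test_ma1 = ma.array([1, 2, 3], mask=[False, False, True])
new_mask = [True, False, False]

# keep_mask=False: the 'mask' argument supersedes the initial mask.
replaced = ma.array(test_ma1, mask=new_mask, keep_mask=False)

# keep_mask=True: the new mask is the logical OR of both masks.
combined = ma.array(test_ma1, mask=new_mask, keep_mask=True)

print(replaced.mask, combined.mask)
```

In both cases the underlying data is untouched, so the values hidden by either
mask can still be retrieved from the original array.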
About the conversion to ndarray:
By default, the result should have the same dtype as the _data section.
For this reason, I disagree with your idea of "(returning) an object ndarray
with the missing value containing the masked singleton". If you really want
an object ndarray, you can use the filled method or the filled function, with
your own definition of the filling value (such as your MaskedScalar).
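
A minimal sketch of the conversions mentioned above, assuming the filled
method and filled function work as in the numpy.ma module of current NumPy;
the filling value -999 is a hypothetical choice made for illustration.

```python
import numpy as np
import numpy.ma as ma

x = ma.array([1, 2, 3], mask=[False, False, True])

# Plain conversion drops the mask and keeps the dtype of the data.
plain = np.array(x)

# filled() substitutes a value of your choosing at the masked positions.
FILL = -999  # hypothetical filling value
as_method = x.filled(FILL)
as_function = ma.filled(x, FILL)

print(plain, as_method, as_function)
```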
More information about the Numpy-discussion
mailing list