[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Mark Wiebe mwwiebe@gmail....
Sat Jun 25 15:14:39 CDT 2011


On Sat, Jun 25, 2011 at 6:29 AM, Pierre GM <pgmdevlist@gmail.com> wrote:

> This thread is getting quite long, innit?
>

It's tiring, yeah!


> And I think it's getting a tad confusing, because we're mixing two
> different concepts: missing values and masks.
> There should be support for missing values in numpy.core; I think we all
> agree on that.
> * What's been suggested, adding new dtypes (nafloat, naint), is great, but
> why not make it the default, then?
>

I don't like it as the default because of the lack of generality, but I've
come up with an idea that's in the NEP where the masked approach would be
the default, and a parameterized type would support the NA bit pattern idea
in a general fashion.
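To make the contrast concrete, here is a minimal sketch of the bit-pattern approach, assuming an int32 array reserves its most negative value as NA. The sentinel choice and the helper names (`NA_INT32`, `isna_bitpattern`, `add_bitpattern`) are hypothetical illustrations, not part of the NEP or of NumPy.

```python
import numpy as np

# Hypothetical sentinel: reserve the most negative int32 as the NA bit pattern.
NA_INT32 = np.iinfo(np.int32).min

def isna_bitpattern(arr):
    """Return a boolean array that is True wherever arr holds the NA sentinel."""
    return arr == NA_INT32

def add_bitpattern(a, b):
    """Add two int32 arrays, forcing the NA sentinel wherever an input was NA."""
    na = isna_bitpattern(a) | isna_bitpattern(b)
    out = np.add(a, b)   # ordinary inner loop; NA slots hold meaningless sums
    out[na] = NA_INT32   # overwrite every slot where an input was NA
    return out

a = np.array([1, NA_INT32, 3], dtype=np.int32)
b = np.array([10, 20, NA_INT32], dtype=np.int32)
result = add_bitpattern(a, b)  # 11, then NA, NA
```

The catch is generality: every dtype has to give up one value, and small types such as bool have no value to spare. A separate mask needs no reserved value, which is one reason to prefer it as the default.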


> * Operations involving an NA (whatever the NA actually is, depending on the
> dtype of the input) should result in an NA (whatever NA is defined by the
> output's dtype). That could be done by overloading the existing ufuncs to
> support the new dtypes.
>

I don't want to add hundreds of new inner loops for many different dtypes. I
added the float16 dtype, and doing that another 24 times to produce
NA versions of the same types would be unpleasant, to say the least. I think
I've got a grasp on an approach that builds on top of the existing inner
loops (or slightly refactored versions) to get all these ideas working
together.
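A minimal sketch of reusing the existing inner loops, assuming each operand is carried as raw data plus a separate boolean mask (True = NA, as in numpy.ma). The wrapper `masked_binary` is a hypothetical name, not the NEP's implementation; numpy.ma already works roughly this way in pure Python.

```python
import numpy as np

def masked_binary(ufunc, a, a_mask, b, b_mask):
    """Apply an existing binary ufunc to the raw data and OR the NA masks.

    The ufunc's own inner loop does the arithmetic for whatever dtype it is
    given; the only NA-specific work is one logical_or, shared by all dtypes.
    """
    out_mask = np.logical_or(a_mask, b_mask)
    # Values under the mask may be garbage, so silence FP warnings from them.
    with np.errstate(all="ignore"):
        out = ufunc(a, b)
    return out, out_mask

x = np.array([1.0, 2.0, 3.0], dtype=np.float16)
x_mask = np.array([False, True, False])
y = np.array([4.0, 5.0, 6.0], dtype=np.float16)
y_mask = np.array([False, False, True])
vals, mask = masked_binary(np.multiply, x, x_mask, y, y_mask)
```

No float16-specific NA loop is needed here: the existing `np.multiply` loop handles float16 and the mask logic is dtype-independent.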


> * There should be some simple methods to retrieve the location of those NAs
> in an array. Whether we just output the indices or a full boolean array (with
> True for an NA, False for a non-NA, or vice versa) needs to be decided.
>

Maybe we need two methods: np.isna or np.ismissing, and one of
np.isvalid/np.isthere/np.isavail, so people can get at the mask in
whichever sense they want without extra copies.
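np.isna and np.isvalid were only proposals at this point. As a rough sketch of the two views, here they are built on numpy.ma, whose `getmaskarray` and NumPy's `nonzero` do exist; the wrapper names `isna_sketch` and `isvalid_sketch` are hypothetical.

```python
import numpy as np

def isna_sketch(arr):
    """Full boolean array, True where arr is missing (numpy.ma convention)."""
    return np.ma.getmaskarray(arr)

def isvalid_sketch(arr):
    """Full boolean array, True where arr holds a value.

    With a True-means-missing mask, this sense needs a logical_not,
    which allocates a new array.
    """
    return np.logical_not(np.ma.getmaskarray(arr))

a = np.ma.masked_array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, True])
na_positions = np.nonzero(isna_sketch(a))[0]  # index form: [1 3]
```

Whichever sense the mask is stored in, the opposite query costs an extra array, which is why offering both methods matters if the goal is to avoid copies.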


> * We can always re-implement masked arrays to use these NAs in a way which
> would be consistent with numpy.ma (so as not to confuse existing users of
> numpy.ma): a mask would be a boolean array with the same shape as the
> underlying ndarray, with True for NA.
>

That would be doable, yes.
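For reference, a short sketch of the existing numpy.ma convention Pierre describes: the mask is a boolean array of the same shape as the data, True marks a missing entry, masked entries stay masked through arithmetic, and reductions skip them.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])
arr = np.ma.masked_array(data, mask=[False, True, False, False])

assert arr.mask.shape == data.shape  # one mask entry per element, True = NA
total = arr.sum()                    # masked 2.0 is skipped: 1 + 3 + 4 = 8.0
doubled = arr * 2                    # the masked entry stays masked
```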


> Mark, I'd suggest you modify your proposal, making it clearer that it's not
> about adding all of numpy.ma's functionality to the core, but just supporting
> these missing values. The term 'mask' should be avoided as much as possible;
> use 'missing data' or whatever.
>

If the implementation is in terms of a mask, I think the term 'mask' should
still be used where it's relevant. Maybe there should be no 'mask' or
'validitymask' attribute as I proposed, and instead
np.ismissing/np.isavail would be the only interface for getting at the mask. I
would still want arr.flags.hasmask and arr.flags.ownmask to be there.

-Mark

