[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Pierre GM pgmdevlist@gmail....
Thu Jun 23 19:28:45 CDT 2011


Sorry y'all, I'm just commenting bit by bit:

"One key problem is a lack of orthogonality with other features, for instance creating a masked array with physical quantities can't be done because both are separate subclasses of ndarray. The only reasonable way to deal with this is to move the mask into the core ndarray."

Meh. I did try to make it easy to use masked arrays on top of subclasses. There are even some tests in the suite to that effect (test_subclassing). I'm not buying the argument.
About moving the mask into the core ndarray: back in the day I had suggested a mask flag/property built into ndarrays (which would *really* have simplified the game), but the suggestion was dismissed very quickly as adding too much overhead. I had to agree. I'm just a tad surprised the wind has changed on that matter.


"In the current masked array, calculations are done for the whole array, then masks are patched up afterwards. This means that invalid calculations sitting in masked elements can raise warnings or exceptions even though they shouldn't, so the ufunc error handling mechanism can't be relied on."

Well, there's a reason for that. Initially, I tried to guess what the mask of the output should be from the masks of the inputs, the objective being to avoid getting NaNs in the C array. That was easy in most cases, but it turned out it wasn't always possible (the `power` one caused me a lot of issues, if I recall correctly). So, for performance reasons (to avoid a lot of expensive tests), I fell back on the old concept of "compute them all, they'll be sorted afterwards".
Of course, that's a rather clumsy approach. But it doesn't work too badly in pure Python. No doubt a proper C implementation would be faster.
Oh, about using NaNs for invalid data? Well, that can't work with integers.
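
To make the "compute them all, they'll be sorted afterwards" strategy concrete, here is a minimal sketch written against the public numpy.ma API. The helper name `power_then_patch` is hypothetical, and this is not how numpy.ma implements `power` internally; it only illustrates the idea. The last few lines show why a NaN marker is not an option for integer data.

```python
import numpy as np


def power_then_patch(base, exponent):
    """Hypothetical helper: compute everywhere, then mask whatever came out invalid."""
    base = np.ma.asarray(base)
    exponent = np.ma.asarray(exponent)
    # Run the plain ufunc on the raw buffers, silencing the floating-point
    # warnings that masked or out-of-domain entries would otherwise trigger.
    with np.errstate(all="ignore"):
        raw = np.power(base.data, exponent.data)
    # Patch the mask afterwards: the inputs' masks plus any non-finite result.
    mask = (
        np.ma.getmaskarray(base)
        | np.ma.getmaskarray(exponent)
        | ~np.isfinite(raw)
    )
    return np.ma.array(raw, mask=mask)


values = np.ma.array([4.0, -8.0, 2.0], mask=[False, False, True])
result = power_then_patch(values, 0.5)  # (-8.0) ** 0.5 is invalid, so it ends up masked

# An integer array has no NaN to use as a "missing" marker.
ints = np.array([1, 2, 3])
try:
    ints[1] = np.nan
    nan_fits_in_int = True
except ValueError:
    nan_fits_in_int = False

# A mask, on the other hand, works whatever the dtype.
masked_ints = np.ma.array([1, 2, 3], mask=[False, True, False])
```

The cost of this approach is exactly the one quoted above: the ufunc still runs on the hidden elements, so its own error handling has to be muted rather than trusted.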

`mask` property:
Nothing to add to it. It's basically what we have now (except for the opposite convention).
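
For reference, a small sketch of the two conventions. In numpy.ma a True in the mask hides the element; the opposite convention is assumed here to mean "True marks a valid element", and the name `exposed` is purely illustrative, not an attribute of any array type.

```python
import numpy as np

x = np.ma.array([10, 20, 30], mask=[False, True, False])

# numpy.ma convention: True in the mask means "hidden / invalid".
hidden = np.ma.getmaskarray(x)

# Opposite convention (True means "valid / exposed") is simply the inverse.
exposed = ~hidden

# Either way, the same elements survive.
visible_values = x.data[exposed]
```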

Working with masked values:
I recall some strong points being made back in the day against using None to represent missing values...
Adding a maskedstr argument to array2string? Mmh... I prefer a global flag like the one we have now.
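
The global flag in question is, in current numpy.ma, `masked_print_option`. A short sketch of how it is used follows; the "N/A" placeholder is just an example value, and the previous setting is restored so the change does not leak into other code.

```python
import numpy as np

x = np.ma.array([1, 2, 3], mask=[False, True, False])

# The masked entry is rendered with the global placeholder ("--" by default).
default_text = str(x)

# Change the placeholder globally, print, then restore the previous value.
previous = np.ma.masked_print_option.display()
np.ma.masked_print_option.set_display("N/A")
try:
    custom_text = str(x)
finally:
    np.ma.masked_print_option.set_display(previous)
```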

Design questions:
Adding `masked`, or whatever we call it, to a number/array should result in masked/a fully masked array, period. That way, we get a hint that something was wrong with the initial dataset.
hardmask: I never used the feature myself. I wonder if anyone did. Still, it's a nice idea...
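
Both points can be checked against the existing numpy.ma behaviour, which is what this sketch shows; it says nothing about what the proposed core implementation would do.

```python
import numpy as np

# Arithmetic with the `masked` constant propagates the mask.
scalar_result = np.ma.masked + 1
array_result = np.ma.array([1, 2, 3]) + np.ma.masked

# Soft mask (the default): assigning to a masked element unmasks it.
soft = np.ma.array([1, 2, 3], mask=[False, True, False])
soft[1] = 99

# Hard mask: the element stays masked after the same assignment.
hard = np.ma.array([1, 2, 3], mask=[False, True, False], hard_mask=True)
hard[1] = 99
```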

