[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Mark Wiebe mwwiebe@gmail....
Fri Jun 24 11:21:01 CDT 2011

On Thu, Jun 23, 2011 at 8:00 PM, Pierre GM <pgmdevlist@gmail.com> wrote:

> On Jun 24, 2011, at 2:42 AM, Mark Wiebe wrote:
> > On Thu, Jun 23, 2011 at 7:28 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
> > Sorry y'all, I'm just commenting bit by bit:
> >
> > "One key problem is a lack of orthogonality with other features, for
> > instance creating a masked array with physical quantities can't be done
> > because both are separate subclasses of ndarray. The only reasonable way
> > to deal with this is to move the mask into the core ndarray."
> >
> > Meh. I did try to make it easy to use masked arrays on top of subclasses.
> > There are even some tests in the suite to that effect (test_subclassing).
> > I'm not buying the argument.
> > About moving the mask into the core ndarray: I had suggested back in the
> > day to have a mask flag/property built into ndarrays (which would *really*
> > have simplified the game), but this suggestion was dismissed very quickly
> > as adding too much overhead. I had to agree. I'm just a tad surprised the
> > wind has changed on that matter.
> >
> > Ok, I'll have to change that section then. :)
> >
> > I don't remember seeing mention of this ability in the documentation, but
> > I may not have been reading closely enough for that part.
> Or played with it ;)

True, I haven't played with it all that much, but the amount I've used it
and the amount I've wrestled with it during 1.6 development certainly make
me feel I know something about it. ;)
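
For readers following along, the subclass support Pierre refers to can be
sketched roughly as below. This is only an illustration against numpy.ma as
it exists, not the proposed core-mask design. The Quantity class and its
unit attribute are hypothetical stand-ins for a "physical quantities"
subclass and are not part of NumPy.

```python
import numpy as np


class Quantity(np.ndarray):
    """Hypothetical ndarray subclass carrying a unit label (illustration only)."""

    def __new__(cls, data, unit=""):
        obj = np.asarray(data, dtype=float).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # Propagate the unit label to views and slices when it is available.
        self.unit = getattr(obj, "unit", "")


lengths = Quantity([1.0, 2.0, 3.0], unit="m")

# Wrap the subclass instance in a masked array; the middle entry is masked.
masked_lengths = np.ma.masked_array(lengths, mask=[False, True, False])

print(type(masked_lengths).__name__)       # the wrapper is a MaskedArray
print(type(masked_lengths.data).__name__)  # the underlying data keeps the subclass
```

The wrapper itself is still a MaskedArray, so this shows the layering Pierre
describes rather than a single object that is both masked and unit-aware.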

> > "In the current masked array, calculations are done for the whole array,
> > then masks are patched up afterwards. This means that invalid
> > calculations sitting in masked elements can raise warnings or exceptions
> > even though they shouldn't, so the ufunc error handling mechanism can't
> > be relied on."
> >
> > Well, there's a reason for that. Initially, I tried to guess what the
> > mask of the output should be from the mask of the inputs, the objective
> > being to avoid getting NaNs in the C array. That was easy in most cases,
> > but it turned out it wasn't always possible (the `power` one caused me a
> > lot of issues, if I recall correctly). So, for performance reasons (to
> > avoid a lot of expensive tests), I fell back on the old concept of
> > "compute them all, they'll be sorted afterwards".
> > Of course, that's a rather clumsy approach. But it doesn't work too badly
> > in pure Python. No doubt a proper C implementation would be faster.
> > Oh, about using NaNs for invalid data? Well, that can't work with
> > integers.
> >
> > In my proposal, NaNs stay as unmasked NaN values, instead of turning into
> > masked values. This is necessary for uniform treatment of all dtypes, but
> > a subclass could override this behavior with an extra mask modification
> > after arithmetic operations.
> No problem with that...
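
A small sketch of the behaviours discussed above, using numpy.ma and plain
ndarrays as they exist (the proposed core-mask API is not shown, since it is
still being designed). It compares how numpy.ma masks an out-of-domain
result while a plain ndarray keeps the NaN, and why NaN cannot mark missing
integers. The variable names are just for illustration.

```python
import numpy as np

# numpy.ma masks results that fall outside a function's domain, so the
# invalid entry is hidden rather than exposed as NaN.
x = np.ma.array([1.0, -1.0, 4.0])
masked_log = np.ma.log(x)
print(np.ma.getmaskarray(masked_log))  # the -1.0 entry is masked

# A plain ndarray keeps the NaN as an ordinary, unmasked value.
with np.errstate(invalid="ignore"):
    plain_log = np.log(np.array([1.0, -1.0, 4.0]))
print(np.isnan(plain_log))

# NaN cannot stand in for "missing" in an integer array ...
ints = np.array([1, 2, 3])
try:
    ints[1] = np.nan
    nan_fits_in_int = True
except ValueError:
    nan_fits_in_int = False
print(nan_fits_in_int)

# ... whereas a mask works the same way for any dtype.
masked_ints = np.ma.array([1, 2, 3], mask=[False, True, False])
print(masked_ints.sum())  # only the unmasked entries contribute
```
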
> > `mask` property:
> > Nothing to add to it. It's basically what we have now (except for the
> > opposite convention).
> >
> > Working with masked values:
> > I recall some strong points back in the day for not using None to
> > represent missing values...
> > Adding a maskedstr argument to array2string? Mmh... I prefer a global
> > flag like we have now.
> >
> > I'm not really a fan of all the global state that NumPy keeps. I guess
> > I'm trying to stamp that out bit by bit as well where I can...
> Pretty convenient to define a default once for all, though.

Maybe it needs to go in both places.
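
For context, the global flag being discussed is numpy.ma's
masked_print_option. The sketch below shows how that module-wide setting
changes the printed form of every masked array, and restores it afterwards.
The "NA" string is an arbitrary choice for the example, and the per-call
maskedstr argument from the proposal is not shown because it does not exist
in numpy.ma.

```python
import numpy as np

values = np.ma.array([1, 2, 3], mask=[False, True, False])

# Default: masked entries print as "--".
default_text = str(values)
print(default_text)

# Change the module-wide display string, then restore it so that other
# code is not affected by the global change.
previous = np.ma.masked_print_option.display()
np.ma.masked_print_option.set_display("NA")
try:
    custom_text = str(values)
    print(custom_text)
finally:
    np.ma.masked_print_option.set_display(previous)
```

The try/finally is the price of global state: every caller that wants a
different string has to remember to put the old one back.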

> > Design questions:
> > Adding `masked` or whatever we call it to a number/array should result in
> > masked/a fully masked array, period. That way, we can have an idea that
> > something was wrong with the initial dataset.
> >
> > I'm not sure I understand what you mean. In the design, adding a mask
> > means setting "a.mask = True", "a.mask = False", or "a.mask = <boolean
> > array>" in general.
> I mean that:
> 0 + ma.masked = ma.masked
> ma.array([1,2,3], mask=False) + ma.masked = ma.array([1,2,3],
> mask=[True,True,True])
> By extension, any operation involving a masked value should result in a
> masked value.

R appears to consistently follow the model Nathaniel pointed out, and
adopting the same one seems like a good idea to me. With the model in place,
the desired result of these operations follows fairly naturally.
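
The propagation rule Pierre describes is what numpy.ma already does, and it
can be checked with a short sketch. This uses numpy.ma only, not the
proposed core-mask API, and the variable names are illustrative.

```python
import numpy as np

# A scalar combined with the masked constant stays masked.
scalar_result = 0 + np.ma.masked
print(np.ma.is_masked(scalar_result))

# An unmasked array combined with the masked constant becomes fully masked.
a = np.ma.array([1, 2, 3], mask=False)
array_result = a + np.ma.masked
print(np.ma.getmaskarray(array_result))

# Element-wise, a masked input masks only the matching output entry.
b = np.ma.array([10, 20, 30], mask=[False, True, False])
elementwise = a + b
print(np.ma.getmaskarray(elementwise))
```
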

> > hardmask: I never used the feature myself. I wonder if anyone did. Still,
> > it's a nice idea...
> >
> > Ok, I'll leave that out of the initial design unless someone comes up
> > with some strong use cases.
> Oh, it doesn't eat bread (as we say in French), so you can leave it where
> it is...

Yeah, numpy.ma isn't going to disappear in a puff of smoke.
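
For anyone who hasn't used the hardmask feature mentioned above, here is a
minimal sketch of how it behaves in numpy.ma. The array names are
illustrative.

```python
import numpy as np

soft = np.ma.array([1, 2, 3], mask=[False, True, False])
hard = np.ma.array([1, 2, 3], mask=[False, True, False], hard_mask=True)

# With the default soft mask, assigning to a masked entry unmasks it.
soft[1] = 10
print(soft)

# With a hard mask, the assignment is ignored: the entry stays masked and
# the underlying data is left untouched.
hard[1] = 10
print(hard)
print(hard.data)
```
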

