[Numpy-discussion] A crazy masked-array thought
Charles R Harris
Fri Apr 27 10:54:44 CDT 2012
On Fri, Apr 27, 2012 at 9:16 AM, <email@example.com> wrote:
> On Fri, Apr 27, 2012 at 10:33 AM, Charles R Harris
> <firstname.lastname@example.org> wrote:
> > On Fri, Apr 27, 2012 at 8:15 AM, Charles R Harris
> > <email@example.com> wrote:
> >> On Wed, Apr 25, 2012 at 9:58 AM, Richard Hattersley
> >> <firstname.lastname@example.org> wrote:
> >>> The masked array discussions have brought up all sorts of interesting
> >>> topics - too many to usefully list here - but there's one aspect I haven't
> >>> spotted yet. Perhaps that's because it's flat out wrong, or crazy, or
> >>> too awkward to be helpful. But ...
> >>> Shouldn't masked arrays (MA) be a superclass of the plain-old-array
> >>> (POA)?
> >>> In the library I'm working on, the introduction of MAs (via numpy.ma)
> >>> required us to sweep through the library and make a fair few changes.
> >>> That's not the sort of thing one would normally expect from the
> >>> introduction of a subclass.
> >>> Putting aside the ABI issue, would it help downstream API compatibility
> >>> if the POA was a subclass of the MA? Code that's expecting/casting-to
> >>> a POA might continue to work and, where appropriate, could be upgraded
> >>> in their own time to accept MAs.
> >> That's a version of the idea that all arrays have masks, just some of
> >> them have "missing" masks. That construction was mentioned in the thread
> >> but I can see how one might have missed it. I think it is the right way
> >> to do things. However, current libraries and such will still need to do
> >> some work in order to not do the wrong thing when a "real" mask is
> >> present. For instance, check and raise an error if they can't deal with it.
> > To expand a bit more, this is precisely why the current work on making
> > masks part of ndarray rather than a subclass was undertaken. There is a
> > flag that says whether or not the array is masked, but you will still need
> > to check that flag to see if you are working with an unmasked instance of
> > ndarray. At the moment the masked version isn't quite completely fused with
> > ndarrays-classic since the maskedness needs to be specified in the
> > constructors and such, but what you suggest is actually what we are
> > working towards.
> > No matter what is done, current functions and libraries that want to use
> > masks are going to have to deal with the existence of both masked and
> > unmasked arrays since the existence of a mask can't be ignored without
> > risking wrong results.
> (In case it's not the wrong thread)
> If every ndarray has this maskflag, then it is easy to adjust other
> library code.
That is the case.
In : ones(1).flags
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : True
MASKNA : False
OWNMASKNA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
What I'd like to add is that the mask is only allocated when NA (or
equivalent) is assigned. That way the flag also signals the actual presence
of a masked value.
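
For comparison, the existing numpy.ma module already allocates its mask lazily in a similar way. The sketch below uses only numpy.ma, not the MASKNA flag shown above, and the variable names are purely illustrative:

```python
import numpy as np

# A freshly created masked array carries no real mask, only the
# shared `nomask` sentinel.
a = np.ma.array([1.0, 2.0, 3.0])
no_mask_before = np.ma.getmask(a) is np.ma.nomask

# Assigning a masked value is what triggers allocation of a boolean mask.
a[1] = np.ma.masked
has_mask_after = np.ma.getmask(a) is not np.ma.nomask
mask_values = np.ma.getmask(a).tolist()
```

Here the identity test against `np.ma.nomask` plays the role the flag would play: it is cheap and does not scan the data.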
> if myarr.maskflag is not None: raise SorryException
> What is expensive is having to do np.isnan(myarr) or
> np.isfinite(myarr) everywhere.
> As a concept I like the idea, masked arrays are the general class with
> generic defaults, "clean" arrays are a subclass where some methods are
> overwritten with faster implementations.
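
The check-and-raise guard quoted above can be sketched with the existing numpy.ma module. This is only an illustration under assumptions: `require_unmasked` is a hypothetical helper name, and it relies on numpy.ma rather than on the `maskflag` attribute or the MASKNA flag discussed in this thread.

```python
import numpy as np

def require_unmasked(arr):
    """Return a plain ndarray, or raise if `arr` holds any masked value.

    Hypothetical helper: library code that cannot handle masks calls this
    once at its entry point instead of testing values everywhere.
    """
    if np.ma.is_masked(arr):
        raise ValueError("masked values are not supported here")
    # np.asarray drops the MaskedArray subclass and returns the base data.
    return np.asarray(arr)

plain = require_unmasked(np.ones(3))
clean = require_unmasked(np.ma.array([1.0, 2.0, 3.0]))

bad = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
try:
    require_unmasked(bad)
    rejected = False
except ValueError:
    rejected = True
```

Arrays without a real mask pass straight through, so unmasked callers pay only for the check.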