[Numpy-discussion] What is consensus anyway

Charles R Harris charlesr.harris@gmail....
Tue Apr 24 13:12:55 CDT 2012


On Tue, Apr 24, 2012 at 9:25 AM, <josef.pktd@gmail.com> wrote:

> On Tue, Apr 24, 2012 at 9:43 AM, Pierre Haessig
> <pierre.haessig@crans.org> wrote:
> > Hi,
> >
> > On 24/04/2012 15:14, Charles R Harris wrote:
> >>
> >> a) All arrays should be implicitly masked, even if the mask isn't
> >> initially allocated. The maskna keyword can then be removed, taking
> >> with it the sense that there are two kinds of arrays.
> >>
> >
> > From my lazy user perspective, having masked and non-masked arrays share
> > the same "look and feel" would be the number one advantage over the
> > existing numpy.ma arrays. I would like masked arrays to be as transparent
> > as possible.
>
> I don't have any opinion about internal implementation.
>
> But users need to be aware of whether they have masked arrays or not,
> since many functions (most of scipy) wouldn't know how to handle NA
> and don't do any checks (and shouldn't, in my opinion, if the NA check
> is costly). The result might be silently wrong numbers, depending on
> the implementation.
>

There should be a flag saying whether or not NA has been allocated.
Allocation happens when NA is assigned to an array item, so checking the
flag should be fast. I don't think scipy currently deals with masked arrays
in all areas, so I believe the same problem already exists there and would
also exist for missing data types. I think this sort of compatibility
problem is worth a whole discussion by itself.
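
As a rough illustration of the "flag plus lazy allocation" idea, here is a
minimal sketch that uses the existing numpy.ma as a stand-in (it is not the
NA mask implementation under discussion). In numpy.ma the mask is the
sentinel np.ma.nomask until a masked value is assigned, so an identity check
is cheap. The helper name needs_na_handling is hypothetical.

```python
import numpy as np


def needs_na_handling(arr):
    """Return True only if arr has an allocated mask with a masked item.

    Hypothetical helper: plain ndarrays and unmasked numpy.ma arrays both
    report np.ma.nomask, so the common case is a single identity check.
    """
    mask = np.ma.getmask(arr)
    return mask is not np.ma.nomask and bool(mask.any())


a = np.ma.array([1.0, 2.0, 3.0])

# No mask has been allocated yet; the array only holds the nomask sentinel.
mask_allocated_before = np.ma.getmask(a) is not np.ma.nomask

# Assigning a masked value is what triggers allocation of the mask array.
a[1] = np.ma.masked
mask_allocated_after = np.ma.getmask(a) is not np.ma.nomask
```

A function that cannot handle missing values could call such a check on
entry and only pay for a full scan of the mask when one has been allocated.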


>
> >
> >> b) There needs to be a distinction between missing and ignore. The
> >> mechanism for this is already in place in the payload type, although
> >> it isn't clear to me that that is uniformly used in all the NA code.
> >> There is also a place for missing *and* ignored. Which leads to
> >
> > If the idea of having two payloads is to avoid as many "skipna &
> > friends" extra keywords as possible, I would like it very much. My
> > feeling from my small experience with R is that I end up calling every
> > function with a different magical set of keywords (na.rm, na.action,
> > ... and others I forgot).
>
> There is a reason for requiring the user to decide what to do about NA's.
> Either we have utility functions/methods to help the user change the
> arrays and treat NA's before calling a function, or the function needs
> to ask the user what should be done about possible NAs.
> Doing it automatically might only be useful for specialised packages.
>
>
That's what the different payloads would do. I think the common use case
would always have the ignore bit set. What are the other sorts of actions
you are interested in, and should they be part of the functions in Numpy,
such as mean and std, or should they rather be implemented in stats packages
that may be more specialized? I see numpy.ma currently used in the
following spots in scipy:

scipy/stats/mstats_extras.py
scipy/stats/tests/test_mstats_extras.py
scipy/stats/tests/test_mstats_basic.py
scipy/stats/mstats_basic.py
scipy/signal/filter_design.py
scipy/optimize/optimize.py

The advantage of nans, I suppose, is that they are in the hardware and so
already universally part of Numpy. NA would be newly introduced, and so
would require a bit more work. I expect it will be several (many) years
before NAs are dealt with as a matter of course. At minimum, one would need
to check if the masked flag is set.
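
A minimal sketch of what "check if the masked flag is set" could look like
on the function side, again using the existing numpy.ma rather than the
proposed NA mask. The function mean_checked and its skipna keyword are
hypothetical names chosen for illustration; np.ma.is_masked and np.ma.mean
are existing numpy.ma functions.

```python
import numpy as np


def mean_checked(arr, skipna=False):
    """Mean that refuses to silently ignore masked values.

    Hypothetical wrapper: if any element is masked, the caller must
    opt in with skipna=True; otherwise a ValueError is raised.
    """
    if np.ma.is_masked(arr):
        if not skipna:
            raise ValueError("input has masked values; pass skipna=True")
        # Masked entries are excluded from the average.
        return float(np.ma.mean(arr))
    # No masked values present: the ordinary code path is safe.
    return float(np.mean(arr))


plain = np.array([1.0, 2.0, 3.0])
masked = np.ma.array([1.0, 100.0, 3.0], mask=[False, True, False])

plain_mean = mean_checked(plain)
masked_mean = mean_checked(masked, skipna=True)
```

By contrast, a nan in a plain float array needs no such check to be noticed,
since np.mean propagates it to the result.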

Chuck