[Numpy-discussion] What is consensus anyway
Tue Apr 24 13:35:17 CDT 2012
On Tue, Apr 24, 2012 at 2:12 PM, Charles R Harris <firstname.lastname@example.org> wrote:
> On Tue, Apr 24, 2012 at 9:25 AM, <email@example.com> wrote:
>> On Tue, Apr 24, 2012 at 9:43 AM, Pierre Haessig
>> <firstname.lastname@example.org> wrote:
>> > Hi,
>> > Le 24/04/2012 15:14, Charles R Harris a écrit :
>> >> a) All arrays should be implicitly masked, even if the mask isn't
>> >> initially allocated. The maskna keyword can then be removed, taking
>> >> with it the sense that there are two kinds of arrays.
>> > From my lazy user perspective, having masked and non-masked arrays share
>> > the same "look and feel" would be a number one advantage over the
>> > existing numpy.ma arrays. I would like masked array to be as
>> > as possible.
>> I don't have any opinion about internal implementation.
>> But users need to be aware of whether they have masked arrays or not.
>> Many functions (most of scipy) wouldn't know how to handle NA
>> and don't do any checks (and shouldn't, in my opinion, if the NA check
>> is costly), so the result might be silently wrong numbers, depending on
>> the implementation.
> There should be a flag saying whether or not NA has been allocated and
> allocation happens when NA is assigned to an array item, so that should be
> fast. I don't think scipy currently deals with masked arrays in all areas,
> so I believe that the same problem exists there and would also exist for
> missing data types. I think this sort of compatibility problem is worth a
> whole discussion by itself.
>> >> b) There needs to be a distinction between missing and ignore. The
>> >> mechanism for this is already in place in the payload type, although
>> >> it isn't clear to me that that is uniformly used in all the NA code.
>> >> There is also a place for missing *and* ignored. Which leads to
>> > If the idea of having two payloads is to avoid as many "skipna &
>> > friends" extra keywords as possible, I would like it very much. In my
>> > limited experience with R, I end up calling every function with a
>> > different magical set of keywords (na.rm, na.action, ... and I forgot).
>> There is a reason for requiring the user to decide what to do about NA's.
>> Either we have utility functions/methods to help the user change the
>> arrays and treat NA's before calling a function, or the function needs
>> to ask the user what should be done about possible NAs.
>> Doing it automatically might only be useful for specialised packages.
> That's what the different payloads would do. I think the common use case
> would always have the ignore bit set. What are the other sorts of actions
> you are interested in, and should they be part of the functions in Numpy,
> such as mean and std, or should they rather be implemented in stats packages
> that may be more specialized? I see numpy.ma currently used in the
> following spots in scipy:
Like you said, this whole issue should probably be a separate
discussion, but I would like to share my thoughts on the default
payload here. If we don't have some mechanism for flagging which
functions are NA-friendly, then it would be wise to have NA default
to NaN behavior. If only to prevent bugs that mess up data from being
That being said, the determination of NA payload is tricky. Some functions
may need to react differently to an NA. One that comes to mind is
np.gradient(). However, other functions may not need to do anything
because they depend entirely upon other functions that have already been
updated to support NA.
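
To make the points above concrete, here is a minimal sketch. It does not use
the proposed NA API at all; it assumes plain NaN and numpy.ma as stand-ins,
and the sample values (including the 999.0 placeholder) are made up for
illustration. It shows NaN propagating visibly, np.gradient treating a
missing sample differently from a reduction, and a mask being lost silently
when a function converts its input with np.asarray.

```python
import numpy as np

data = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

# NaN propagates: a function that knows nothing about missing values
# still returns something visibly "missing" instead of a plausible number.
plain_mean = np.mean(data)

# Skipping the missing value is an explicit opt-in by the caller.
skipped_mean = np.nanmean(data)

# np.gradient uses central differences in the interior, so the NaN
# contaminates its two neighbours, while the gradient at the NaN
# position itself is computed from the finite samples on either side.
grad = np.gradient(data)

# A mask, by contrast, can be dropped without notice: np.asarray on a
# numpy.ma array returns the underlying data, masked slot included.
masked = np.ma.masked_array(
    [1.0, 2.0, 999.0, 4.0, 5.0],  # 999.0 is an arbitrary placeholder
    mask=[False, False, True, False, False],
)
silent_mean = np.mean(np.asarray(masked))  # the 999.0 is counted
masked_mean = masked.mean()                # the 999.0 is ignored

print(plain_mean, skipped_mean)
print(grad)
print(silent_mean, masked_mean)
```

The np.asarray call stands in for what a function that is not mask-aware
might do internally with its input. The gradient case is the kind of
function-specific behavior I mean: neither "propagate" nor "skip" describes
what happens around the missing sample.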