[Numpy-discussion] Medians that ignore values

David Cournapeau david@ar.media.kyoto-u.ac...
Sat Sep 20 00:41:25 CDT 2008


Anne Archibald wrote:
>
> I, on the other hand, was making specifically that suggestion: users
> should not use nans to indicate missing values. Users should use
> masked arrays to indicate missing values.

I agree it is the nicest solution in theory, but I think it is
impractical (as mentioned by Eric Firing in his email).

>
> This part I pretty much agree with.

I can't really see which one is better (failing or returning NaN for
sort/min/max and their sort counterpart), or whether the choice should
be left to the user. I am fine with both, and they both require the same
amount of work.
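
To make the two options concrete, here is a minimal sketch in plain Python
(not NumPy's implementation). The names `min_propagate` and `sort_strict`
are hypothetical and exist only for illustration: the first returns NaN as
soon as one is seen, the second fails outright.

```python
import math


def min_propagate(values):
    """Return the minimum of a non-empty sequence, or NaN if any element is NaN."""
    if not values:
        raise ValueError("min_propagate() needs at least one value")
    result = values[0]
    for v in values:
        if math.isnan(v):
            return math.nan  # propagate: one NaN poisons the result
        if v < result:
            result = v
    return result


def sort_strict(values):
    """Return a sorted copy, raising if any element is NaN (no meaningful order)."""
    if any(math.isnan(v) for v in values):
        raise ValueError("cannot sort values containing NaN")
    return sorted(values)


# Without such a policy the answer depends on element order, because every
# comparison against NaN is False:
#   min([math.nan, 1.0])  -> nan
#   min([1.0, math.nan])  -> 1.0
```

Either policy costs one NaN test per element, which matches the point that
both require about the same amount of work.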

>  Or we can make them behave drastically differently.
> Masked arrays clearly need to be able to handle masked values flexibly
> and explicitly. So I think nans should be handled simply and
> conservatively: propagate them if possible, raise if not.

I agree about this behavior being the default. I just think that for a
couple of functions, we could provide either separate functions or
additional arguments to existing functions to ignore them: I am thinking
about min/max/sort and their arg* counterparts, because those are really
basic, and because we already have nanmean/nanstd/nanmedian (e.g. having
a nansort would help for nanmean to be much faster).
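
As a rough illustration of what such NaN-ignoring variants could look like,
here is a plain-Python sketch. The `nansort` below is a hypothetical helper,
not an existing NumPy function, and the median built on top of it is only
one example of a statistic that could reuse it.

```python
import math


def nansort(values):
    """Hypothetical helper: sort ascending after dropping NaN entries."""
    return sorted(v for v in values if not math.isnan(v))


def nanmin(values):
    """Minimum of the non-NaN entries; NaN if nothing is left."""
    kept = nansort(values)
    return kept[0] if kept else math.nan


def nanmax(values):
    """Maximum of the non-NaN entries; NaN if nothing is left."""
    kept = nansort(values)
    return kept[-1] if kept else math.nan


def nanmedian(values):
    """Median of the non-NaN entries, computed from the sorted data."""
    kept = nansort(values)
    if not kept:
        return math.nan
    mid = len(kept) // 2
    if len(kept) % 2:
        return kept[mid]
    return 0.5 * (kept[mid - 1] + kept[mid])
```

Sorting just to get a minimum is wasteful; it is done here only to show how
one shared helper could back several functions.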

>
> If users are concerned about performance, it's worth noting that on
> some machines nans force a fallback to software floating-point
> handling, with a corresponding very large performance hit.

I was more concerned with the cost of checking for NaN explicitly (in
everything involving comparison) even when your array contains no NaN.
But I don't see any obvious way to avoid that cost.
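
A small plain-Python sketch of where that cost comes from (the function
names are hypothetical, and this says nothing about how NumPy implements
it): the NaN-aware loop performs one extra test per element, and pays for
it even when the input is clean.

```python
import math


def max_plain(values):
    """Comparison-only maximum; silently order-dependent if NaN is present."""
    result = values[0]
    for v in values[1:]:
        if v > result:
            result = v
    return result


def max_nan_aware(values):
    """Same loop plus an explicit NaN test on every element."""
    result = values[0]
    for v in values:
        if v != v:  # NaN is the only float value not equal to itself
            return math.nan
        if v > result:
            result = v
    return result
```

On NaN-free input both functions return the same value, so the extra test
is pure overhead there; it only changes the result when a NaN is present.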

David

