[Numpy-discussion] Medians that ignore values

Anne Archibald peridot.faceted@gmail....
Fri Sep 19 03:10:24 CDT 2008


2008/9/19 Pierre GM <pgmdevlist@gmail.com>:
> On Friday 19 September 2008 03:11:05 David Cournapeau wrote:
>
>> Hm, I am always puzzled when I think about nan handling :) It always
>> seems there is no good answer.
>
> Which is why we have masked arrays, of course ;)

I think the numpy attitude to nans should be that they are unexpected
bogus values that signify that something went wrong with the
calculation somewhere. They can be left in place for most operations,
but any operation that depends on the value should (ideally) return
nan, or failing that, raise an exception. (If users want exceptions
all the time, that's what seterr is for.) If people want to flag bad
data, let's tell them to use masked arrays.
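
To illustrate the seterr remark, here is a minimal sketch of switching
invalid floating-point operations from "quietly produce nan" to "raise".
The array and variable names are made up for the example. Note that this
catches the operation that creates a nan (here 0/0); it is not a check
for nans already present in the input.

```python
import numpy as np

zeros = np.zeros(2)

# With invalid="ignore", 0/0 silently produces nan.
old = np.seterr(invalid="ignore")
try:
    quiet = zeros / zeros
finally:
    np.seterr(**old)  # restore the previous error settings

# With invalid="raise", the same operation raises FloatingPointError.
raised = False
old = np.seterr(invalid="raise")
try:
    zeros / zeros
except FloatingPointError:
    raised = True
finally:
    np.seterr(**old)
```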

So by this rule amax/maximum/mean/median should all return nan when
there's a nan in their input; I don't think it's reasonable for sort
to return an array full of nans, so I think its default behaviour
should be to raise an exception if there's a nan. It's valuable (for
example in median) to be able to sort them all to the end, but I don't
think this should be the default. If people want nanmin, I would be
tempted to tell them to use masked arrays (is there a convenience
function that makes a masked array with a mask everywhere the data is
nan?).
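
On the convenience-function question: numpy.ma.masked_invalid does
roughly this, with the caveat that it masks infs as well as nans. A
small sketch with made-up data, showing a nanmin-style result obtained
through a masked array:

```python
import numpy as np

data = np.array([3.0, np.nan, 1.0, 2.0])

# Mask every element that is nan (or inf).
masked = np.ma.masked_invalid(data)

# The minimum is taken over the unmasked elements only.
lowest = masked.min()
```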

I am assuming that appropriate masked sort/amax/maximum/mean/median
exist already. They're definitely needed, so how much effort is it
worth putting in to duplicate that functionality with nans instead of
masked elements?
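
For reference, a sketch of the masked counterparts as they exist in
recent numpy releases (this may not match what the numpy of this thread
provided). The sample data is invented for illustration.

```python
import numpy as np

data = np.array([3.0, np.nan, 1.0, 2.0])
masked = np.ma.masked_invalid(data)

# Masked sort: masked elements are placed at the end by default.
ordered = np.ma.sort(masked)

# Reductions skip the masked element.
peak = masked.max()
average = masked.mean()
middle = np.ma.median(masked)
```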

Anne

