[Numpy-discussion] min() of array containing NaN

Anne Archibald peridot.faceted@gmail....
Tue Aug 12 03:06:33 CDT 2008

2008/8/12 Joe Harrington <jh@physics.ucf.edu>:

> So, I endorse extending min() and all other statistical routines to
> handle NaNs, possibly with a switch to turn it on if a suitably fast
> algorithm cannot be found (which is competitor IDL's solution).
> Certainly without a switch the default behavior should be to return
> NaN, not to return some random value, if a NaN is present.  Otherwise
> the user may never know a NaN is present, and therefore has to check
> every use for NaNs.  That constant manual NaN checking is slower and
> more error-prone than any numerical speed advantage.
> So to sum, proposed for statistical routines:
> if NaN is not present, return value
> if NaN is present, return NaN
> if NaN is present and nan=True, return value ignoring all NaNs
> OR:
> if NaN is not present, return value
> if NaN is present, return value ignoring all NaNs
> if NaN is present and nan=True, return NaN
> I'd prefer the latter.  IDL does the former and it is a pain to do
> /nan all the time.  However, the latter might trip up the unwary,
> whereas the former never does.
> This would apply at least to:
> min
> max
> sum
> prod
> mean
> median
> std
> and possibly many others.

For almost all of these the current behaviour is to propagate NaNs
arithmetically. For example, the sum of anything with a NaN is NaN. I
think this is perfectly sufficient, given how easy it is to strip out
NaNs if that's what you want. The issue that started this thread (and
the many other threads that have come up as users stub their toes on
this behaviour) is that min (and other functions based on comparisons)
do not propagate NaNs. If you do np.amin(A) and A contains NaNs, you
can't count on getting a NaN back, unlike np.mean or np.std. The fact
that you get an arbitrary value rather than the minimum just adds
insult to injury. (It is probably also true that the value you get
back depends on how the array is stored in memory.)
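
To make the distinction concrete, here is a minimal sketch of what
NaN propagation means for an arithmetic reduction and for a
comparison-based one. The helper propagating_min is hypothetical (it
is not a NumPy function); it only illustrates the behaviour argued for
here, and the sample array A is made up.

```python
import numpy as np

A = np.array([3.0, np.nan, 1.0, 2.0])

# Arithmetic reductions propagate NaN on their own:
# any sum or mean that touches a NaN is itself NaN.
total = np.sum(A)
average = np.mean(A)


def propagating_min(a):
    """Hypothetical helper: NaN if any element is NaN, else the minimum."""
    a = np.asarray(a, dtype=float)
    if np.isnan(a).any():
        return np.nan
    return a.min()


result = propagating_min(A)
```

The explicit np.isnan check costs an extra pass over the data, which
is the speed trade-off discussed in the quoted message.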

It really isn't very hard to replace a call such as np.sum(A) with one
that strips the NaNs out of A first, if you want to ignore NaNs
instead of propagating them. So I don't feel a need for special code
in sum() that treats NaN as 0. I would be content if the
comparison-based functions propagated NaNs.
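
As a sketch of how little code the stripping takes, a boolean mask
built with np.isnan is enough. The array A and the variable names are
illustrative only.

```python
import numpy as np

A = np.array([3.0, np.nan, 1.0, 2.0])

# Keep only the entries that are not NaN (infinities are retained).
not_nan = A[~np.isnan(A)]

# Reductions over the stripped array ignore the NaNs by construction.
total_ignoring_nans = np.sum(not_nan)
min_ignoring_nans = np.amin(not_nan)
```

Note that boolean indexing flattens a multi-dimensional array, so this
form suits whole-array reductions rather than reductions along an
axis.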

If you did decide it was essential to make versions of the functions
that removed NaNs, it would get you most of the way there to add an
optional keyword argument to ufuncs' reduce method that skipped NaNs.
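
What such a NaN-skipping reduce might do can be approximated in user
code. The function reduce_skipping_nans below is a hypothetical
stand-in for the proposed keyword, not an existing NumPy API, and it
handles only the flattened, whole-array case.

```python
import numpy as np


def reduce_skipping_nans(ufunc, a):
    """Hypothetical helper: reduce a flattened array with a binary ufunc,
    dropping NaN entries before the reduction."""
    a = np.asarray(a, dtype=float).ravel()
    return ufunc.reduce(a[~np.isnan(a)])


A = np.array([3.0, np.nan, 1.0, 2.0])

lowest = reduce_skipping_nans(np.minimum, A)
highest = reduce_skipping_nans(np.maximum, A)
total = reduce_skipping_nans(np.add, A)
```

One design question this sketch leaves open is the all-NaN case:
np.add has an identity and would return 0.0, while a ufunc without an
identity, such as np.minimum, would raise an error on the empty
remainder.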

