[Numpy-discussion] min() of array containing NaN

Andrew Dalke dalke@dalkescientific....
Thu Aug 14 14:07:20 CDT 2008

Anne Archibald:
 > Sadly, it's not possible without extra overhead. Specifically: the
 > NaN-ignorant implementation does a single comparison between each
 > array element and a placeholder, and decides based on the result  
 > to keep.

Did my example code go through?  The test for NaN only needs to be
done when a new min value is found, which will occur something like
O(log(n)) in a randomly distributed array.

(Here's the hand-waving.  The first requires a NaN check.  The
second has a 1/2 chance of being the new minimum.  The third has
a 1/3 chance, etc.  The sum of the harmonic series goes as O(ln(n)).)

This depends on a double inverting so the test for a new min value and
a test for NaN occur at the same time.  Here's pseudocode:

best = array[0]
if isnan(best):
   return best
for item in array[1:]:
   if !(best <= item):
     best = item
     if isnan(best):
       return best
  return item

 > If you're willing to do two tests, sure, but that's overhead (and
 > probably comparable to an isnan).

In Python the extra inversion costs an extra PVM instruction.  In C
by comparison the resulting assembly code for "best > item" and
"!(best <= item)" have identical lengths, with no real performance

There's no extra cost for doing the extra inversion in the common
case, and for large arrays the ratio of (NaN check) / (no check) -> 1.0.

 > What do compilers' min builtins do with NaNs? This might well be
 > faster than an if statement even in the absence of NaNs...

This comes from a g++ implementation of min:

    *  @brief This does what you think it does.
    *  @param  a  A thing of arbitrary type.
    *  @param  b  Another thing of arbitrary type.
    *  @return   The lesser of the parameters.
    *  This is the simple classic generic implementation.  It will  
work on
    *  temporary expressions, since they are only evaluated once,  
unlike a
    *  preprocessor macro.
   template<typename _Tp>
     inline const _Tp&
     min(const _Tp& __a, const _Tp& __b)
       // concept requirements
       //return __b < __a ? __b : __a;
       if (__b < __a)
         return __b;
       return __a;

The isnan function another version of gcc uses a bunch of
#defs, leading to

         static __inline__  int __inline_isnanf( float __x ) { return  
__x != __x; }
         static __inline__  int __inline_isnand( double __x )  
{ return __x != __x; }
         static __inline__  int __inline_isnan( long double __x )  
{ return __x != __x; }


More information about the Numpy-discussion mailing list