[Numpy-discussion] min() of array containing NaN
Andrew Dalke
dalke@dalkescientific....
Wed Aug 13 15:18:05 CDT 2008
Robert Kern wrote:
> Or we could implement the inner loop of the minimum ufunc to return
> NaN if there is a NaN. Currently it just compares the two values
> (which causes the unpredictable results since having a NaN on either
> side of the < is always False). I would be amenable to that provided
> that the C isnan() call does not cause too much slowdown in the normal
> case.
Reading this again, I realize that I don't know how ufuncs work so
this suggestion might not be feasible. ....
It doesn't need to be unpredictable. Make sure the first value is
not a NaN (if it is, return it right away). A comparison against NaN
is always false, so by inverting the comparison and then negating the
result you end up with a test for "is a new minimum OR is NaN". (I
checked the assembly output. There's no effective difference
in code length between the normal and the inverted forms. I
didn't test performance.)
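To make the inversion concrete, here is a minimal sketch of the two forms of the test side by side. The function names are made up for illustration, and the second one assumes "best" is already known not to be a NaN.

```c
#include <math.h>

/* Plain form: true only when x is strictly smaller than best.
   Any ordered comparison with a NaN operand is false, so a NaN x
   is silently skipped. */
int is_new_min(double best, double x) {
  return x < best;
}

/* Inverted form: true when x is smaller than best OR x is a NaN.
   Assumes best itself is not a NaN (hypothetical helper, not part
   of any library). */
int is_new_min_or_nan(double best, double x) {
  return !(best <= x);
}
```

With best = 1.0, the plain form rejects a NaN while the inverted form accepts it, and both agree on ordinary values such as 0.5 (accepted) and 2.0 (rejected). Equal values are rejected by both, so ties don't trigger the extra isnan() call.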
For random values in the array the test should succeed less and
less often as the scan goes on, so the isnan call placed inside that
branch runs something like O(log(N)) times instead of O(N) times.
That's handwaving, btw, but it's probably a log because the effect
is scale invariant.
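One way to check the handwaving: for independent random values with no ties, element i (counting from 0) is the smallest seen so far with probability 1/(i+1), so the expected number of times the branch is taken is H_N - 1, where H_N is the harmonic number, and H_N grows like ln(N). The comparison itself still runs N-1 times; only the isnan() calls are that rare. Here is a rough sketch for counting the branch hits on real data. Both function names are invented for this example, and the counter assumes the data has at least one element and contains no NaN.

```c
#include <stddef.h>

/* Count how many times the "new minimum" branch is taken, which is
   how many times the inner isnan() call would run.
   Assumes n >= 1 and that data contains no NaN. */
size_t count_branch_hits(size_t n, const double *data) {
  size_t hits = 0;
  double best = data[0];
  for (size_t i = 1; i < n; i++) {
    if (!(best <= data[i])) {
      best = data[i];
      hits++;
    }
  }
  return hits;
}

/* Harmonic number H_n = 1 + 1/2 + ... + 1/n, for comparing against
   the measured count (expected hits on random data is H_n - 1). */
double harmonic(size_t n) {
  double h = 0.0;
  for (size_t k = 1; k <= n; k++) {
    h += 1.0 / (double)k;
  }
  return h;
}
```

The worst case is strictly decreasing input, where every element is a new minimum and the branch is taken N-1 times. Sorted ascending input never takes it.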
Here's example code
#include <stdio.h>
#include <math.h>

double nan_min(int n, double *data) {
  int i;
  double best = data[0];
  if (isnan(best)) {
    return best;
  }
  for (i=1; i<n; i++) {
    if (!(best <= data[i])) {
      /* data[i] is either a new min or a NaN */
      best = data[i];
      if (isnan(best)) {
        return best;
      }
    }
  }
  return best;
}

int main() {
  double test[] = {1.0, 2.0, 3.0};
  double nan = 0.0/0.0;
  printf("1. %f\n", nan_min(sizeof(test)/sizeof(double), test));
  test[1] = nan;
  printf("2. %f\n", nan_min(sizeof(test)/sizeof(double), test));
  test[1] = 2.0;
  test[2] = nan;
  printf("3. %f\n", nan_min(sizeof(test)/sizeof(double), test));
  test[2] = 3.0;
  printf("4. %f\n", nan_min(sizeof(test)/sizeof(double), test));
  test[0] = nan;
  printf("5. %f\n", nan_min(sizeof(test)/sizeof(double), test));
}
The output is
1. 1.000000
2. nan
3. nan
4. 1.000000
5. nan
Andrew
dalke@dalkescientific.com