x.min() depends on ordering
Tim Hochberg
tim.hochberg at ieee.org
Sat Nov 11 17:46:47 CST 2006
Robert Kern wrote:
> Keith Goodman wrote:
>
>> x.min() and x.max() depend on the ordering of the elements:
>>
>> >>> x = M.matrix([[ M.nan, 2.0, 1.0]])
>> >>> x.min()
>> nan
>>
>> >>> x = M.matrix([[ 1.0, 2.0, M.nan]])
>> >>> x.min()
>> 1.0
>>
>> If I were to try the latter in ipython, I'd assume, great, min()
>> ignores NaNs. But then the former would be a bug in my program.
>>
>> Is this related to how sort works?
>>
>
> Not really. sort() is a more complicated algorithm that does a number of
> different comparisons in an order that is difficult to determine beforehand.
> x.min() should just be a straight pass through all of the elements. However, the
> core problem is the same: a < nan, a > nan, a == nan are all False for any a.
>
> Barring a clever solution (at least cleverer than I feel like being
> immediately), the way to solve this would be to check for nans in the array and
> deal with them separately (or simply ignore them in the case of x.min()).
> However, this checking would slow down the common case that has no nans (sans
> nans, if you will).
>
For ignoring NaNs, isn't it simply a matter of scanning through the
array till you find the first non-NaN, then proceeding as normal? In the
common case, this requires one extra compare (or rather is_nan), which
should be negligible in most circumstances. Only when you have an array
with a load of NaNs at the beginning would it be slow. One would have to
decide whether to return NaN or raise an error when there were no real
numbers.
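The scan-then-proceed idea can be sketched in pure Python. This is only an illustration of the control flow, not NumPy's actual C loop; the name min_ignoring_nans and the choice to raise ValueError when there are no real numbers are assumptions made for the example.

```python
import math

def min_ignoring_nans(values):
    """Minimum of the non-NaN entries (hypothetical helper).

    Raises ValueError when there is no non-NaN value; returning NaN
    instead would be the other option mentioned in the text.
    """
    it = iter(values)
    # Scan forward to the first non-NaN value and use it as the start.
    for current in it:
        if not math.isnan(current):
            break
    else:
        # Either the input was empty or every element was NaN.
        raise ValueError("no non-NaN values")
    # Proceed as normal: 'x < current' is False whenever x is NaN,
    # so any later NaNs are skipped without an explicit check.
    for x in it:
        if x < current:
            current = x
    return current
```

When the first element is a real number, the only extra work is the single isnan test before the main loop starts.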

My preference would be to raise an error / warning when there is a NaN
in the array. Technically, there is no minimum value when a NaN is
present. I believe that this would be feasible by swapping the compare
from 'a < b' to '!(a >= b)'. This should return NaN if any NaNs are
present, and I suspect the extra '!' will have minimal performance impact,
but it would have to be tested. Then a warning or error could be issued
on the way out depending on the errstate. Arguably, returning NaN is more
correct than returning the smallest non-NaN anyway.
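A quick way to test the swapped comparison is a pure-Python model of the loop. This is a sketch under assumptions: 'a' is taken to be the new element and 'b' the running minimum, the function names are hypothetical, and the real loop lives in NumPy's C code where the operand order may differ. The third variant adds an 'm == m' guard (False only when m is NaN) as one possible way to keep a NaN once it has been picked up.

```python
import math

def min_lt(values):
    """Straight pass that updates on 'x < m'."""
    it = iter(values)
    m = next(it)
    for x in it:
        if x < m:
            m = x
    return m

def min_not_ge(values):
    """Same pass, but updating on 'not (x >= m)'."""
    it = iter(values)
    m = next(it)
    for x in it:
        # True when x is smaller than m, and also when x or m is NaN.
        if not (x >= m):
            m = x
    return m

def min_nan_propagating(values):
    """Negated compare plus a guard so that a NaN minimum is never replaced."""
    it = iter(values)
    m = next(it)
    for x in it:
        # 'm == m' is False only for NaN, so a NaN in m stays put.
        if m == m and not (x >= m):
            m = x
    return m

nan = float("nan")
for data in ([nan, 2.0, 1.0], [1.0, 2.0, nan], [2.0, nan, 1.0]):
    print(data, min_lt(data), min_not_ge(data), min_nan_propagating(data))
```

In this model min_lt reproduces the order dependence reported above, and min_not_ge still depends on order: a NaN is picked up, but the next element displaces it because the comparison is False again. Only the guarded version returns NaN wherever the NaN sits, at the cost of one more comparison per element.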

As for Keith Goodman's request for a NaN-ignoring min function, I suggest:
a[~np.isnan(a)].min()
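Spelled out on the two orderings from the original report, as a small sketch: the wrapper name min_skipping_nans is made up for the example, and np.nanmin is shown alongside as the library routine that also skips NaNs.

```python
import numpy as np

def min_skipping_nans(arr):
    """Apply the mask idiom above: drop NaNs, then take the minimum."""
    mask = ~np.isnan(arr)   # True where the element is a real number
    return arr[mask].min()  # raises ValueError if nothing is left

a = np.array([np.nan, 2.0, 1.0])
b = np.array([1.0, 2.0, np.nan])

print(min_skipping_nans(a), min_skipping_nans(b))
print(np.nanmin(a), np.nanmin(b))
```

Both orderings give the same answer here because the NaN is removed before any comparison takes place.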
Or better yet, stop generating so many NaNs.
-tim
More information about the Numpy-discussion
mailing list