[Numpy-discussion] sorting -inf, nan, inf
oliphant at ee.byu.edu
Tue Sep 19 16:48:18 CDT 2006
Tim Hochberg wrote:
>A. M. Archibald wrote:
>>On 19/09/06, Tim Hochberg <tim.hochberg at ieee.org> wrote:
>>>I'm not sure where the breakpoint is, but I was seeing failures for all
>>>three sort types with N as high as 10000. I suspect that they're all
>>>broken in the presence of NaNs. I further suspect you'd need some
>>>punishingly slow n**2 algorithm to be robust in the presence of NaNs.
>>Not at all. Just temporarily make NaNs compare greater than any other
>>floating-point value and use quicksort (say). You can even do this for
>>python lists without much trouble.
>I misspoke. What I meant here was keeping the behavior that people think
>that we already have but don't: NaNs stay in place and everything is
>sorted around them. And even that's not true, since you could just
>record where the NaNs are, remove them, sort and put them back. What I
>was really getting at was, that I'm guessing, and it's just a guess,
>that (a) none of the fast sorting algorithms do anything sensible unless
>special cased and (b) one could come up with a naive n**2 sort that does
>do something sensible without special casing (where sensible means leave
>the NaNs alone).
>>That's actually a viable suggestion for numpy's sorting, although it
>>would be kind of ugly to implement: do a quick any(isnan(A)), and if
>>not, use the fast stock sorting algorithms; if there is a NaN
>>somewhere in the array, use a version of the sort that has a tweaked
>>comparison function so the NaNs wind up at the end and are easy to
>>But the current situation, silently returning arrays in which the
>>non-NaNs are unsorted, is really bad.
>If your going to do isnan anyway, why not just raise an exception. An
>array with NaNs in it can't be sorted by any common sense definition of
>sorting. Any treatment of NaNs is going to be arbitrary, so we might as
>well make the user specify what they want. "In the face of ambiguity,
>refuse the temptation to guess" and all that.
>My favorite solution would be to make sort respect the invalid mode of
>seterr/geterr. However at the moment it doesn't seem to (in beta4 at
>least) but neither does add or multiply so those probably need to be
>looked at again....
The geterr/seterr stuff changes how IEEE hardware flags are handled in
ufuncs. Currently they are not even looked at elsewhere. Are you
saying that add and multiply don't respect the invalid flag? If not,
then this might be hardware related. Does the IEEE invalid hardware
flag get raised on multiplication by nan or only on creation of nan?
All the seterr/geterr stuff relies on the hardware flags. We don't do
any other checking.
More information about the Numpy-discussion