[Numpy-discussion] sorting -inf, nan, inf
tim.hochberg at ieee.org
Tue Sep 19 17:12:08 CDT 2006
Travis Oliphant wrote:
> Tim Hochberg wrote:
>> A. M. Archibald wrote:
>>> On 19/09/06, Tim Hochberg <tim.hochberg at ieee.org> wrote:
>>>> I'm not sure where the breakpoint is, but I was seeing failures for all
>>>> three sort types with N as high as 10000. I suspect that they're all
>>>> broken in the presence of NaNs. I further suspect you'd need some
>>>> punishingly slow n**2 algorithm to be robust in the presence of NaNs.
>>> Not at all. Just temporarily make NaNs compare greater than any other
>>> floating-point value and use quicksort (say). You can even do this for
>>> python lists without much trouble.
>> I misspoke. What I meant here was keeping the behavior that people think
>> that we already have but don't: NaNs stay in place and everything is
>> sorted around them. And even that's not true, since you could just
>> record where the NaNs are, remove them, sort and put them back. What I
>> was really getting at was, that I'm guessing, and it's just a guess,
>> that (a) none of the fast sorting algorithms do anything sensible unless
>> special cased and (b) one could come up with a naive n**2 sort that does
>> do something sensible without special casing (where sensible means leave
>> the NaNs alone).
>>> That's actually a viable suggestion for numpy's sorting, although it
>>> would be kind of ugly to implement: do a quick any(isnan(A)), and if
>>> not, use the fast stock sorting algorithms; if there is a NaN
>>> somewhere in the array, use a version of the sort that has a tweaked
>>> comparison function so the NaNs wind up at the end and are easy to
>>> trim off.
>>> But the current situation, silently returning arrays in which the
>>> non-NaNs are unsorted, is really bad.
>> If your going to do isnan anyway, why not just raise an exception. An
>> array with NaNs in it can't be sorted by any common sense definition of
>> sorting. Any treatment of NaNs is going to be arbitrary, so we might as
>> well make the user specify what they want. "In the face of ambiguity,
>> refuse the temptation to guess" and all that.
>> My favorite solution would be to make sort respect the invalid mode of
>> seterr/geterr. However at the moment it doesn't seem to (in beta4 at
>> least) but neither does add or multiply so those probably need to be
>> looked at again....
> The geterr/seterr stuff changes how IEEE hardware flags are handled in
> ufuncs. Currently they are not even looked at elsewhere. Are you
> saying that add and multiply don't respect the invalid flag? If not,
> then this might be hardware related. Does the IEEE invalid hardware
> flag get raised on multiplication by nan or only on creation of nan?
It looks like I jumped to conclusions here. I was expecting that with
invalid set to 'raise' that an array that (+-*operations on nans would
raise an exception. It appears that is incorrect -- only operations that
create nans from non-nans will trigger this as you suspected. Similarly,
I expected that over='raise' would cause inf+something to raise an
exception. Again, not true; this raises an exception only when a new inf
is created from non infs. At least on my box.
Interesting. And a little surprising.
An interesting tidbit (in an array) inf/inf will raise invalid, but
nan/nan will not, it just returns nan. Fun, fun fun!
> All the seterr/geterr stuff relies on the hardware flags. We don't do
> any other checking.
Yeah. And just by itself this isn't going to do anything for sort since
comparing nans will not set any flags, it's just that the result will be
problematic.If one wanted to use these flags for this one would have to
use/abuse the result of geterr to trigger an isnan test at the beginning
of sort and then warn, raise or call as appropriate.
It would probably also not be unreasonable to punt and document sort as
failing in the presence of nans.
More information about the Numpy-discussion