[Numpy-discussion] sorting -inf, nan, inf

Charles R Harris charlesr.harris at gmail.com
Tue Sep 19 17:55:57 CDT 2006


On 9/19/06, Tim Hochberg <tim.hochberg at ieee.org> wrote:
>
> Travis Oliphant wrote:
> > Tim Hochberg wrote:
> >
> >
> >> A. M. Archibald wrote:
> >>
> >>
> >>
> >>> On 19/09/06, Tim Hochberg <tim.hochberg at ieee.org> wrote:
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>> I'm not sure where the breakpoint is, but I was seeing failures for
> all
> >>>> three sort types with N as high as 10000. I suspect that they're all
> >>>> broken in the presence of  NaNs.  I further suspect you'd need some
> >>>> punishingly slow n**2 algorithm to be robust in the presence of NaNs.
> >>>>
> >>>>
> >>>>
> >>>>
> >>> Not at all. Just temporarily make NaNs compare greater than any other
> >>> floating-point value and use quicksort (say). You can even do this for
> >>> python lists without much trouble.
> >>>
> >>>
> >>>
> >>>
> >> I misspoke. What I meant here was keeping the behavior that people
> think
> >> that we already have but don't: NaNs stay in place and everything is
> >> sorted around them. And even that's not true, since you could just
> >> record where the NaNs are, remove them, sort and put them back. What I
> >> was really getting at was, that I'm guessing, and it's just a guess,
> >> that (a) none of the fast sorting algorithms do anything sensible
> unless
> >> special cased and (b) one could come up with a naive n**2 sort that
> does
> >> do something sensible without special casing (where sensible means
> leave
> >> the NaNs alone).
> >>
> >>
> >>
> >>> That's actually a viable suggestion for numpy's sorting, although it
> >>> would be kind of ugly to implement: do a quick any(isnan(A)), and if
> >>> not, use the fast stock sorting algorithms; if there is a NaN
> >>> somewhere in the array, use a version of the sort that has a tweaked
> >>> comparison function so the NaNs wind up at the end and are easy to
> >>> trim off.
> >>>
> >>> But the current situation, silently returning arrays in which the
> >>> non-NaNs are unsorted, is really bad.
> >>>
> >>>
> >>>
> >>>
> >> If your going to do isnan anyway, why not just raise an exception. An
> >> array with NaNs in it can't be sorted by any common sense definition of
> >> sorting. Any treatment of NaNs is going to be arbitrary, so we might as
> >> well make the user specify what they want. "In the face of ambiguity,
> >> refuse the temptation to guess" and all that.
> >>
> >> My favorite solution would be to make sort respect the invalid mode of
> >> seterr/geterr. However at the moment it doesn't seem to (in beta4 at
> >> least) but neither does add or multiply so those probably need to be
> >> looked at again....
> >>
> >>
> >>
> > The geterr/seterr stuff changes how IEEE hardware flags are handled in
> > ufuncs.  Currently they are not even looked at elsewhere.     Are you
> > saying that add and multiply don't respect the invalid flag?   If not,
> > then this might be hardware related.  Does the IEEE invalid hardware
> > flag get raised on multiplication by nan or only on creation of nan?
> It looks like I jumped to conclusions here. I was expecting that with
> invalid set to 'raise' that an array that (+-*operations on nans would
> raise an exception. It appears that is incorrect -- only operations that
> create nans from non-nans will trigger this as you suspected. Similarly,
> I expected that over='raise' would cause inf+something to raise an
> exception. Again, not true; this raises an exception only when a new inf
> is created from non infs. At least on my box.
>
> Interesting. And a little surprising.
>
> An interesting tidbit (in an array) inf/inf will raise invalid, but
> nan/nan will not, it just returns nan. Fun, fun fun!
> >
> > All the seterr/geterr stuff relies on the hardware flags.   We don't do
> > any other checking.
> >
>
> Yeah. And just by itself this isn't going to do anything for sort since
> comparing nans will not set any flags, it's just that the result will be
> problematic.If one wanted to use these flags for this one would have to
> use/abuse the result of geterr to trigger an isnan test at the beginning
> of sort and then warn, raise or call as appropriate.
>
> It would probably also not be unreasonable to punt and document sort as
> failing in the presence of nans.


For floats we could use something like:

lessthan(a,b) := a < b || (a == nan && b != nan)

Which would put all the nans at one end and might not add too much overhead.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://projects.scipy.org/pipermail/numpy-discussion/attachments/20060919/6b3d0ed7/attachment.html 


More information about the Numpy-discussion mailing list