[Numpy-discussion] sorting -inf, nan, inf

Charles R Harris charlesr.harris at gmail.com
Tue Sep 19 15:56:00 CDT 2006

On 9/19/06, Tim Hochberg <tim.hochberg at ieee.org> wrote:
> A. M. Archibald wrote:
> > On 19/09/06, Tim Hochberg <tim.hochberg at ieee.org> wrote:
> >
> >> Keith Goodman wrote:
> >>
> >>> In what order would you like argsort to sort the values -inf, nan,
> inf?
> >>>
> >>>
> >> Ideally, -inf should sort first, inf should sort last and nan should
> >> raise an exception if present.
> >>
> >> -tim
> >>
> >
> > Mmm. Somebody who's working with NaNs has more or less already decided
> > they don't want to be pestered with exceptions for invalid data.
> Do you really think so? In my experience NaNs are nearly always just an
> indication of a mistake somewhere that didn't get trapped for one reason
> or another.
> >  I'd
> > be happy if they wound up at either end, but I'm not sure it's worth
> > hacking up the sort algorithm when a simple isnan() can pull them out.
> >
> Moving them to the end seems to be the worst choice to me. Leaving them
> alone is fine with me. Or raising an exception would be fine. Or doing
> one or the other depending on the error mode settings would be even
> better if it is practical.
> > What's happening now is that NaN<a, NaN==a, and NaN>a are all false,
> > which rather confuses the sorting algorithm. But as long as it doesn't
> > actually *break* (that is, leave some of the non-NaNs incorrectly
> > sorted) I don't care.
> >
> Is that true? Are all of numpy's sorting algorithms robust against
> nontransitive objects laying around? The answer to that appears to be
> no. Try running this a couple of times to see what I mean:
> n = 10
> for i in range(10):
>     for kind in 'quicksort', 'heapsort', 'mergesort':
>              a = rand(n)
>              b = a.copy()
>              a[n//2] = nan
>              a.sort(kind=kind)
>              b.sort(kind=kind)
>              assert alltrue(a[:n//2] == b[:n//2]), (kind, a, b)
> The values don't correctly cross the inserted NaN and the sort is
> incorrect.

Except for heapsort those are doing insertion sorts because n is so small.
Heapsort is trickier to understand because of the way the heap is formed and
values pulled off. I'll look into that. I bet searchsorted gets messed up by
nan's. Do you think it worthwhile to worry about it?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://projects.scipy.org/pipermail/numpy-discussion/attachments/20060919/f9f14911/attachment.html 

More information about the Numpy-discussion mailing list