[Numpy-discussion] sorting -inf, nan, inf
A. M. Archibald
peridot.faceted at gmail.com
Tue Sep 19 16:09:14 CDT 2006
On 19/09/06, Tim Hochberg <tim.hochberg at ieee.org> wrote:
> A. M. Archibald wrote:
> > Mmm. Somebody who's working with NaNs has more or less already decided
> > they don't want to be pestered with exceptions for invalid data.
> Do you really think so? In my experience NaNs are nearly always just an
> indication of a mistake somewhere that didn't get trapped for one reason
> or another.
Well, I said that because for an image processing project I was doing,
the easiest thing to do with certain troublesome pixels was to fill in
NaNs, and then at the end replace the NaNs with sensible values. It
seems as if the point of NaNs is to allow you to keep working with
those numbers that make sense while ignoring those that don't. If you
wanted exceptions, why not get them as soon as the first NaN would
have been generated?
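A minimal NumPy sketch of that fill-then-replace workflow. The 3x3 image and the bad-pixel mask are made up, and using the mean of the valid pixels as the replacement value is an arbitrary choice. The last few lines show the exception route: np.errstate(invalid="raise") makes NumPy raise FloatingPointError at the first invalid operation instead of quietly producing a NaN.

```python
import numpy as np

# Made-up 3x3 "image" and bad-pixel mask, for illustration only.
image = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])
bad = np.zeros(image.shape, dtype=bool)
bad[1, 1] = True

# Mark the troublesome pixels as NaN and keep working:
# NaN simply propagates through ordinary arithmetic.
work = image.copy()
work[bad] = np.nan
work = work * 2.0 + 1.0

# At the end, replace the NaNs with something sensible; here
# (an arbitrary choice) the mean of the valid pixels.
fill_value = np.nanmean(work)
result = np.where(np.isnan(work), fill_value, work)

# The alternative: ask NumPy to raise as soon as a NaN would be generated.
raised = False
with np.errstate(invalid="raise"):
    try:
        np.array([0.0]) / np.array([0.0])  # 0/0 is an "invalid" operation
    except FloatingPointError:
        raised = True
```

Which behaviour you get is a per-block choice, so the same code base can tolerate NaNs in one place and trap them in another.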
> > I'd
> > be happy if they wound up at either end, but I'm not sure it's worth
> > hacking up the sort algorithm when a simple isnan() can pull them out.
> Moving them to the end seems to be the worst choice to me. Leaving them
> alone is fine with me. Or raising an exception would be fine. Or doing
> one or the other depending on the error mode settings would be even
> better if it is practical.
I was just thinking in terms of easy removal.
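For one-dimensional data, pulling the NaNs out before sorting is a one-liner with a boolean mask. A small sketch with made-up values:

```python
import numpy as np

a = np.array([3.0, np.nan, 1.0, 2.0, np.nan])

# Boolean mask of the NaN positions.
mask = np.isnan(a)

# Pull the NaNs out, then sort only the values that make sense.
clean_sorted = np.sort(a[~mask])
n_removed = int(mask.sum())

print(clean_sorted.tolist(), n_removed)  # [1.0, 2.0, 3.0] 2
```

Note that masking flattens the data, so this only stays this simple for 1-D arrays.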
> Is that true? Are all of numpy's sorting algorithms robust against
> nontransitive objects lying around? The answer to that appears to be
> no. Try running this a couple of times to see what I mean:
> The values don't correctly cross the inserted NaN and the sort is incorrect.
You're quite right: when NaNs are present in the array, sorting and
then removing them does not yield a sorted array. For example,
mergesort just output
[ 2.  4.  6.  9.  nan  0.  1.  3.  5.  7.  8.]
The other two sort kinds (quicksort and heapsort) are no better (and arguably worse). Python's built-in
sort() for lists has the same problem.
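The test script quoted above did not survive in the archive, so this is a separate, minimal illustration with made-up values using Python's built-in sorted(). Every ordering comparison with NaN is False, so `<` is no longer a consistent ordering and the sort can leave the other values out of order. The exact result depends on the sort implementation; the comment describes what CPython does for this input.

```python
import math

nan = float("nan")
data = [2.0, nan, 1.0]

# Every ordering comparison involving NaN is False.
assert not (nan < 1.0) and not (1.0 < nan) and not (nan < nan)

result = sorted(data)

# Drop the NaN and check whether what remains is in order.
finite = [x for x in result if not math.isnan(x)]
print(finite, finite == sorted(finite))
# On CPython this prints: [2.0, 1.0] False
```

Here no comparison ever reports an element as smaller than its predecessor, so the sort treats the whole list as already ordered and returns it unchanged.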
This is definitely a bug, and the best way to fix it is not clear to
me - perhaps sort() needs to always do any(isnan(A)) before starting
to sort. I don't really like raising an exception, but sort() isn't
really very meaningful with NaNs in the array. The only other option I
can think of is to somehow remove them, sort without, and reintroduce
them at the end, which is going to be a nightmare when sorting a
single axis of a large array. Or, I suppose, sort() could simply fill
the array with NaNs; I'm sure users will love that.
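A sketch of the first two options, written as standalone helpers rather than changes to sort() itself; the names sort_or_raise and sort_nans_last are hypothetical. The second never compares a NaN with anything: it sorts by a key in which NaNs are replaced by +inf, with "is NaN" as the primary key, so the NaNs end up last along the chosen axis. (Later NumPy releases adopted this NaNs-last ordering in np.sort itself.)

```python
import numpy as np


def sort_or_raise(a, axis=-1):
    """Hypothetical helper: refuse to sort if any NaN is present."""
    a = np.asarray(a, dtype=float)
    if np.isnan(a).any():
        raise ValueError("cannot sort an array containing NaN")
    return np.sort(a, axis=axis)


def sort_nans_last(a, axis=-1):
    """Hypothetical helper: sort the non-NaN values, put NaNs at the end."""
    a = np.asarray(a, dtype=float)
    is_nan = np.isnan(a)
    # Secondary key: the values with NaN replaced by +inf, so no
    # comparison ever involves a NaN.
    values = np.where(is_nan, np.inf, a)
    # np.lexsort uses the LAST key as the primary one: non-NaN (0) before
    # NaN (1), then by value within each group.
    order = np.lexsort((values, is_nan.astype(int)), axis=axis)
    return np.take_along_axis(a, order, axis=axis)


x = np.array([[3.0, np.nan, 1.0, -np.inf],
              [np.nan, 2.0, np.inf, 0.0]])
print(sort_nans_last(x))
```

Unlike boolean-mask filtering, this keeps the array's shape, which is what makes it usable on a single axis of a larger array; a genuine +inf still sorts before the NaNs because the primary key separates them.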
A. M. Archibald