[Numpy-discussion] sorting -inf, nan, inf

Christopher Barker Chris.Barker at noaa.gov
Wed Sep 20 17:18:27 CDT 2006


Charles R Harris wrote:
> Thinking a bit, keeping the values in place isn't easy.

Why the heck would "in place" be desirable for sorted data anyway? I 
understand that it means that if there is a NaN in the nth position 
before sorting, there will be one in the nth position after sorting. 
However, I see absolutely no reason at all why that would be useful (or 
any more useful than putting them anywhere else)

A couple years ago, there was a long debate on this list about whether 
numpy should pass -inf, NaN, and +inf through all the ufuncs without 
error. There were two schools of thought:

1) They indicate a problem, and the programmer should know about that 
problem as soon as it occurs, not at the end of the computation, many 
steps later, when they might get presented with nothing but NaNs.

2) The whole point of "vector" computation is that you can act on a 
whole bunch of numbers at once. If only a subset of those numbers is 
invalid, why stop the process? Raising an error when a single number has 
a problem defeats the purpose of vector operations.
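
For illustration only, the two positions can be sketched with NumPy's 
floating-point error handling. This assumes a NumPy version that provides 
np.errstate; the array `data` is made up for the example.

```python
import numpy as np

# Hypothetical input: one invalid element (square root of a negative number).
data = np.array([4.0, -1.0, 9.0])

# School of thought (2): let the invalid value through as NaN and carry on.
with np.errstate(invalid="ignore"):
    passed_through = np.sqrt(data)

# School of thought (1): stop as soon as the invalid operation occurs.
try:
    with np.errstate(invalid="raise"):
        np.sqrt(data)
    raised = False
except FloatingPointError:
    raised = True
```

In the first case the valid elements are still computed and only the bad 
one becomes NaN; in the second the whole operation is abandoned.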

It seems that numpy has settled on school of thought (2), at least by 
default. That being the case, it should apply to sorting also. If it 
does, then that means no exception will be raised, but it makes no 
difference where the heck the NaNs end up in the sorted array, as long 
as everything else is in order. NaN means exactly what it's called: it's 
not a number, so it doesn't matter what you do with NaNs, as long as 
they are preserved and don't mess up other things. Let the coder decide 
what they want to do with them, and when they want to do it. Personally, 
I'd prefer that they all ended up at the beginning or end after sorting, 
but it really doesn't much matter.
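
A minimal sketch of the "all NaNs at the end" behaviour, written in plain 
Python so that it does not depend on what any particular sort 
implementation does with NaN. The helper name sort_nans_last and the 
sample data are hypothetical.

```python
import math

def sort_nans_last(values):
    """Return a sorted copy of `values` with every NaN moved to the end."""
    # The key is (is_nan, value). False sorts before True, so all non-NaN
    # values come first and are ordered among themselves as usual; the
    # NaNs are preserved and grouped together after them.
    return sorted(values, key=lambda x: (math.isnan(x), x))

# Made-up data mixing -inf, NaN, and +inf with ordinary values.
data = [3.0, math.nan, -math.inf, 1.0, math.inf, math.nan, 2.0]
result = sort_nans_last(data)
```

Here -inf and +inf take their natural places at the two ends of the 
ordered values, and both NaNs follow +inf.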

That being said, if it's impossible to do an efficient sort with NaNs 
mixed in, then we'll just have to live with it. It really would be best 
if an exception were raised whenever the non-NaN values are not going to 
be sorted correctly -- silently mis-sorted values really would surprise 
people!
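
The "raise an exception" fallback could look something like the sketch 
below. It is plain Python, the name checked_sort is hypothetical, and the 
up-front scan for NaNs is an assumption about how one might do it, not a 
description of any existing sort.

```python
import math

def checked_sort(values):
    """Sort a sequence of floats, refusing input that contains NaN."""
    # Assumed policy: fail loudly up front rather than risk returning
    # non-NaN values in the wrong order.
    if any(math.isnan(x) for x in values):
        raise ValueError("cannot sort: input contains NaN")
    return sorted(values)

# -inf and +inf compare normally, so they are accepted.
ordered = checked_sort([2.0, -math.inf, math.inf, 1.0])

# A NaN anywhere in the input is rejected.
try:
    checked_sort([2.0, math.nan, 1.0])
    rejected = False
except ValueError:
    rejected = True
```

The cost is one extra pass over the data before sorting.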

 > It would probably also not be unreasonable to punt and document sort
 > as failing in the presence of nans.

That would be one of the worst options, but may be the only one available.


-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer
                                     		
NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov



