[Numpy-discussion] Radix sort?

Jon Wright wright@esrf...
Wed Jun 20 04:58:02 CDT 2007


 > "Charles R Harris" <charlesr.harris@gmail.com> wrote:
 >
> Straight radix sort might be an interesting option for some things.
> However, its performance can depend on whether the input data is random or
> not and it takes up more space than merge sort. Other potential drawbacks
> arise from the bit twiddling needed for signed numbers and floats, the
> former solved by converting to offset binary numbers (complement the sign
> bit), and the latter in the way your links indicate, but both leading to a
> proliferation of special cases. Maintaining byte order and byte addressing
> portability between cpu architectures might also require masking and
> shifting that will add computational expense and may lead to more special
> cases for extended precision floats and so on. That said, I would be curious
> to see how it works out if you want to give it a try.

I agree completely about the proliferation of special cases, and mess 
that will make. This radix sort is bad for tiny arrays, but good for big 
random ones (there is no insertion sort either?). An intelligent 
algorithm chooser is needed, something like an "atlas"/"fftw" but for 
sorting, which has been invented already it seems. Platform and datatype 
and even the data themselves seem to be important. eg:

http://www.spiral.net/software/sorting.html
http://www.cgo.org/cgo2004/papers/09_cgo04.pdf

Seems like a significant amount of work - and for the numpy case it 
should work with strided/sliced arrays without copying. Is there a list 
somewhere of the numpy numeric types, together with their binary 
representations on all of the numpy supported platforms? I'm guessing 
integers are almost always:
[signed/unsigned] [8|16|32|64 bits] [big|little endian]
... and that many popular platforms only use ieee745 floats and doubles, 
which might be big or little endian. Is there an important case I miss, 
such as decimal hardware?

The experimental observation is that the code from Michael Herf appears 
to win for float32 for random arrays >1024 elements or sorted arrays >2M 
elements, compared any of the 3 algorithms already in numpy (ymmv). The 
methods could also have a positive impact on the numpy.histogram 
function for the smaller data types, and also lead to other order 
statistics, like median, trimeans and n-th largest with a performance 
improvement.

Since sorting is one of the most studied problems in all of computer 
science, I am reluctant to start writing a new library! Does anyone know 
of some good libraries we could try out?

Best,

Jon



More information about the Numpy-discussion mailing list