[Numpy-discussion] speed problem with Numeric sort

Tim Hochberg tim.hochberg at ieee.org
Tue Jun 19 10:54:21 CDT 2001


Hi Berthold,

I tested your code on Win95 using Numeric 20.0.0 and got essentially the
same pattern of times. So, upgrading your version of Numeric is not likely
to help. Curious, I checked out the code for sort and found that it justs
calls qsort from the C library. I suspect that the problem is related to
quicksort having bad worst case behaviour. Not being a computer scientest, I
can't tell you under what situations the bad behaviour is triggered,
although I know it doesn't like presorted lists.

Anyway, if you're using 1D arrays, one workaround would be to use list.sort.
Python's sorting routines has, I understand, lots of gimmicks to avoid the
problems that quicksort sometimes encounters. I tried it and this:

    r=(random((71400,))*7).astype(Int)
    l = r.tolist()
    l.sort()
    p = array(l)

runs about 40 times faster than this:

   r=(random((71400,))*7).astype(Int)
   p = r.sort()

If you don't need to convert to and from a array, this approach is 60 times
faster. Even if you're dealing with a multidimensional array, this approach
(in a loop) might be signifigantly faster assuming you're sorting along the
long axis.

It makes one wonder if using the python sort rather than qsort for
Numeric.sort would be a profitable change. No time to investigate it right
now though.

Hope that's useful...

-tim





> We have a speed problem with Numeric.sort on large arrays with only a
> few different values. Here is my example
>
> -- snip --
>
> >cat numtst.py
> import Numeric
> print Numeric.__version__
> class timer:
>     def __init__(self):
>         import time
>         self.start = time.time()
>     def stop(self):
>         import time
>         print "%.3f" % (time.time() - self.start)
>
> from RandomArray import random
> from Numeric import sort, Int
>
> r=random((71400,))
> t = timer() ; p=sort(r) ; t.stop()
> r=(random((71400,))*70000).astype(Int)
> t = timer() ; p=sort(r) ; t.stop()
> r=(random((71400,))*70).astype(Int)
> t = timer() ; p=sort(r) ; t.stop()
> r=(random((71400,))*7).astype(Int)
> t = timer() ; p=sort(r) ; t.stop()
> 16:27 hoel at seeve:hoel 2>python numtst.py
> 17.3.0
> 0.185
> 0.148
> 2.053
> 21.668
>
> -- snip --
>
> So the less different values are contained in the array the longer
> takes the sorting. Is this also the case with newer versions of
> Numeric (But this is Python 1.5.2)? Why is sorting of these arrays so
> slow?
>
> Thanks
>
> Berthold
> --
>        email: hoel at GermanLloyd.org
>    )   tel. : +49 (40) 3 61 49 - 73 74
>   (
> C[_]  These opinions might be mine, but never those of my employer.
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> http://lists.sourceforge.net/lists/listinfo/numpy-discussion





More information about the Numpy-discussion mailing list