[Numpy-discussion] speed problem with Numeric sort
Tim Hochberg
tim.hochberg at ieee.org
Tue Jun 19 10:54:21 CDT 2001
Hi Berthold,
I tested your code on Win95 using Numeric 20.0.0 and got essentially the
same pattern of times. So, upgrading your version of Numeric is not likely
to help. Curious, I checked out the code for sort and found that it justs
calls qsort from the C library. I suspect that the problem is related to
quicksort having bad worst case behaviour. Not being a computer scientest, I
can't tell you under what situations the bad behaviour is triggered,
although I know it doesn't like presorted lists.
Anyway, if you're using 1D arrays, one workaround would be to use list.sort.
Python's sorting routines has, I understand, lots of gimmicks to avoid the
problems that quicksort sometimes encounters. I tried it and this:
r=(random((71400,))*7).astype(Int)
l = r.tolist()
l.sort()
p = array(l)
runs about 40 times faster than this:
r=(random((71400,))*7).astype(Int)
p = r.sort()
If you don't need to convert to and from a array, this approach is 60 times
faster. Even if you're dealing with a multidimensional array, this approach
(in a loop) might be signifigantly faster assuming you're sorting along the
long axis.
It makes one wonder if using the python sort rather than qsort for
Numeric.sort would be a profitable change. No time to investigate it right
now though.
Hope that's useful...
-tim
> We have a speed problem with Numeric.sort on large arrays with only a
> few different values. Here is my example
>
> -- snip --
>
> >cat numtst.py
> import Numeric
> print Numeric.__version__
> class timer:
> def __init__(self):
> import time
> self.start = time.time()
> def stop(self):
> import time
> print "%.3f" % (time.time() - self.start)
>
> from RandomArray import random
> from Numeric import sort, Int
>
> r=random((71400,))
> t = timer() ; p=sort(r) ; t.stop()
> r=(random((71400,))*70000).astype(Int)
> t = timer() ; p=sort(r) ; t.stop()
> r=(random((71400,))*70).astype(Int)
> t = timer() ; p=sort(r) ; t.stop()
> r=(random((71400,))*7).astype(Int)
> t = timer() ; p=sort(r) ; t.stop()
> 16:27 hoel at seeve:hoel 2>python numtst.py
> 17.3.0
> 0.185
> 0.148
> 2.053
> 21.668
>
> -- snip --
>
> So the less different values are contained in the array the longer
> takes the sorting. Is this also the case with newer versions of
> Numeric (But this is Python 1.5.2)? Why is sorting of these arrays so
> slow?
>
> Thanks
>
> Berthold
> --
> email: hoel at GermanLloyd.org
> ) tel. : +49 (40) 3 61 49 - 73 74
> (
> C[_] These opinions might be mine, but never those of my employer.
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> http://lists.sourceforge.net/lists/listinfo/numpy-discussion
More information about the Numpy-discussion
mailing list