[Numpy-discussion] Performance issues with numarray.searchsorted()
Todd Miller
jmiller at stsci.edu
Thu May 20 10:14:06 CDT 2004
On Thu, 2004-05-20 at 05:26, Robert Kern wrote:
> Francesc Alted wrote:
> > Hi,
> >
> > I'm willing to use a lot the searchsorted function in numarray, but I'm a
> > bit surprised about the poor performance of it. Look at that:
> >
> >
> >>>>from time import time
> >>>>import numarray
> >>>>import Numeric
> >>>>na=numarray.arange(1000*1000)
> >>>>nu=Numeric.arange(1000*1000)
> >>>>t1=time();numarray.searchsorted(na,200*1000);time()-t1
> >
> > 200000
> > 0.00055098533630371094
> >
> >>>>t1=time();Numeric.searchsorted(nu,200*1000);time()-t1
> >
> > 200000
> > 7.7962875366210938e-05
> >
> > It may seem that Numeric is better optimised, but the standard python module
> > bisect is even faster than numarray.searchsorted:
> >
> >
> >>>>import bisect
> >>>>t1=time();bisect.bisect_left(na,200*1000);time()-t1
> >
> > 200000
> > 8.8930130004882812e-05
> >
> >>>>t1=time();bisect.bisect_left(nu,200*1000);time()-t1
> >
> > 200000
> > 8.6069107055664062e-05
>
> A better timing (IMHO), but with similar conclusions:
>
> Python 2.3 (#1, Sep 13 2003, 00:49:11)
> [GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import timeit
> >>> t1 = timeit.Timer("Numeric.searchsorted(a,200000)",
> "import Numeric;a=Numeric.arange(1000000)")
> >>> t2 = timeit.Timer("numarray.searchsorted(a,200000)",
> "import numarray;a=numarray.arange(1000000)")
> >>> t3 = timeit.Timer("bisect.bisect_left(a,200000)",
> "import Numeric;import bisect;a=Numeric.arange(1000000)")
> >>> t4 = timeit.Timer("bisect.bisect_left(a,200000)",
> "import numarray;import bisect;a=numarray.arange(1000000)")
> >>> t1.repeat(3,10000)
> [0.15758609771728516, 0.17469501495361328, 0.15456986427307129]
> >>> t2.repeat(3,10000)
> [6.7581729888916016, 6.9644770622253418, 6.6776731014251709]
> >>> t3.repeat(3,10000)
> [0.41335701942443848, 0.45698308944702148, 0.39665889739990234]
> >>> t4.repeat(3,10000)
> [0.49930000305175781, 0.48063492774963379, 0.52067780494689941]
>
> [Apologies for the linewraps.]
>
> I also get similar results with double arrays. Weird.
>
> Python 2.3 on Mac OS X 10.3.sumthin', latest CVS checkout of numarray,
> Numeric 23.1.
Here's what I got with the numarray benchmarks after adding an extra
case:
benchmark i numarray (usec) Numeric (usec) numarray:Numeric
searchsorted(a,p/5) 0.0 446 12 36.7
1.0 438 11 36.8
2.0 450 12 35.0
3.0 459 14 32.6
4.0 511 25 19.7
5.0 636 56 11.2
6.0 653 64 10.1
searchsorted(a,a) 0.0 285 5 48.0
1.0 283 7 39.6
2.0 291 29 9.8
3.0 368 308 1.2
4.0 1771 4120 0.4
5.0 17335 52127 0.3
6.0 201277 605787 0.3
p = 10**i
a = arange(p)
The first case agrees with your results of ~10x difference in favor of
Numeric at 10**6 elements. The last case shows a ~3x numarray advantage
given large numbers of values.
My analysis is this: since searchsorted runs in O(log2(N)) time, even
with 10**6 elements there are only 20 iterations or so. This is just
not enough to overcome numarray's Python ufunc overhead. I'll see what
I can do, but I think we're up against the standard numarray
performance wall for small arrays.
Regards,
Todd
More information about the Numpy-discussion
mailing list