[Numpy-discussion] Performance issues with numarray.searchsorted()

Todd Miller jmiller at stsci.edu
Thu May 20 10:14:06 CDT 2004


On Thu, 2004-05-20 at 05:26, Robert Kern wrote:
> Francesc Alted wrote:
> > Hi,
> > 
> > I'm willing to use a lot the searchsorted function in numarray, but I'm a
> > bit surprised about the poor performance of it. Look at that:
> > 
> > 
> >>>>from time import time
> >>>>import numarray
> >>>>import Numeric
> >>>>na=numarray.arange(1000*1000)
> >>>>nu=Numeric.arange(1000*1000)
> >>>>t1=time();numarray.searchsorted(na,200*1000);time()-t1
> > 
> > 200000
> > 0.00055098533630371094
> > 
> >>>>t1=time();Numeric.searchsorted(nu,200*1000);time()-t1
> > 
> > 200000
> > 7.7962875366210938e-05
> > 
> > It may seem that Numeric is better optimised, but the standard python module
> > bisect is even faster than numarray.searchsorted:
> > 
> > 
> >>>>import bisect
> >>>>t1=time();bisect.bisect_left(na,200*1000);time()-t1
> > 
> > 200000
> > 8.8930130004882812e-05
> > 
> >>>>t1=time();bisect.bisect_left(nu,200*1000);time()-t1
> > 
> > 200000
> > 8.6069107055664062e-05
> 
> A better timing (IMHO), but with similar conclusions:
> 
> Python 2.3 (#1, Sep 13 2003, 00:49:11)
> [GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> import timeit
>  >>> t1 = timeit.Timer("Numeric.searchsorted(a,200000)",
> "import Numeric;a=Numeric.arange(1000000)")
>  >>> t2 = timeit.Timer("numarray.searchsorted(a,200000)",
> "import numarray;a=numarray.arange(1000000)")
>  >>> t3 = timeit.Timer("bisect.bisect_left(a,200000)",
> "import Numeric;import bisect;a=Numeric.arange(1000000)")
>  >>> t4 = timeit.Timer("bisect.bisect_left(a,200000)",
> "import numarray;import bisect;a=numarray.arange(1000000)")
>  >>> t1.repeat(3,10000)
> [0.15758609771728516, 0.17469501495361328, 0.15456986427307129]
>  >>> t2.repeat(3,10000)
> [6.7581729888916016, 6.9644770622253418, 6.6776731014251709]
>  >>> t3.repeat(3,10000)
> [0.41335701942443848, 0.45698308944702148, 0.39665889739990234]
>  >>> t4.repeat(3,10000)
> [0.49930000305175781, 0.48063492774963379, 0.52067780494689941]
> 
> [Apologies for the linewraps.]
> 
> I also get similar results with double arrays. Weird.
> 
> Python 2.3 on Mac OS X 10.3.sumthin', latest CVS checkout of numarray, 
> Numeric 23.1.
Here's what I got with the numarray benchmarks after adding an extra
case:

benchmark                    i    numarray (usec)    Numeric (usec)  numarray:Numeric

searchsorted(a,p/5)        0.0               446                12              36.7
                           1.0               438                11              36.8
                           2.0               450                12              35.0
                           3.0               459                14              32.6
                           4.0               511                25              19.7
                           5.0               636                56              11.2
                           6.0               653                64              10.1

searchsorted(a,a)          0.0               285                 5              48.0
                           1.0               283                 7              39.6
                           2.0               291                29               9.8
                           3.0               368               308               1.2
                           4.0              1771              4120               0.4
                           5.0             17335             52127               0.3
                           6.0            201277            605787               0.3

p = 10**i
a = arange(p)

The first case agrees with your results of ~10x difference in favor of
Numeric at 10**6 elements.  The last case shows a ~3x numarray advantage
given large numbers of values.

My analysis is this:  since searchsorted runs in O(log2(N)) time,  even
with 10**6 elements there are only 20 iterations or so.  This is just
not enough to overcome numarray's Python ufunc overhead.  I'll see what
I can do,  but I think we're up against the standard numarray
performance wall for small arrays.	

Regards,
Todd







More information about the Numpy-discussion mailing list