[Numpy-discussion] searchsorted() and memory cache

Nathan Bell wnbell@gmail....
Tue May 13 20:44:01 CDT 2008

```
On Tue, May 13, 2008 at 6:59 PM, Andrew Straw <strawman@astraw.com> wrote:
>  easier and still blazingly fast compared to the binary search
>  implemented in searchsorted() given today's cached memory architectures.

Andrew, I looked at your code and I don't quite understand something.
Why are you looking up single values?

Is it not the case that the overhead of several Python calls
completely dominates the actual cost of performing a binary search
(e.g. in C code)?  For instance, the 'v' argument of searchsorted(a,v)
can be an array:

from scipy import *
haystack = rand(int(1e7))
needles = haystack[:10000].copy()
haystack.sort()
timeit searchsorted(haystack, needles)

100 loops, best of 3: 12.7 ms per loop

Which seems to be much faster than:

timeit for k in needles: x = searchsorted(haystack,k)

10 loops, best of 3: 43.8 ms per loop
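A self-contained version of the comparison above, written against plain NumPy (same array sizes as the session; absolute timings will of course vary by machine, so none are asserted here):

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
haystack = rng.random(10**7)       # 1e7 keys, sorted below
needles = haystack[:10000].copy()  # 10k query values
haystack.sort()

# One vectorized call: all binary searches run inside a single C loop.
vectorized = np.searchsorted(haystack, needles)

# One Python-level call per needle: the same searches, plus ~10k rounds
# of argument parsing and result-object creation.
looped = np.array([np.searchsorted(haystack, k) for k in needles])

# Both strategies return identical indices; only the call overhead differs.
assert np.array_equal(vectorized, looped)

t_vec = timeit.timeit(lambda: np.searchsorted(haystack, needles), number=10)
t_loop = timeit.timeit(
    lambda: [np.searchsorted(haystack, k) for k in needles], number=10)
print(f"vectorized: {t_vec / 10:.4f} s/loop, looped: {t_loop / 10:.4f} s/loop")
```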

The other thing to consider is that a reasonably smart CPU cache
manager may retain the first few levels that are explored in a binary
search tree.  Clearly you could speed this up by increasing the
branching factor of the tree, or by using a fast index into the large
array.  However, I think that these effects are being masked by Python
overhead.
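The "fast index into the large array" idea can be sketched as a two-level search: precompute every k-th key as a small top-level index, locate the right bucket there, then binary-search only that bucket, which is small enough to stay cache-resident. (A hypothetical illustration, not code from this thread; the name `blocked_searchsorted` is made up.)

```python
import numpy as np

def blocked_searchsorted(a, v, block=1024):
    """Two-level lower-bound search over sorted `a` (hypothetical sketch)."""
    tops = a[::block]  # coarse index: every `block`-th key
    n = len(a)
    out = np.empty(len(v), dtype=np.intp)
    for k, x in enumerate(v):
        # tops[j-1] < x <= tops[j], so the answer lies in
        # ((j-1)*block, j*block]; clamp the slice to the array bounds.
        j = np.searchsorted(tops, x, side='left')
        lo = max(j - 1, 0) * block
        hi = min(j * block + 1, n)
        # The final binary search touches only one small bucket.
        out[k] = lo + np.searchsorted(a[lo:hi], x, side='left')
    return out
```

The Python-level loop here would dominate in practice, which is the point of the paragraph above; the structure only pays off when both levels run in compiled code.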

I whipped up a weave implementation of searchsorted() that uses the
STL.  It clocks in at 9.72ms per loop, so I think NumPy's
searchsorted() is fairly good.

import scipy
import scipy.weave

def searchsorted2(a, v):
    N_a = len(a)
    N_v = len(v)
    indices = scipy.empty(N_v, dtype='intc')

    code = """
    for (int i = 0; i < N_v; i++){
        indices(i) = std::lower_bound(&a(0), &a(N_a), v(i)) - &a(0);
    }
    """

    err = scipy.weave.inline(code,
                             ['a', 'v', 'N_a', 'N_v', 'indices'],
                             type_converters=scipy.weave.converters.blitz,
                             compiler='gcc',
                             support_code='#include <algorithm>')

    return indices
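[As a point of reference, not part of the original message: scipy.weave has since been removed from SciPy. NumPy's own searchsorted exposes the same semantics the weave code gets from the STL: side='left' is std::lower_bound, side='right' is std::upper_bound.]

import numpy as np

a = np.array([1.0, 2.0, 2.0, 2.0, 3.0])

# side='left' returns the first index i with a[i] >= v -- the leftmost
# valid insertion point, i.e. lower_bound.
assert np.searchsorted(a, 2.0, side='left') == 1
# side='right' returns the index past the last equal key -- upper_bound.
assert np.searchsorted(a, 2.0, side='right') == 4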

--
Nathan Bell wnbell@gmail.com
http://graphics.cs.uiuc.edu/~wnbell/
```