[Numpy-discussion] searchsorted() and memory cache
Nathan Bell
wnbell@gmail....
Tue May 13 20:44:01 CDT 2008
On Tue, May 13, 2008 at 6:59 PM, Andrew Straw <strawman@astraw.com> wrote:
> easier and still blazingly fast compared to the binary search
> implemented in searchsorted() given today's cached memory architectures.
Andrew, I looked at your code and I don't quite understand something.
Why are you looking up single values?
Is it not the case that the overhead of several Python calls
completely dominates the actual cost of performing a binary search
(e.g. in C code)? For instance, the 'v' argument of searchsorted(a,v)
can be an array:
from scipy import *
haystack = rand(int(1e7))
needles = haystack[:10000].copy()
haystack.sort()
timeit searchsorted(haystack, needles)
100 loops, best of 3: 12.7 ms per loop
Which seems to be much faster than:
timeit for k in needles: x = searchsorted(haystack,k)
10 loops, best of 3: 43.8 ms per loop
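For anyone who wants to reproduce the comparison, here is the same experiment as a self-contained script using the stdlib timeit module (I've used 1e6 elements instead of 1e7 so it runs quickly; the absolute numbers will of course vary by machine):

```python
import timeit

import numpy as np

# Sorted haystack and random needles, as in the session above.
haystack = np.sort(np.random.rand(1_000_000))
needles = np.random.rand(10_000)

# One vectorized call: all binary searches run in a tight C loop.
vectorized = timeit.timeit(lambda: np.searchsorted(haystack, needles),
                           number=20) / 20

# One Python-level call per needle: interpreter overhead dominates.
looped = timeit.timeit(
    lambda: [np.searchsorted(haystack, k) for k in needles], number=1)

print("vectorized: %.2f ms" % (vectorized * 1e3))
print("looped:     %.2f ms" % (looped * 1e3))
```

The absolute times shift with array size, but the gap between the two is the point: it is almost entirely per-call Python overhead, not binary-search cost.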
The other thing to consider is that a reasonably smart CPU cache
manager may retain the first few levels that are explored in a binary
search tree. Clearly you could speed this up by increasing the
branching factor of the tree, or using a fast index into the large
array. However, I think that these effects are being masked by Python
overhead in your tests.
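To make the "fast index" idea concrete, here is one hypothetical sketch in plain NumPy (the function name, the fence-array scheme, and the step size are my own inventions for illustration): treat every step-th element as a "fence", binary-search the small fence array to pick a step-sized segment, then finish the search inside that segment. This is effectively a two-level tree with a very large branching factor, so only the small fence array needs to stay hot in cache.

```python
import numpy as np

def indexed_searchsorted(haystack, needles, step=1024):
    """Hypothetical two-level search: a coarse fence array narrows each
    query to one small segment, then a short search finishes the job."""
    # Every step-th element serves as a fence; this array is small enough
    # to stay resident in cache across many queries.
    fences = haystack[::step]
    # Largest j with fences[j] < needle (clipped so j >= 0).
    seg = np.searchsorted(fences, needles) - 1
    seg = np.clip(seg, 0, len(fences) - 1)
    out = np.empty(len(needles), dtype=np.intp)
    for i, (s, x) in enumerate(zip(seg * step, needles)):
        # Refine within the step-sized segment; this reproduces
        # np.searchsorted's default 'left' insertion point.
        out[i] = s + np.searchsorted(haystack[s:s + step], x)
    return out
```

The inner loop is still Python here (a real implementation would push it into C), but it shows the structure: one lookup in a tiny array, then one search over a cache-sized window instead of log2(N) dependent loads scattered across the full array.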
I whipped up a weave implementation of searchsorted() that uses the
STL. It clocks in at 9.72 ms per loop, so I think NumPy's
searchsorted() is fairly good.
import scipy
import scipy.weave

def searchsorted2(a, v):
    N_a = len(a)
    N_v = len(v)
    indices = scipy.empty(N_v, dtype='intc')
    code = """
           for(int i = 0; i < N_v; i++){
               indices(i) = lower_bound(&a(0), &a(N_a), v(i)) - &a(0);
           }
           """
    err = scipy.weave.inline(code,
                             ['a', 'v', 'N_a', 'N_v', 'indices'],
                             type_converters=scipy.weave.converters.blitz,
                             compiler='gcc',
                             support_code='#include <algo.h>')
    return indices
--
Nathan Bell wnbell@gmail.com
http://graphics.cs.uiuc.edu/~wnbell/