[Numpy-discussion] searchsorted() and memory cache

Andrew Straw strawman@astraw....
Wed May 14 09:09:26 CDT 2008


> I will post any new insights as I continue to work on this...
>   
OK, I save isolated a sample of my data that illustrates the terrible
performance with the binarysearch. I have uploaded it as a pytables file
to http://astraw.com/framenumbers.h5 in case anyone wants to have a look
themselves. Here's an example of the type of benchmark I've been running:

import fastsearch.downsamp
import fastsearch.binarysearch
import tables

h5=tables.openFile('framenumbers.h5',mode='r')
framenumbers=h5.root.framenumbers.read()
keys=h5.root.keys.read()
h5.close()

def bench( implementation ):
    for key in keys:
        implementation.index( key )

downsamp = fastsearch.downsamp.DownSampledPreSearcher( framenumbers )
binary = fastsearch.binarysearch.BinarySearcher( framenumbers )

# The next two lines are IPython-specific, and the 2nd takes a looong time:

%timeit bench(downsamp)
%timeit bench(binary)



Running the above gives:

In [14]: %timeit bench(downsamp)
10 loops, best of 3: 64 ms per loop

In [15]: %timeit bench(binary)

10 loops, best of 3: 184 s per loop

Quite a difference (a factor of about 3000)! At this point, I haven't
delved into the dataset to see what makes it so pathological --
performance is nowhere near this bad for the binary search algorithm
with other sets of keys.



More information about the Numpy-discussion mailing list