[Numpy-discussion] searchsorted() and memory cache

Andrew Straw strawman@astraw....
Thu May 8 22:51:21 CDT 2008


I've got a big element array (25 million int64s) that searchsorted()
takes a long time to grind through. After a bit of digging in the
literature and the numpy source code, I believe that searchsorted() is
implementing a classic binary search, which is pretty bad in terms of
cache misses. There are several modern implementations of binary search
which arrange items in memory such that cache misses are much more rare.
Clearly making such an indexing arrangement would take time, but in my
particular case, I can spare the time to create an index if searching
was faster, since I'd make the index once but do the searching many times.

Is there an implementation of such an algorithm that works easilty with
numpy? Also, can you offer any advice, suggestions, and comments to me
if I attempted to implement such an algorithm?

Thanks,
Andrew


More information about the Numpy-discussion mailing list