[Numpy-discussion] searchsorted() and memory cache
Thu May 8 22:55:56 CDT 2008
On Thu, May 8, 2008 at 10:51 PM, Andrew Straw <firstname.lastname@example.org> wrote:
> I've got a big element array (25 million int64s) that searchsorted()
> takes a long time to grind through. After a bit of digging in the
> literature and the numpy source code, I believe that searchsorted() is
> implementing a classic binary search,
> which is pretty bad in terms of
> cache misses. There are several modern implementations of binary search
> which arrange items in memory such that cache misses are much more rare.
> Clearly making such an indexing arrangement would take time, but in my
> particular case, I can spare the time to create an index if searching
> was faster, since I'd make the index once but do the searching many times.
> Is there an implementation of such an algorithm that works easilty with
> numpy? Also, can you offer any advice, suggestions, and comments to me
> if I attempted to implement such an algorithm?
I'm no help. You seem to know more than I do. Sadly, the first few
Google hits I get for "binary search minimize cache misses" are
patents. I don't know what the substantive content of those patents
are; I have a strict policy of not reading patents.
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco
More information about the Numpy-discussion