[Numpy-discussion] searchsorted() and memory cache
Charles R Harris
Fri May 9 00:06:16 CDT 2008
On Thu, May 8, 2008 at 10:30 PM, Charles R Harris <firstname.lastname@example.org>
> On Thu, May 8, 2008 at 9:55 PM, Robert Kern <email@example.com> wrote:
>> On Thu, May 8, 2008 at 10:51 PM, Andrew Straw <firstname.lastname@example.org>
>> > I've got a big element array (25 million int64s) that searchsorted()
>> > takes a long time to grind through. After a bit of digging in the
>> > literature and the numpy source code, I believe that searchsorted() is
>> > implementing a classic binary search,
>> > which is pretty bad in terms of
>> > cache misses. There are several modern implementations of binary search
>> > which arrange items in memory such that cache misses are much more rare.
>> > Clearly making such an indexing arrangement would take time, but in my
>> > particular case, I can spare the time to create an index if searching
>> > was faster, since I'd make the index once but do the searching many
>> > Is there an implementation of such an algorithm that works easilty with
>> > numpy? Also, can you offer any advice, suggestions, and comments to me
>> > if I attempted to implement such an algorithm?
>> I'm no help. You seem to know more than I do. Sadly, the first few
>> Google hits I get for "binary search minimize cache misses" are
>> patents. I don't know what the substantive content of those patents
>> are; I have a strict policy of not reading patents.
> I would be interested in adding such a thing if it wasn't patent
> encumbered. A good start would be a prototype in python to show how it all
> went together and whether it needed a separate indexing/lookup function or
> could be fit into the current setup.
One way I can think of doing this is to have two indices. One is the usual
sorted list, the second consists of, say, every 1024'th entry in the first
list. Then search the second list first to find the part of the first list
to search. That won't get you into the very best cache, but it could buy you
a factor of 2x-4x in speed. It's sort of splitting the binary tree into two
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Numpy-discussion