[Numpy-discussion] searchsorted() and memory cache

Charles R Harris charlesr.harris@gmail....
Thu May 8 23:30:26 CDT 2008


On Thu, May 8, 2008 at 9:55 PM, Robert Kern <robert.kern@gmail.com> wrote:

> On Thu, May 8, 2008 at 10:51 PM, Andrew Straw <strawman@astraw.com> wrote:
> > I've got a big element array (25 million int64s) that searchsorted()
> > takes a long time to grind through. After a bit of digging in the
> > literature and the numpy source code, I believe that searchsorted() is
> > implementing a classic binary search,
>
> Yes.
>
> > which is pretty bad in terms of
> > cache misses. There are several modern implementations of binary search
> > which arrange items in memory such that cache misses are much more rare.
> > Clearly making such an indexing arrangement would take time, but in my
> > particular case, I can spare the time to create an index if searching
> > was faster, since I'd make the index once but do the searching many
> times.
> >
> > Is there an implementation of such an algorithm that works easilty with
> > numpy? Also, can you offer any advice, suggestions, and comments to me
> > if I attempted to implement such an algorithm?
>
> I'm no help. You seem to know more than I do. Sadly, the first few
> Google hits I get for "binary search minimize cache misses" are
> patents. I don't know what the substantive content of those patents
> are; I have a strict policy of not reading patents.
>

I would be interested in adding such a thing if it wasn't patent encumbered.
A good start would be a prototype in python to show how it all went together
and whether it needed a separate indexing/lookup function or could be fit
into the current setup.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://projects.scipy.org/pipermail/numpy-discussion/attachments/20080508/1bba2656/attachment-0001.html 


More information about the Numpy-discussion mailing list