[Numpy-discussion] searchsorted() and memory cache
Charles R Harris
Thu May 8 23:30:26 CDT 2008
On Thu, May 8, 2008 at 9:55 PM, Robert Kern <email@example.com> wrote:
> On Thu, May 8, 2008 at 10:51 PM, Andrew Straw <firstname.lastname@example.org> wrote:
> > I've got a large array (25 million int64s) that searchsorted()
> > takes a long time to grind through. After a bit of digging in the
> > literature and the numpy source code, I believe that searchsorted() is
> > implementing a classic binary search,
> > which is pretty bad in terms of
> > cache misses. There are several modern implementations of binary search
> > which arrange items in memory such that cache misses are much more rare.
> > Clearly making such an indexing arrangement would take time, but in my
> > particular case, I can spare the time to create an index if searching
> > were faster, since I'd make the index once but do the searching many
> > times. Is there an implementation of such an algorithm that works easily with
> > numpy? Also, can you offer any advice, suggestions, and comments to me
> > if I attempted to implement such an algorithm?
> I'm no help. You seem to know more than I do. Sadly, the first few
> Google hits I get for "binary search minimize cache misses" are
> patents. I don't know what the substantive content of those patents
> is; I have a strict policy of not reading patents.
I would be interested in adding such a thing if it weren't patent encumbered.
A good start would be a prototype in Python to show how it all went together
and whether it needed a separate indexing/lookup function or could fit
into the current setup.
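For reference, one layout along these lines that appears in the (non-patent) literature is the Eytzinger, or breadth-first implicit-heap, ordering: the sorted array is permuted so that a search descends slot 1 -> 2k / 2k+1, keeping the first few levels of the tree packed into a handful of cache lines. A minimal Python sketch of that idea, matching np.searchsorted's 'left' convention; the function names and the recursive builder are illustrative only, not proposed NumPy API:

```python
import numpy as np

def eytzinger_layout(sorted_arr):
    """Permute a sorted array into Eytzinger (breadth-first heap) order.

    Slot k's children live at slots 2k and 2k+1 (1-based), so the top of
    the search tree is contiguous in memory.
    """
    n = len(sorted_arr)
    out = np.empty_like(sorted_arr)

    def fill(i, k):
        # In-order walk of the implicit tree: i is the next position to
        # consume from sorted_arr, k is the 1-based heap slot to fill.
        if k <= n:
            i = fill(i, 2 * k)
            out[k - 1] = sorted_arr[i]
            i = fill(i + 1, 2 * k + 1)
        return i

    fill(0, 1)
    return out

def eytzinger_search(eyt, x):
    """Descend the implicit tree; return the 1-based heap slot of the
    leftmost element >= x, or 0 if every element is < x."""
    n = len(eyt)
    k = 1
    while k <= n:
        # Go right (2k+1) past elements < x, else left (2k); this mirrors
        # searchsorted's 'left' semantics.
        k = 2 * k + (1 if eyt[k - 1] < x else 0)
    # Undo the trailing right turns taken after passing the answer:
    # shift out the run of low one-bits (plus one).
    k >>= ((k + 1) & -(k + 1)).bit_length()
    return k
```

A quick consistency check against np.searchsorted:

```python
a = np.arange(0, 100, 3)
eyt = eytzinger_layout(a)
for x in (-1, 0, 1, 50, 98, 200):
    k = eytzinger_search(eyt, x)
    i = int(np.searchsorted(a, x))
    assert (k == 0) if i == len(a) else (eyt[k - 1] == a[i])
```

A real version would want an iterative builder and a C inner loop, but this shows the two pieces a NumPy API would need: a one-time indexing step and a lookup that works on the permuted array.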