[Numpy-discussion] searchsorted() and memory cache
Charles R Harris
Tue May 13 19:38:37 CDT 2008
On Tue, May 13, 2008 at 5:59 PM, Andrew Straw <firstname.lastname@example.org> wrote:
> Thanks for all the comments on my original question. I was more offline
> than intended after I sent it until now, so I'm sorry I wasn't
> immediately able to participate in the discussion.
> Anyhow, after working on this a bit more, I came up with a few
> implementations of search algorithms doing just what I needed with the
> same interface available using bazaar and launchpad at
> http://launchpad.net/~astraw/+junk/fastsearch<http://launchpad.net/%7Eastraw/+junk/fastsearch>(MIT license). I have
> attached the output of the plot_comparisons.py benchmarking script to
> this email (note that this benchmarking is pretty crude).
> For the problem I originally wrote about, I get what a nearly
> unbelievable speedup of ~250x using the
> fastsearch.downsamp.DownSampledPreSearcher class, which is very similar
> in spirit to Charles' suggestion. It takes 1000 values from the original
> array to create a new first-level array that is itself localized in
> memory and points to a more localized region of the full original array.
> Also, I get a similar (though slightly slower) result using AVL trees
> using the fastsearch.avlsearch.AvlSearcher class, which uses pyavl (
> http://sourceforge.net/projects/pyavl ).
> Using the benchmarking code included in the bzr branch, I don't get
> anything like this speedup (e.g. the attached figure), so I'm not sure
> exactly what's going on at this point, but I'm not going to argue with a
> 250x speedup, so the fastsearch.downsamp code is now being put to use in
> one of my projects.
> Stefan -- I think your code simply implements the classic binary search
> -- I don't see how it will reduce cache misses.
> Anyhow, perhaps someone will find the above useful. I guess it would
> still be a substantial amount of work to make a numpy-types-aware
> implementation of AVL trees or similar algorithms. These sorts of binary
> search trees seem like the right way to solve this problem and thus
> there might be an interesting project in this. I imagine that a
> numpy-types-aware Cython might make such implementation significantly
> easier and still blazingly fast compared to the binary search
> implemented in searchsorted() given today's cached memory architectures.
That's pretty amazing, but I don't understand the graph. The DownSampled
search looks like the worst. Are the curves mislabled? Are the axis correct?
I'm assuming smaller is better here.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Numpy-discussion