[Numpy-discussion] searchsorted() and memory cache
Wed May 14 21:50:01 CDT 2008
Aha, I've found the problem -- my values were int64 and my keys were
uint64. Switching to the same data type immediately fixes the issue!
It's not a memory cache issue at all.
Perhaps searchsorted() should emit a warning if the keys require
casting... I can't believe how bad the hit was.
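The dtype mismatch described above can be reproduced directly. This is a minimal sketch (array sizes and value ranges are illustrative, not from the thread): searchsorted() with int64 keys against a uint64 array forces a type promotion, while casting the keys to the array's dtype up front avoids it without changing the result.

```python
import numpy as np

# Sorted haystack of uint64 values and int64 keys -- the mismatched
# case from the thread. Sizes here are arbitrary illustrations.
arr = np.sort(np.random.randint(0, 2**32, size=2**20).astype(np.uint64))
keys_int64 = np.random.randint(0, 2**32, size=2**15).astype(np.int64)

# Matching the key dtype to the array dtype avoids the promotion
# that can dominate searchsorted's runtime on mismatched inputs.
keys_uint64 = keys_int64.astype(np.uint64)

idx_mismatched = np.searchsorted(arr, keys_int64)   # triggers casting
idx_matched = np.searchsorted(arr, keys_uint64)     # same result
```

Both calls return identical indices here; only the matched-dtype call skips the conversion work.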
Charles R Harris wrote:
> On Wed, May 14, 2008 at 2:00 PM, Andrew Straw <email@example.com> wrote:
> > Charles R Harris wrote:
> > > On Wed, May 14, 2008 at 8:09 AM, Andrew Straw <email@example.com> wrote:
> > > > Quite a difference (a factor of about 3000)! At this point, I haven't
> > > > delved into the dataset to see what makes it so pathological --
> > > > performance is nowhere near this bad for the binary search
> > > > with other sets of keys.
> > > It can't be that bad, Andrew, something else is going on. And 191 MB
> > > isn't *that* big, I expect it should fit in memory with no problem.
> > I agree the performance difference seems beyond what one would expect
> > due to cache misses alone. I'm at a loss to propose other causes,
> > though. Ideas?
> I just searched for 2**25/10 keys in a 2**25 array of reals. It took
> less than a second when vectorized. In a python loop it took about 7.7
> seconds. The only thing I can think of is that the search isn't
> getting any cpu cycles for some reason. How much memory is it using?
> Do you have any nans and such in the data?
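Charles's sanity check above can be sketched as follows. This is a scaled-down illustration (2**18 elements rather than the 2**25 in his test, to keep the run short); the point is the gap between one vectorized searchsorted() call and per-element calls in a Python loop.

```python
import time
import numpy as np

# Scaled-down version of the experiment: a sorted array of reals,
# searched for one-tenth as many keys.
n = 2**18
haystack = np.sort(np.random.random(n))
needles = np.random.random(n // 10)

t0 = time.perf_counter()
vec = np.searchsorted(haystack, needles)   # one vectorized call
t_vec = time.perf_counter() - t0

t0 = time.perf_counter()
# Same search, but paying Python call overhead for every key.
loop = np.array([np.searchsorted(haystack, x) for x in needles])
t_loop = time.perf_counter() - t0
```

The two approaches return identical indices; the loop's cost is per-call overhead, which is why Charles saw sub-second vectorized timing against several seconds for the loop.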