[Numpy-discussion] searchsorted() and memory cache

Andrew Straw strawman@astraw....
Wed May 14 21:50:01 CDT 2008


Aha, I've found the problem -- my values were int64 and my keys were
uint64. Switching to the same data type immediately fixes the issue!
It's not a memory cache issue at all.
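
A minimal sketch of the mismatch, assuming small synthetic arrays rather than
the real dataset. The array contents and the helper name
searchsorted_same_dtype are made up for illustration. NumPy has no common
integer type for int64 and uint64, so the pair promotes to float64, and the
helper avoids that by casting the keys to the dtype of the sorted values
before searching:

```python
import numpy as np

# Hypothetical stand-ins for the real data: sorted int64 values, uint64 keys.
values = np.arange(0, 1000, 2, dtype=np.int64)
keys = np.array([0, 3, 500, 998], dtype=np.uint64)

# int64 and uint64 have no common integer type; NumPy promotes to float64.
common = np.result_type(values.dtype, keys.dtype)
print(common)  # float64


def searchsorted_same_dtype(values, keys):
    """Search after casting keys to values.dtype (hypothetical helper).

    Caveat: uint64 keys above 2**63 - 1 would wrap when cast to int64,
    so this is only safe when the keys fit in the values' dtype.
    """
    if keys.dtype != values.dtype:
        keys = keys.astype(values.dtype)
    return np.searchsorted(values, keys)


idx = searchsorted_same_dtype(values, keys)

# The mixed-dtype call returns the same indices here; only the cost differs.
idx_mixed = np.searchsorted(values, keys)
```

Timing both calls on the real arrays (for example with timeit) would show
whether the cast accounts for the slowdown on a given NumPy version.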

Perhaps searchsorted() should emit a warning if the keys require
casting... I can't believe how bad the hit was.

-Andrew

Charles R Harris wrote:
>
>
> On Wed, May 14, 2008 at 2:00 PM, Andrew Straw <strawman@astraw.com
> <mailto:strawman@astraw.com>> wrote:
>
>     Charles R Harris wrote:
>     >
>     >
>     > On Wed, May 14, 2008 at 8:09 AM, Andrew Straw
>     <strawman@astraw.com <mailto:strawman@astraw.com>
>     > <mailto:strawman@astraw.com <mailto:strawman@astraw.com>>> wrote:
>     >
>     >
>     >
>     >     Quite a difference (a factor of about 3000)! At this point, I
>     >     haven't delved into the dataset to see what makes it so
>     >     pathological -- performance is nowhere near this bad for the
>     >     binary search algorithm with other sets of keys.
>     >
>     >
>     > It can't be that bad Andrew, something else is going on. And 191 MB
>     > isn't *that* big, I expect it should fit in memory with no problem.
>     I agree the performance difference seems beyond what one would expect
>     due to cache misses alone. I'm at a loss to propose other explanations,
>     though. Ideas?
>
>
> I just searched for 2**25/10 keys in a 2**25 array of reals. It took
> less than a second when vectorized. In a python loop it took about 7.7
> seconds. The only thing I can think of is that the search isn't
> getting any cpu cycles for some reason. How much memory is it using?
> Do you have any nans and such in the data?
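
The comparison described in the quoted message can be sketched roughly as
follows. This is an approximation, not the code that was actually run: the
array size is reduced from 2**25 to 2**18 to keep it quick, the data are
uniform random floats, and the variable names are invented. It times one
vectorized searchsorted() call against a Python loop of scalar calls and
checks the data for NaNs:

```python
import time

import numpy as np

rng = np.random.default_rng(0)

n = 2**18  # assumption: smaller than the 2**25 used in the message
haystack = np.sort(rng.random(n))   # sorted array of reals
needles = rng.random(n // 10)       # one key per ten array elements

# Vectorized: a single call handles every key.
t0 = time.perf_counter()
vec = np.searchsorted(haystack, needles)
t_vec = time.perf_counter() - t0

# Python loop: one call per key, paying the call overhead each time.
t0 = time.perf_counter()
loop = np.array([np.searchsorted(haystack, k) for k in needles])
t_loop = time.perf_counter() - t0

# NaNs break the ordering that a binary search relies on, so check for them.
has_nan = bool(np.isnan(haystack).any() or np.isnan(needles).any())

print(f"vectorized: {t_vec:.4f} s, loop: {t_loop:.4f} s, NaNs: {has_nan}")
```

Absolute timings depend on the machine and the NumPy version; the point is
only that both approaches return the same indices.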
