[Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

David Cournapeau
Thu Jul 9 03:28:06 CDT 2009

Matthieu Brucher wrote:
> Unfortunately, this is not possible. We've been playing with blocking
> loops for a long time in finite difference schemes, and it is always
> compiler dependent

You mean CPU dependent, right? I can't see how a reasonable optimizing
compiler could make a big difference to cache effects.
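
To make the point concrete, here is a minimal sketch of a blocked reduction in
plain Python. It is not how NumPy implements sum(); block_length and
blocked_sum are hypothetical names, and the 32 KiB cache size is an assumed
value for a typical L1 data cache. The only thing it illustrates is that the
block length comes from a cache size, which is a property of the CPU:

```python
from array import array

def block_length(cache_bytes, itemsize, fraction=0.5):
    """Items that fit in `fraction` of a cache of `cache_bytes` bytes.

    Hypothetical helper: the cache size has to be supplied by the caller.
    """
    return max(1, int(cache_bytes * fraction) // itemsize)

def blocked_sum(data, block):
    """Sum `data` one block at a time, accumulating the partial sums."""
    total = 0.0
    for start in range(0, len(data), block):
        # Each slice touches at most `block` contiguous items.
        total += sum(data[start:start + block])
    return total

data = array("d", range(1000))
# Assumed 32 KiB L1 data cache; use half of it for the working block.
block = block_length(cache_bytes=32 * 1024, itemsize=data.itemsize)
result = blocked_sum(data, block)
```

The result does not depend on the block length here; only the memory access
pattern does, which is why the right value would have to be tuned per machine.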

@Pauli: if (optionally) knowing some cache information would help you, I
could implement it. It should not be too difficult for most cases we
care about,


