[Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

Stéfan van der Walt stefan@sun.ac...
Wed Jul 8 17:38:49 CDT 2009

Hi Pauli

2009/7/9 Pauli Virtanen <pav+sp@iki.fi>:
> Unfortunately, improving the performance using the above scheme
> comes at the cost of some slightly murky heuristics.  I didn't
> manage to come up with an optimal decision rule, so they are
> partly empirical. There is one parameter tuning the cross-over
> between minimizing stride and avoiding small dimensions. (This is
> more or less straightforward.)  Another empirical decision is
> required in choosing whether to use the usual reduction loop,
> which is better in some cases, or the blocked loop. How to make
> this latter choice is not so clear to me.

I know very little about cache optimality, so excuse the triviality of
this question: Is it possible to design this loop optimally up front
(taking into account certain parameters that can be measured at build
time), or is it the kind of thing that can only be discovered by
empirical tuning at compile time?  ATNumPy... scary :-)
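
To make the "tune it empirically" option concrete, here is a minimal
sketch, not the actual NumPy reduction code. It assumes pure-Python
loops over a list of rows as a stand-in for the C inner loops, and the
names plain_sum, blocked_sum, pick_fastest and the block size of 64 are
all hypothetical. It times a usual reduction loop against a blocked one
on sample data and keeps whichever is faster, which is roughly what an
automatically tuned build would do once per machine.

```python
import timeit


def plain_sum(rows):
    """Usual reduction loop: add each row into the output, row by row."""
    out = [0.0] * len(rows[0])
    for row in rows:
        for j, x in enumerate(row):
            out[j] += x
    return out


def blocked_sum(rows, block=64):
    """Blocked loop: finish one block of output columns before the next.

    `block` is a hypothetical tuning parameter; 64 is an arbitrary guess.
    """
    ncols = len(rows[0])
    out = [0.0] * ncols
    for start in range(0, ncols, block):
        stop = min(start + block, ncols)
        for row in rows:
            for j in range(start, stop):
                out[j] += row[j]
    return out


def pick_fastest(candidates, data, repeat=3):
    """Time every candidate on `data`; return (name of fastest, timings)."""
    timings = {}
    for name, func in candidates.items():
        # Best of `repeat` single runs, to reduce timing noise.
        runs = timeit.repeat(lambda f=func: f(data), number=1, repeat=repeat)
        timings[name] = min(runs)
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    data = [[float(i + j) for j in range(200)] for i in range(50)]
    best, timings = pick_fastest(
        {"plain": plain_sum, "blocked": blocked_sum}, data
    )
    print("fastest:", best, timings)
```

Which variant wins depends on the machine and the array shape, so the
result of pick_fastest would have to be measured rather than assumed.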

