[Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

David Cournapeau cournape@gmail....
Wed Jul 8 18:20:43 CDT 2009

On Thu, Jul 9, 2009 at 8:02 AM, Pauli Virtanen<pav+sp@iki.fi> wrote:

> I don't think we want to go the ATNumPy route, or even have
> tunable parameters chosen at build or compile time.

Detecting things like cache size at compile time should not be too
difficult, at least for common platforms. Even detecting it at runtime
should be relatively simple in some particular cases (x86).

BTW, one good baseline for these summations is to use dot:

    np.ones((80000, 256)).sum(axis=0)

vs

    np.dot(np.ones((1, 80000)), np.ones((80000, 256)))

Assuming dot uses an optimized blas, this is generally one order of
magnitude faster than sum.
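
For anyone who wants to try the comparison, a small timing sketch follows. The function names (column_sum_via_sum, column_sum_via_dot, compare) are hypothetical helpers for this example, and the ratio you see will depend on the NumPy version, the BLAS that dot is linked against, and the machine, so treat the output as a local measurement only.

```python
import timeit

import numpy as np


def column_sum_via_sum(a):
    """Reduce over axis 0 with the ordinary sum() reduction."""
    return a.sum(axis=0)


def column_sum_via_dot(a):
    """Same reduction expressed as (1 x n) . (n x m), which goes through dot."""
    ones_row = np.ones((1, a.shape[0]), dtype=a.dtype)
    return np.dot(ones_row, a)[0]


def compare(rows=80000, cols=256, repeats=5):
    """Return the best-of-`repeats` wall time in seconds for (sum, dot)."""
    a = np.ones((rows, cols))
    t_sum = min(timeit.repeat(lambda: column_sum_via_sum(a),
                              number=1, repeat=repeats))
    t_dot = min(timeit.repeat(lambda: column_sum_via_dot(a),
                              number=1, repeat=repeats))
    return t_sum, t_dot
```

Running t_sum, t_dot = compare() and printing t_sum / t_dot gives the speed ratio for the shapes used above. Note that the default array is about 160 MB of doubles, so pass smaller rows/cols if memory is tight.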

> (Unless, of
> course, we want to bring a monster into the world -- think about
> cross-breeding distutils with the ATLAS build system :)

Kill me now :)

