[Numpy-discussion] cumsum much slower than simple loop?

Pauli Virtanen pav@iki...
Fri Feb 10 02:55:26 CST 2012

10.02.2012 05:39, Dave Cook wrote:
> Why is numpy.cumsum (along axis=0) so much slower than a simple loop? 
> The same goes for numpy.add.accumulate

The reason is loop ordering. The reduction machinery behind `cumsum`
and `add.reduce` does the summation in the innermost loop, whereas
`loopcumsum` has the summation in the outermost loop.

Although both algorithms do the same number of operations, the latter is
more efficient with regard to the CPU cache (and maybe memory data
dependencies): the arrays are in C-order, so stepping along the first
axis in the innermost loop is wasteful, as consecutive elements are far
from each other in memory.
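
To make the two loop orders concrete, here is a rough pure-Python sketch.
It is only an illustration: `cumsum_inner` and `cumsum_outer` are
hypothetical names, the first mimics the traversal order described above
rather than NumPy's actual C code, and the second is a guess at what a
`loopcumsum`-style function looks like, since the original function is
not shown here.

```python
import numpy as np

def cumsum_inner(a):
    # Summation in the innermost loop: for each column, walk down the rows.
    # In a C-ordered array each step jumps a whole row ahead in memory.
    n, m = a.shape
    out = np.empty_like(a)
    for j in range(m):
        acc = 0
        for i in range(n):
            acc += a[i, j]
            out[i, j] = acc
    return out

def cumsum_outer(a):
    # Summation in the outermost loop: add one whole row at a time.
    # Each row is contiguous in a C-ordered array, so access is sequential.
    out = np.empty_like(a)
    out[0] = a[0]
    for i in range(1, a.shape[0]):
        out[i] = out[i - 1] + a[i]
    return out

a = np.arange(12, dtype=np.int64).reshape(4, 3)
same = np.array_equal(cumsum_inner(a), cumsum_outer(a))
```

Both functions perform the same additions and agree with
`np.cumsum(a, axis=0)`; only the order in which memory is visited
differs.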

The effect goes away if you use a Fortran-ordered array:

    a = np.array(a, order='F')  # copy `a` into column-major (Fortran) layout
    print(a.shape)              # the shape is unchanged; only the layout differs
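
A quick way to see what the conversion changes is to look at the
strides. The sketch below uses a made-up 1000 x 500 float64 array (the
array from the original question is not shown, so the size is an
assumption):

```python
import numpy as np

# Hypothetical test array; C-ordered (row-major) by default.
a = np.arange(500000, dtype=np.float64).reshape(1000, 500)
f = np.array(a, order='F')   # same values, column-major layout

# Strides are the byte steps needed to move one element along each axis.
print(a.strides)   # (4000, 8): a step along axis 0 skips a full row
print(f.strides)   # (8, 8000): a step along axis 0 is the next element

# The layout does not change the result of the accumulation.
same = np.allclose(np.cumsum(a, axis=0), np.cumsum(f, axis=0))
```

In the Fortran-ordered copy, elements along the first axis are adjacent
in memory, so an inner loop over that axis reads memory sequentially.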

Numpy does not currently have heuristics to determine when swapping the
loop order would be beneficial in accumulations and reductions. It does,
however, have such heuristics in place for elementwise operations.

Pauli Virtanen
