[Numpy-discussion] poor performance of sum with sub-machine-word integer types
Charles R Harris
Tue Jun 21 12:16:19 CDT 2011
On Tue, Jun 21, 2011 at 10:46 AM, Zachary Pincus <email@example.com>wrote:
> Hello all,
> As a result of the "fast greyscale conversion" thread, I noticed an anomaly
> with numpy.ndarray.sum(): summing along certain axes is much slower with
> sum() than doing it explicitly, but only with integer dtypes and when
> the size of the dtype is less than the machine word. I checked in 32-bit and
> 64-bit modes and in both cases only once the dtype got as large as that did
> the speed difference go away. See below...
> Is this something to do with numpy or something inexorable about machine /
> memory architecture?
It's because of the type conversion sum uses by default for greater precision:
In : timeit i.sum(axis=-1)
10 loops, best of 3: 140 ms per loop
In : timeit i.sum(axis=-1, dtype=int8)
100 loops, best of 3: 16.2 ms per loop
If you have 1.6, einsum is faster but also conserves type:
In : timeit einsum('ijk->ij', i)
100 loops, best of 3: 5.95 ms per loop
We could probably make better loops for summing within kinds, i.e.,
accumulate in higher precision, then cast to specified precision.
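A minimal sketch of the behavior under discussion (array shape and values are illustrative, not from the thread): by default, summing a small-integer array upcasts to the platform integer, while passing dtype explicitly keeps the small type, which is what made the second timing above so much faster:

```python
import numpy as np

# Stand-in for the image-like array in the thread.
i = np.ones((100, 100, 3), dtype=np.int8)

# Default: accumulates in the machine-word integer (platform-dependent).
default_sum = i.sum(axis=-1)

# Explicit small dtype: stays int8, avoiding the conversion loop.
small_sum = i.sum(axis=-1, dtype=np.int8)

print(default_sum.dtype)  # platform integer, not int8
print(small_sum.dtype)    # int8
```

Note that keeping the accumulator at int8 risks overflow for real data; it is safe here only because each sum is at most 3.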