[Numpy-discussion] poor performance of sum with sub-machine-word integer types
Charles R Harris
charlesr.harris@gmail....
Tue Jun 21 12:16:19 CDT 2011
On Tue, Jun 21, 2011 at 10:46 AM, Zachary Pincus <zachary.pincus@yale.edu> wrote:
> Hello all,
>
> As a result of the "fast greyscale conversion" thread, I noticed an anomaly
> with numpy.ndarray.sum(): summing along certain axes is much slower with
> sum() than doing it explicitly, but only with integer dtypes and when
> the size of the dtype is less than the machine word. I checked in 32-bit and
> 64-bit modes, and in both cases the speed difference went away only once the
> dtype was as large as the machine word. See below...
>
> Is this something to do with numpy or something inexorable about machine /
> memory architecture?
>
>
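It's because sum, by default, converts integer types smaller than the
machine word to the default (machine-word) integer for greater precision.
Passing the dtype explicitly avoids the conversion: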
In [8]: timeit i.sum(axis=-1)
10 loops, best of 3: 140 ms per loop
In [9]: timeit i.sum(axis=-1, dtype=int8)
100 loops, best of 3: 16.2 ms per loop
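The trade-off behind those two timings can be sketched as follows. This is only an illustration: the array here is a small stand-in for `i`, whose actual shape and contents are not shown in this message, so both are assumptions.

```python
import numpy as np

# Stand-in for the array `i` timed above; the shape and contents are assumptions.
i = np.ones((4, 5, 200), dtype=np.int8)

# Default: the int8 input is accumulated in the platform default integer.
wide = i.sum(axis=-1)

# Explicit dtype: the accumulator stays int8, so large sums wrap around.
narrow = i.sum(axis=-1, dtype=np.int8)

print(wide.dtype, wide[0, 0])      # a wider integer dtype, value 200
print(narrow.dtype, narrow[0, 0])  # int8, value wrapped modulo 256
```

So the faster call is only safe when the sums are known to fit in the narrow type.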
If you have NumPy 1.6, einsum is faster, but it also preserves the input type:
In [10]: timeit einsum('ijk->ij', i)
100 loops, best of 3: 5.95 ms per loop
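A minimal sketch of what "preserves the type" means in practice, using a small made-up array (its shape and values are assumptions, chosen so that the int8 sums cannot overflow):

```python
import numpy as np

# Small values so the int8 result cannot overflow; the shape is an assumption.
a = np.arange(24, dtype=np.int8).reshape(2, 3, 4)

s_einsum = np.einsum('ijk->ij', a)  # result keeps the input dtype (int8)
s_sum = a.sum(axis=-1)              # result is upcast to the default integer

print(s_einsum.dtype, s_sum.dtype)
print(np.array_equal(s_einsum, s_sum))

# If a wider result is needed from einsum, a dtype can be requested explicitly.
s_wide = np.einsum('ijk->ij', a, dtype=np.int64)
print(s_wide.dtype)
```

With larger values the int8 result from einsum would wrap around, just as with sum(dtype=int8).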
We could probably make better loops for summing within kinds, i.e.,
accumulate in higher precision, then cast to specified precision.
<snip>
Chuck