[Numpy-discussion] 2-d in-place operation performance vs 1-d non in-place
George Sakkis
george.sakkis@gmail....
Thu Sep 6 06:30:43 CDT 2007
On Sep 5, 12:29 pm, Francesc Altet <fal...@carabos.com> wrote:
> On Wednesday 05 September 2007, George Sakkis wrote:
>
> > I was surprised to see that an in-place modification of a 2-d array
> > turns out to be slower than the respective non-mutating operation on
> > 1-d arrays, although the latter creates new array objects. Here is
> > the benchmarking code:
>
> > import timeit
>
> > for n in 10,100,1000,10000:
> >     setup = 'from numpy.random import random;' \
> >             'm=random((%d,2));' \
> >             'u1=random(%d);' \
> >             'u2=u1.reshape((u1.size,1))' % (n,n)
> >     timers = [timeit.Timer(stmt,setup) for stmt in
> >               # 1-d operations; create new arrays
> >               'a0 = m[:,0]-u1; a1 = m[:,1]-u1',
> >               # 2-d in-place operation
> >               'm -= u2'
> >               ]
> >     print n, [min(timer.repeat(3,1000)) for timer in timers]
>
> > And some results (Python 2.5, WinXP):
>
> > 10 [0.010832382327921563, 0.0045706926438974782]
> > 100 [0.010882668048592767, 0.021704993232380093]
> > 1000 [0.018272154701226007, 0.19477587235249172]
> > 10000 [0.073787590322233698, 1.9234369172618306]
>
> > So the 2-d in-place modification time grows linearly with the array
> > size, but the 1-d operations are much more efficient, despite
> > allocating new arrays while doing so. What gives?
>
> This seems to be the effect of broadcasting u2. If you use a
> pre-computed broadcast array instead, you get rid of that bottleneck:
>
> for n in 10,100,1000,10000:
>     setup = 'import numpy;' \
>             'm=numpy.random.random((%d,2));' \
>             'u1=numpy.random.random(%d);' \
>             'u2=u1[:, numpy.newaxis];' \
>             'u3=numpy.array([u1,u1]).transpose()' % (n,n)
>     timers = [timeit.Timer(stmt,setup) for stmt in
>               # 1-d operations; create new arrays
>               'a0 = m[:,0]-u1; a1 = m[:,1]-u1',
>               # 2-d in-place operation (using broadcasting)
>               'm -= u2',
>               # 2-d in-place operation (not forcing broadcasting)
>               'm -= u3'
>               ]
>     print n, [min(timer.repeat(3,1000)) for timer in timers]
>
> gives on my machine:
>
> 10 [0.03213191032409668, 0.012019872665405273, 0.0068600177764892578]
> 100 [0.033048152923583984, 0.06542205810546875, 0.0076580047607421875]
> 1000 [0.040294170379638672, 0.59892702102661133, 0.014600992202758789]
> 10000 [0.32667303085327148, 5.9721651077270508, 0.10261106491088867]
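The gap between the u2 and u3 timings above comes down to memory layout: u2 is a zero-copy view that NumPy broadcasts along the second axis, while u3 is physically materialized to m's shape before the subtraction. A minimal sketch of the difference, using `np.shares_memory` and `np.broadcast_to` (both from NumPy versions much newer than the one discussed in this thread):

```python
import numpy as np

n = 1000
m = np.random.random((n, 2))
u1 = np.random.random(n)

u2 = u1[:, np.newaxis]      # shape (n, 1): a view, no data copied
u3 = np.array([u1, u1]).T   # shape (n, 2): physically materialized

assert np.shares_memory(u2, u1)        # u2 reuses u1's buffer
assert not np.shares_memory(u3, u1)    # u3 is a fresh copy

# Broadcasting u2 to m's shape still allocates nothing: the second
# axis is traversed with stride 0, re-reading the same elements.
b = np.broadcast_to(u2, m.shape)
assert b.shape == m.shape
assert b.strides[1] == 0
```

So broadcasting itself never copies data; the cost observed in the thread comes from how the element-wise loop walks the broadcast operand.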
Thank you, indeed broadcasting is the bottleneck here. I had the
impression that broadcasting was a fast operation, i.e. that it doesn't
require physically allocating the target array of the broadcast, but it
seems this is not the case.
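For completeness, the column-wise approach from the original benchmark can also be done in place, sidestepping the broadcast entirely while still mutating m. This sketch is not from the thread, just an illustration of the idea:

```python
import numpy as np

n = 1000
m = np.random.random((n, 2))
u1 = np.random.random(n)

# Reference result computed out of place for comparison.
expected = m - u1[:, np.newaxis]

# Subtract u1 from each column separately: each statement is a plain
# in-place 1-d operation on a column view, with no broadcasting.
m[:, 0] -= u1
m[:, 1] -= u1

assert np.allclose(m, expected)
```

Since `m[:, 0]` and `m[:, 1]` are views into m, no new arrays are created for the results, yet each subtraction runs over a simple 1-d stride pattern.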
George