# [Numpy-discussion] Vectorizing code, for loops, and all that

Albert Strasheim 13640887 at sun.ac.za
Mon Oct 2 17:10:33 CDT 2006

```Hello all

I recently started looking at David Cournapeau's PyEM package, specifically
his implementation of the K-Means algorithm. He implemented part of this
algorithm with in pure Python version and also provided a Pyrex alternative
that is significantly faster (about 10 times with the data I tested with). I
tried to figure out why this is so.

The job of this part of the algorithm is pretty simple: from a bunch of
cluster means (the codebook) find the nearest cluster mean for each data
point. The first attempt at implementing this algorithm might use two for
loops, one over the data points and one over the cluster means, computing a
Euclidean distance at each step. Slightly better is to use one loop and some

By randomly trying some code, I arrived at the following one-liner:

N.sum((data[...,N.newaxis,...]-code)**2, 2).argmin(axis=1)

where

data = N.random.randn(2000, 39)
nclusters = 64
code = data[0:nclusters,:]

This code was still slow, so I mangled it a bit so that I could profile it
using cProfile under Python 2.5 (use profile if you have an older version of
Python):

def kmean():
data = N.random.randn(2000, 39)
nclusters = 64
code = data[0:nclusters,:]
for i in xrange(10):
def a(): return data[...,N.newaxis,...]
z = a()
def b(): return z-code
z = b()
def c(): return z**2
z = c()
def d(): return N.sum(z, 2)
z = d()
def e(): return z.argmin(axis=1)
z = e()

if __name__ == '__main__':
import cProfile
cProfile.run("kmean()")

Profiler output:

In [570]: %run -t -e kmean2.py
84 function calls in 8.567 CPU seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1    0.000    0.000    8.567    8.567 <string>:1(<module>)
10    0.000    0.000    0.835    0.084 fromnumeric.py:375(sum)
10    0.000    0.000    0.000    0.000 kmean2.py:10(a)
10    6.837    0.684    6.837    0.684 kmean2.py:12(b)
10    0.623    0.062    0.623    0.062 kmean2.py:14(c)
10    0.000    0.000    0.835    0.084 kmean2.py:16(d)
10    0.080    0.008    0.080    0.008 kmean2.py:18(e)
1    0.180    0.180    8.567    8.567 kmean2.py:5(kmean2)
10    0.000    0.000    0.000    0.000 {isinstance}
1    0.000    0.000    0.000    0.000 {method 'disable' of
'_lsprof.Profiler' objects}
1    0.012    0.012    0.012    0.012 {method 'randn' of
'mtrand.RandomState' objects}
10    0.835    0.083    0.835    0.083 {method 'sum' of
'numpy.ndarray' objects}

It seems the b part of the computation is taking most of the time. I tried
to simulate this separately:

In [571]: x1 = N.random.randn(2000,39)

In [572]: y1 = N.random.randn(64,39)

In [574]: %timeit z1=x1[...,N.newaxis,...]-y1 10 loops, best of 3: 703 ms
per loop

In [575]: z1.shape
Out[575]: (2000, 64, 39)

As far as I can figure, this operation is doing 2000*64*39 subtractions.
Doing this straight up yields the following:

In [576]: x2 = N.random.randn(2000,64,39)

In [577]: y2 = N.random.randn(2000,64,39)

In [578]: %timeit z2 = x2-y2
10 loops, best of 3: 108 ms per loop

Does anybody have any ideas on why this is so much faster? Hopefully I
didn't mess up somewhere...

Thanks!

Regards,

Albert

P.S. I'm used NumPy 1.0rc1 with Python 2.5 on Windows for these tests. I get
similar results with 1.0.dev3239 with Python 2.4.3 on Linux.

P.P.S. SourceForge and GMail *really* don't like each other right now.

```