[Numpy-discussion] Vectorizing code, for loops, and all that

David Cournapeau david at ar.media.kyoto-u.ac.jp
Tue Oct 3 02:33:01 CDT 2006

Albert Strasheim wrote:
> Hello all
> I recently started looking at David Cournapeau's PyEM package, specifically
> his implementation of the K-Means algorithm. He implemented part of this
> algorithm in a pure Python version and also provided a Pyrex alternative
> that is significantly faster (about 10 times, with the data I tested). I
> tried to figure out why this is so.
For what it's worth, PyEM has been quietly updated in the past weeks: 
most responses I got initially were people reporting errors, almost 
always related to changes in the numpy API, so I decided to keep 
quiet for a while, waiting for an API freeze on the numpy side. Recent 
versions use vq from scipy.cluster, for example, the plotting has been 
much improved, and a fast ctypes version of the diagonal Gaussian kernel 
makes it possible to run EM on hundreds of thousands of frames of 
several dozen dimensions, with several dozen Gaussians in the mixture, 
in reasonable time.
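
As an aside, if all you need is the nearest-codeword assignment itself, 
scipy.cluster.vq already provides it; a minimal sketch (the shapes just 
mirror Albert's example below):

import numpy as N
from scipy.cluster.vq import vq

data = N.random.randn(2000, 39)   # (n, d) observations
code = data[:64]                  # (k, d) codebook
# vq returns, for each observation, the index of the nearest codeword
# and the corresponding distance
labels, dist = vq(data, code)
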
> The job of this part of the algorithm is pretty simple: from a bunch of
> cluster means (the codebook) find the nearest cluster mean for each data
> point. The first attempt at implementing this algorithm might use two for
> loops, one over the data points and one over the cluster means, computing a
> Euclidean distance at each step. Slightly better is to use one loop and some
> broadcasting.
> By randomly trying some code, I arrived at the following one-liner:
> N.sum((data[:,N.newaxis,:]-code)**2, 2).argmin(axis=1)
> where
> data = N.random.randn(2000, 39)
> nclusters = 64
> code = data[0:nclusters,:]
> This code was still slow, so I mangled it a bit so that I could profile it
> using cProfile under Python 2.5 (use profile if you have an older version of
> Python):
> def kmean():
>     data = N.random.randn(2000, 39)
>     nclusters = 64
>     code = data[0:nclusters,:]
>     for i in xrange(10):
>         def a(): return data[:,N.newaxis,:]
>         z = a()
>         def b(): return z-code
>         z = b()
>         def c(): return z**2
>         z = c()
>         def d(): return N.sum(z, 2)
>         z = d()
>         def e(): return z.argmin(axis=1)
>         z = e()
> if __name__ == '__main__':
>     import cProfile
>     cProfile.run("kmean()")
> Profiler output:
I got exactly the same kind of weird behaviour in other places of PyEM 
(apparently simple subtractions taking a lot of time compared to other 
parts, although they should theoretically be negligible: e.g. X - mu, 
where X is (n, d) and mu is (d, 1), taking almost as much time as 
exp(X**2)!).
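
A quick way to check this on your machine is a timeit comparison (a 
sketch; the sizes are arbitrary, and mu is given a shape that 
broadcasts against the rows of X):

import timeit

setup = '''
import numpy as N
n, d = 100000, 20
X = N.random.randn(n, d)
mu = N.random.randn(1, d)   # broadcasts against the rows of X
'''
# plain broadcast subtraction versus a transcendental operation
print timeit.Timer('X - mu', setup).timeit(10)
print timeit.Timer('N.exp(X**2)', setup).timeit(10)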

It would be great to at least know the origin of this non-intuitive result.
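
For anyone who wants to reproduce the comparison, here is a 
self-contained sketch of the two approaches discussed above (the 
function names are mine); both should produce identical labels:

import numpy as N

def nearest_loop(data, code):
    # explicit double loop: one squared Euclidean distance per
    # (point, codeword) pair
    n, k = data.shape[0], code.shape[0]
    labels = N.zeros(n, dtype=int)
    for i in xrange(n):
        best, best_dist = 0, N.inf
        for j in xrange(k):
            dist = N.sum((data[i] - code[j])**2)
            if dist < best_dist:
                best, best_dist = j, dist
        labels[i] = best
    return labels

def nearest_broadcast(data, code):
    # data[:,N.newaxis,:] has shape (n, 1, d) and code has (k, d),
    # so the subtraction broadcasts to (n, k, d)
    return N.sum((data[:,N.newaxis,:] - code)**2, 2).argmin(axis=1)

data = N.random.randn(2000, 39)
code = data[:64]
assert (nearest_loop(data, code) == nearest_broadcast(data, code)).all()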

