# [Scipy-tickets] [SciPy] #724: distance function, som and co

SciPy scipy-tickets@scipy....
Sat Aug 23 12:53:30 CDT 2008

```#724: distance function, som and co
---------------------------+------------------------------------------------
Reporter:  cdavid         |        Owner:  somebody
Priority:  high           |    Milestone:  0.7.0
Component:  scipy.cluster  |      Version:
Severity:  normal         |   Resolution:
Keywords:                 |
---------------------------+------------------------------------------------
Comment (by cwebster):

In the discussion yesterday the question was whether argmin can be used to
create an efficient pure numpy version of the vq algorithm.  Eric Jones
looked into this a while back and some of his slides discuss this issue.

We have the following options:
1. {{{scipy.cluster.vq.vq}}} a pure C implementation
{{{
def vq(obs, codebook):
matches = []
for v in obs:
matches.append(argmin(sum((codebook-v)*(codebook-v), axis=-1)))
return matches
}}}
{{{
from numpy import argmin, sum, array, newaxis

def vq(obs, codebook):
return argmin(sum((obs[newaxis,:,:]-codebook[:,newaxis,:])**2,
axis=-1), axis=0)
}}}

On OS X, 2.3 GHz Macbook Pro, 2 GB memory.

For a codebook which is 10 x 100, obs is 100 x 100, we have the following
times
1. 1000 loops, best of 3: 310 µs per loop
2. 10 loops, best of 3: 4.57 ms per loop
3. 100 loops, best of 3: 2.62 ms per loop

For a codebook which is 100 x 1000, obs is 1000 x 1000, we have the
following times
1. 10 loops, best of 3: 154 ms per loop
2. 10 loops, best of 3: 4.31 s per loop
3. 10 loops, best of 3: 3.33 s per loop (this allocates ~100 MB)

So we are looking at a 10 times slowdown for Python in small cases, and a
20+ times slowdown for large cases.

Conclusion: there is a significant win if we can make different distances
work in C.

--
Ticket URL: <http://scipy.org/scipy/scipy/ticket/724#comment:1>
SciPy <http://www.scipy.org/>
SciPy is open-source software for mathematics, science, and engineering.
```