[Scipy-tickets] [SciPy] #724: distance function, som and co
SciPy
scipy-tickets@scipy....
Sat Aug 23 12:53:30 CDT 2008
#724: distance function, som and co
---------------------------+------------------------------------------------
Reporter: cdavid | Owner: somebody
Type: task | Status: new
Priority: high | Milestone: 0.7.0
Component: scipy.cluster | Version:
Severity: normal | Resolution:
Keywords: |
---------------------------+------------------------------------------------
Comment (by cwebster):
In the discussion yesterday the question was whether argmin can be used to
create an efficient pure numpy version of the vq algorithm. Eric Jones
looked into this a while back and some of his slides discuss this issue.
We have the following options:
1. {{{scipy.cluster.vq.vq}}} a pure C implementation
2. Brain-dead Python loop:
{{{
def vq(obs, codebook):
matches = []
for v in obs:
matches.append(argmin(sum((codebook-v)*(codebook-v), axis=-1)))
return matches
}}}
2. Massive broadcasting:
{{{
from numpy import argmin, sum, array, newaxis
def vq(obs, codebook):
return argmin(sum((obs[newaxis,:,:]-codebook[:,newaxis,:])**2,
axis=-1), axis=0)
}}}
On OS X, 2.3 GHz Macbook Pro, 2 GB memory.
For a codebook which is 10 x 100, obs is 100 x 100, we have the following
times
1. 1000 loops, best of 3: 310 µs per loop
2. 10 loops, best of 3: 4.57 ms per loop
3. 100 loops, best of 3: 2.62 ms per loop
For a codebook which is 100 x 1000, obs is 1000 x 1000, we have the
following times
1. 10 loops, best of 3: 154 ms per loop
2. 10 loops, best of 3: 4.31 s per loop
3. 10 loops, best of 3: 3.33 s per loop (this allocates ~100 MB)
So we are looking at a 10 times slowdown for Python in small cases, and a
20+ times slowdown for large cases.
Conclusion: there is a significant win if we can make different distances
work in C.
--
Ticket URL: <http://scipy.org/scipy/scipy/ticket/724#comment:1>
SciPy <http://www.scipy.org/>
SciPy is open-source software for mathematics, science, and engineering.
More information about the Scipy-tickets
mailing list