[Scipy-tickets] [SciPy] #724: distance function, som and co

SciPy scipy-tickets@scipy....
Sat Aug 23 12:53:30 CDT 2008


#724: distance function, som and co
---------------------------+------------------------------------------------
 Reporter:  cdavid         |        Owner:  somebody
     Type:  task           |       Status:  new     
 Priority:  high           |    Milestone:  0.7.0   
Component:  scipy.cluster  |      Version:          
 Severity:  normal         |   Resolution:          
 Keywords:                 |  
---------------------------+------------------------------------------------
Comment (by cwebster):

 In the discussion yesterday the question was whether argmin can be used to
 create an efficient pure numpy version of the vq algorithm.  Eric Jones
 looked into this a while back and some of his slides discuss this issue.

 We have the following options:
   1. {{{scipy.cluster.vq.vq}}} a pure C implementation
   2. Brain-dead Python loop:
     {{{
 def vq(obs, codebook):
     matches = []
     for v in obs:
         matches.append(argmin(sum((codebook-v)*(codebook-v), axis=-1)))
     return matches
     }}}
   2. Massive broadcasting:
     {{{
 from numpy import argmin, sum, array, newaxis

 def vq(obs, codebook):
     return argmin(sum((obs[newaxis,:,:]-codebook[:,newaxis,:])**2,
 axis=-1), axis=0)
     }}}

 On OS X, 2.3 GHz Macbook Pro, 2 GB memory.

 For a codebook which is 10 x 100, obs is 100 x 100, we have the following
 times
   1. 1000 loops, best of 3: 310 µs per loop
   2. 10 loops, best of 3: 4.57 ms per loop
   3. 100 loops, best of 3: 2.62 ms per loop

 For a codebook which is 100 x 1000, obs is 1000 x 1000, we have the
 following times
   1. 10 loops, best of 3: 154 ms per loop
   2. 10 loops, best of 3: 4.31 s per loop
   3. 10 loops, best of 3: 3.33 s per loop (this allocates ~100 MB)

 So we are looking at a 10 times slowdown for Python in small cases, and a
 20+ times slowdown for large cases.

 Conclusion: there is a significant win if we can make different distances
 work in C.

-- 
Ticket URL: <http://scipy.org/scipy/scipy/ticket/724#comment:1>
SciPy <http://www.scipy.org/>
SciPy is open-source software for mathematics, science, and engineering.


More information about the Scipy-tickets mailing list