[Scipy-tickets] [SciPy] #1247: _kmeans chokes on large thresholds

SciPy Trac scipy-tickets@scipy....
Sat Jul 24 12:35:21 CDT 2010


#1247: _kmeans chokes on large thresholds
---------------------------+------------------------------------------------
 Reporter:  kwgoodman      |       Owner:  somebody
     Type:  defect         |      Status:  new     
 Priority:  normal         |   Milestone:  0.9.0   
Component:  scipy.cluster  |     Version:  0.7.0   
 Keywords:                 |  
---------------------------+------------------------------------------------
 _kmeans chokes on large thresholds:
 {{{
 >> import numpy as np
 >> from scipy import cluster
 >> v = np.array([1,2,3,4,10], dtype=float)
 >> cluster.vq.kmeans(v, 1, thresh=1e15)
    (array([ 4.]), 2.3999999999999999)
 >> cluster.vq.kmeans(v, 1, thresh=1e16)
 <snip>
 IndexError: list index out of range
 }}}
 The problem is in these lines:
 {{{
     diff = thresh+1.
     while diff > thresh:
         <snip>
         if(diff > thresh):
 }}}
 If thresh is large, then (thresh + 1) > thresh is False in floating point,
 so the while loop body never runs:
 {{{
 >> thresh = 1e16
 >> diff = thresh + 1.0
 >> diff > thresh
    False
 }}}
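A quick way to see why the +1 disappears, assuming thresh is a float64: at 1e16 the gap between adjacent representable values is already 2, so adding 1 rounds back to thresh. This is only an illustrative check, not part of _kmeans.

```python
import numpy as np

thresh = 1e16

# Distance from thresh to the next larger float64.
gap = np.spacing(thresh)

# Adding 1.0 is less than one full gap, so the sum rounds back to thresh.
absorbed = (thresh + 1.0) == thresh

# At 1e15 the gap is still below 1, so the comparison works there.
still_ok = (1e15 + 1.0) > 1e15

print(gap, absorbed, still_ok)
```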
 What's a use case for a large threshold? You might want to study the
 algorithm by seeing the result after one iteration (not to be confused
 with the iter input which is something else).

 One fix is to use 2*thresh instead of thresh + 1. But that just pushes
 the problem out to higher thresholds:
 {{{
 >> thresh = 1e16
 >> diff = 2 * thresh
 >> diff > thresh
    True

 >> thresh = 1e400
 >> diff = 2 * thresh
 >> diff > thresh
    False
 }}}
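The 1e400 case fails for a different reason than the 1e16 case: the literal overflows to inf, and doubling inf gives inf again. A small check of that, using only the standard library (the variable names are just for illustration):

```python
import math
import sys

thresh = 1e400  # larger than the biggest finite float, so this is inf

is_inf = math.isinf(thresh)

# inf * 2 is still inf, and inf > inf is False.
doubled_bigger = (2 * thresh) > thresh

# The same thing happens for any finite thresh close to the float maximum:
# doubling it overflows to inf, which does compare greater, but a thresh
# that is already inf can never be exceeded.
overflowed = math.isinf(2 * sys.float_info.max)

print(is_inf, doubled_bigger, overflowed)
```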
 A better fix is to replace:
 {{{
 if diff > thresh
 }}}
 with
 {{{
 if (diff > thresh) or (count == 0)
 }}}
 or
 {{{
 if (diff > thresh) or firstflag
 }}}
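Here is a rough sketch of what the flag approach could look like. It is a simplified 1-D loop written for this ticket, not SciPy's actual _kmeans; the names kmeans_sketch and first_pass are made up, and the same flag is applied to the while condition so that the first pass always runs.

```python
import numpy as np

def kmeans_sketch(obs, guess, thresh=1e-5):
    """Hypothetical 1-D k-means loop; NOT the real scipy.cluster.vq._kmeans."""
    obs = np.asarray(obs, dtype=float)
    code_book = np.array(guess, dtype=float)
    avg_dist = []
    diff = thresh + 1.0   # may equal thresh when thresh is huge
    first_pass = True     # guarantees one iteration regardless of thresh
    while first_pass or diff > thresh:
        first_pass = False
        # Distance from every observation to every code.
        dists = np.abs(obs[:, None] - code_book[None, :])
        labels = dists.argmin(axis=1)
        # Mean distortion measured against the code book before the update.
        avg_dist.append(dists.min(axis=1).mean())
        # Move each code to the centroid of its members; drop empty codes.
        code_book = np.array([obs[labels == i].mean()
                              for i in range(len(code_book))
                              if np.any(labels == i)])
        if len(avg_dist) > 1:
            diff = avg_dist[-2] - avg_dist[-1]
    return code_book, avg_dist[-1]

v = np.array([1, 2, 3, 4, 10], dtype=float)
one_step = kmeans_sketch(v, [1.0], thresh=1e16)   # stops after one pass
converged = kmeans_sketch(v, [1.0], thresh=1e-5)  # runs until diff <= thresh
print(one_step, converged)
```

With the flag in place a huge (or infinite) thresh returns the state after a single pass instead of raising, which is the use case described above.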

-- 
Ticket URL: <http://projects.scipy.org/scipy/ticket/1247>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.

