# [Scipy-tickets] [SciPy] #1247: _kmeans chokes on large thresholds

SciPy Trac scipy-tickets@scipy....
Sat Jul 24 12:35:21 CDT 2010

```
#1247: _kmeans chokes on large thresholds
---------------------------+------------------------------------------------
 Reporter:  kwgoodman      |       Owner:  somebody
     Type:  defect         |      Status:  new
 Priority:  normal         |   Milestone:  0.9.0
Component:  scipy.cluster  |     Version:  0.7.0
 Keywords:                 |
---------------------------+------------------------------------------------
_kmeans chokes on large thresholds:
{{{
>> import numpy as np
>> from scipy import cluster
>> v = np.array([1,2,3,4,10], dtype=float)
>> cluster.vq.kmeans(v, 1, thresh=1e15)
(array([ 4.]), 2.3999999999999999)
>> cluster.vq.kmeans(v, 1, thresh=1e16)
<snip>
IndexError: list index out of range
}}}
The problem is in these lines:
{{{
diff = thresh+1.
while diff > thresh:
<snip>
if(diff > thresh):
}}}
If thresh is large then (thresh + 1) > thresh is False, because doubles
near 1e16 are spaced 2 apart and thresh + 1 rounds back to thresh:
{{{
>> thresh = 1e16
>> diff = thresh + 1.0
>> diff > thresh
False
}}}
What's a use case for a large threshold? You might want to study the
algorithm by seeing the result after one iteration (not to be confused
with the iter input, which is the number of times k-means is run).

One fix is to use 2*thresh instead of thresh + 1. But that just pushes
the problem out to higher thresholds (1e400 overflows to inf, and
2 * inf is still inf):
{{{
>> thresh = 1e16
>> diff = 2 * thresh
>> diff > thresh
True

>> thresh = 1e400
>> diff = 2 * thresh
>> diff > thresh
False
}}}
A better fix is to replace:
{{{
if(diff > thresh):
}}}
with
{{{
if (diff > thresh) or (count == 0):
}}}
or
{{{
if (diff > thresh) or firstflag:
}}}

--
Ticket URL: <http://projects.scipy.org/scipy/ticket/1247>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.
```
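
The failure mode and the flag-based fix described in the ticket can be sketched in a few lines of Python. This is an illustrative 1-D loop only, not SciPy's `_kmeans`: the function `kmeans_1d_sketch`, the `first_pass` flag, and the choice of `np.inf` as the starting value of `diff` (in place of `thresh + 1.`) are assumptions made for this example. The function also returns the number of passes so the effect of `thresh` is visible.

```python
import numpy as np

# Why `thresh + 1.` is not reliably larger than `thresh`:
thresh = 1e16
print(np.spacing(thresh))      # gap between 1e16 and the next double
print(thresh + 1.0 > thresh)   # the +1 is lost to rounding


def kmeans_1d_sketch(obs, guess, thresh=1e-5):
    """Hypothetical 1-D k-means loop that always runs at least one pass."""
    obs = np.asarray(obs, dtype=float)
    code_book = np.array(guess, dtype=float, ndmin=1)
    avg_dist = []
    diff = np.inf        # assumption: sentinel instead of thresh + 1.
    first_pass = True    # forces the first pass for any thresh
    while first_pass or diff > thresh:
        first_pass = False
        # distance from every observation to every centroid
        dist = np.abs(obs[:, None] - code_book[None, :])
        labels = dist.argmin(axis=1)
        avg_dist.append(dist.min(axis=1).mean())
        # recompute centroids, dropping any that have no members
        has_members = [k for k in range(code_book.size) if np.any(labels == k)]
        code_book = np.array([obs[labels == k].mean() for k in has_members])
        if len(avg_dist) > 1:
            diff = avg_dist[-2] - avg_dist[-1]
    return code_book, avg_dist[-1], len(avg_dist)


v = [1, 2, 3, 4, 10]
print(kmeans_1d_sketch(v, [1.0], thresh=1e16))    # finite thresh: no IndexError
print(kmeans_1d_sketch(v, [1.0], thresh=np.inf))  # stops after a single pass
```

Because the flag, not floating-point arithmetic, guarantees the first pass, `avg_dist` is never empty when the loop exits. In this sketch any finite `thresh` gives at least two passes, since a change in distortion needs two measurements, and `thresh=np.inf` gives exactly one.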