# [SciPy-User] kmeans

Keith Goodman kwgoodman@gmail....
Sat Jul 24 12:36:17 CDT 2010

```
_kmeans chokes on large thresholds:

>> import numpy as np
>> from scipy import cluster
>> v = np.array([1,2,3,4,10], dtype=float)
>> cluster.vq.kmeans(v, 1, thresh=1e15)
(array([ 4.]), 2.3999999999999999)
>> cluster.vq.kmeans(v, 1, thresh=1e16)
<snip>
IndexError: list index out of range

The problem is in these lines:

diff = thresh+1.
while diff > thresh:
<snip>
if(diff > thresh):

If thresh is large then (thresh + 1) > thresh is False:

>> thresh = 1e16
>> diff = thresh + 1.0
>> diff > thresh
False

What's a use case for a large threshold? You might want to study the
algorithm by seeing the result after one iteration (not to be confused
with the iter input, which sets how many times k-means is rerun).

One fix is to use 2*thresh instead of thresh + 1. But that just
pushes the problem out to higher thresholds (1e400 overflows to inf,
and 2 * inf > inf is False):

>> thresh = 1e16
>> diff = 2 * thresh
>> diff > thresh
True

>> thresh = 1e400
>> diff = 2 * thresh
>> diff > thresh
False

A better fix is to replace:

if diff > thresh:

with

if (diff > thresh) or (count == 0):

or

if (diff > thresh) or firstflag:

Ticket: http://projects.scipy.org/scipy/ticket/1247
```
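
The failing comparison in the post comes down to double-precision spacing: above 2**53 (about 9.007e15) adjacent doubles are at least 2 apart, so adding 1.0 to 1e16 rounds back to 1e16. Here is a minimal sketch that checks this with NumPy's `np.spacing`. The helper name `entry_guard_works` is hypothetical and exists only for this illustration; it is not part of SciPy.

```python
import numpy as np

def entry_guard_works(thresh):
    """Return True if `thresh + 1.0 > thresh`, i.e. the guard from the post lets the loop start."""
    diff = thresh + 1.0
    return bool(diff > thresh)

# Gap between a double and the next larger representable double.
print(np.spacing(1e15))         # 0.125, so adding 1.0 is representable
print(np.spacing(1e16))         # 2.0, so adding 1.0 rounds back to 1e16

print(entry_guard_works(1e15))  # True
print(entry_guard_works(1e16))  # False
```

Any guard of the form `thresh + constant` has the same weakness once the spacing at `thresh` exceeds twice the constant, which is why a flag or counter is more robust than a numeric offset.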

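The flag-based fix suggested in the post can be sketched as a loop that always runs its body before testing for convergence. This is a toy 1-D refinement written for illustration under my own assumptions, not SciPy's `_kmeans`; the function name `refine_centroids` and its return of a pass count are hypothetical.

```python
import numpy as np

def refine_centroids(obs, guess, thresh=1e-5):
    """Toy 1-D k-means refinement (illustrative only, not SciPy's code)."""
    obs = np.asarray(obs, dtype=float)
    code_book = np.array(guess, dtype=float)
    prev_dist = None   # None marks the first pass, playing the role of `firstflag`
    n_passes = 0
    while True:
        # Assign each observation to its nearest centroid.
        d = np.abs(obs[:, None] - code_book[None, :])
        labels = d.argmin(axis=1)
        # Mean distortion, measured against the centroids before this pass's update.
        avg_dist = d.min(axis=1).mean()
        # Move each non-empty centroid to the mean of its members.
        for k in range(code_book.size):
            members = obs[labels == k]
            if members.size:
                code_book[k] = members.mean()
        n_passes += 1
        # Convergence is only tested once two distortions exist, so the
        # size of `thresh` can never prevent the loop body from running.
        if prev_dist is not None and abs(prev_dist - avg_dist) <= thresh:
            break
        prev_dist = avg_dist
    return code_book, avg_dist, n_passes

v = np.array([1, 2, 3, 4, 10], dtype=float)
print(refine_centroids(v, [1.0], thresh=1e16))
```

With the data from the post and `thresh=1e16` (or even `np.inf`), this sketch stops after two passes with centroid 4.0 and distortion 2.4 instead of raising, because the change in distortion needs two measurements before it can be compared with the threshold.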