# [SciPy-User] kmeans

Benjamin Root ben.root@ou....
Sun Jul 25 14:48:52 CDT 2010

```On Sun, Jul 25, 2010 at 2:41 PM, David Cournapeau <cournape@gmail.com> wrote:

> On Sun, Jul 25, 2010 at 2:36 AM, Keith Goodman <kwgoodman@gmail.com>
> wrote:
> > _kmeans chokes on large thresholds:
> >
> >>> import numpy as np
> >>> from scipy import cluster
> >>> v = np.array([1,2,3,4,10], dtype=float)
> >>> cluster.vq.kmeans(v, 1, thresh=1e15)
> >   (array([ 4.]), 2.3999999999999999)
> >>> cluster.vq.kmeans(v, 1, thresh=1e16)
> > <snip>
> > IndexError: list index out of range
> >
> > The problem is in these lines:
> >
> >    diff = thresh+1.
> >    while diff > thresh:
> >        <snip>
> >        if(diff > thresh):
> >
> > If thresh is large then (thresh + 1) > thresh is False:
> >
> >>> thresh = 1e16
> >>> diff = thresh + 1.0
> >>> diff > thresh
> >   False
> >
> > What's a use case for a large threshold? You might want to study the
> > algorithm by seeing the result after one iteration (not to be confused
> > with the iter input which is something else).
> >
> > One fix is to use 2*thresh instead of thresh + 1. But that just
> > pushes the problem out to higher thresholds.
>
> Or just use the spacing function, which by definition returns the
> smallest number M such that thresh + M > thresh (except for nan/inf).
>
>
Or one could just go with a "prime the loop" approach and perform the
operation once before the loop begins.  Admittedly, this does seem rather
un-Pythonic, unless Python has a do...while idiom that I am unaware of.

Ben Root
```
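
The two suggestions in this thread can be sketched as follows. This is only an illustration, not the actual `_kmeans` code. To keep it dependency-free it uses the standard library's `math.ulp` (Python 3.9+) as a stand-in for the NumPy spacing function mentioned above, since both give the gap to the next representable float for positive finite values. The helper `iterate_until_converged` and its `step` callable are hypothetical names introduced for the example.

```python
import math

thresh = 1e16

# Near 1e16 adjacent doubles are 2.0 apart, so adding 1.0 is rounded away
# and the loop guard `diff > thresh` is False before the first pass.
assert not (thresh + 1.0 > thresh)

# Adding the spacing at thresh instead always yields a strictly larger value
# (for finite thresh), which is the idea behind the spacing suggestion.
assert thresh + math.ulp(thresh) > thresh


def iterate_until_converged(step, thresh):
    """Run step() at least once; stop when the change it reports is <= thresh.

    `step` is a hypothetical callable that performs one update and returns
    the change in distortion. Returns the number of passes performed.
    """
    n_iter = 0
    while True:  # Python's usual stand-in for do...while
        diff = step()
        n_iter += 1
        if not diff > thresh:
            break
    return n_iter


# With a huge threshold the body still runs exactly once.
changes = iter([5.0, 1.0, 0.1])
print(iterate_until_converged(lambda: next(changes), 1e16))
```

The `while True: ... break` form avoids having to invent an initial `diff` at all, so it does not depend on the magnitude of `thresh`.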