[SciPy-User] kmeans

Keith Goodman kwgoodman@gmail....
Thu Jul 22 14:51:15 CDT 2010


On Thu, Jul 22, 2010 at 12:31 PM, alex <argriffi@ncsu.edu> wrote:
> On Thu, Jul 22, 2010 at 3:15 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> <snip>
>>
>> You'd like to minimize the squared error (I don't know much about it
>> but that makes sense to me). But in the example you chose, the squared
>> error is minimized since the mean is 4. Was that just a coincidence? I
>> guess in the end the code is protected against any claims of bugs
>> since it doesn't guarantee to find the global minimum :)
>
> This was not really a coincidence, because the algorithm converges to a
> local minimum of sum of squared distances.  This is why I was suggesting
> using this sum of squared distances as a stopping criterion and returning
> this value instead of the distortion.  Or alternatively we could use the
> k-means code Benjamin mentioned if he digs it up and if it allows multiple
> distance functions and has a reasonable stopping criterion.

OK, thank you, I think I get it. It minimizes one measure (squared
distance) but it uses another measure (distance) for stopping. Another
plus for squared distance is that it is faster to calculate than mean
distance: dot(dist, dist) versus mean(dist) or dot(dist, one).
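
A rough sketch of the two measures, assuming NumPy and made-up
one-dimensional data. The helper name `measures` and the sample values
are hypothetical and are not part of scipy.cluster.vq. It only
illustrates that the mean minimizes the sum of squared distances while
a different point can give a smaller mean distance:

```python
import numpy as np

def measures(obs, centroid):
    """Return (sum of squared distances, mean distance) to one centroid."""
    diff = obs - centroid
    # Euclidean distance of each observation to the centroid
    dist = np.sqrt((diff * diff).sum(axis=1))
    sum_sq = np.dot(dist, dist)   # sum of squared distances
    mean_dist = dist.mean()       # mean (unsquared) distance
    return sum_sq, mean_dist

# Made-up data: the mean is 4, the median is 2
obs = np.array([[1.0], [2.0], [9.0]])

at_mean = measures(obs, np.array([4.0]))
at_median = measures(obs, np.array([2.0]))

print("centroid 4:", at_mean)
print("centroid 2:", at_median)
```

For this data the centroid at 4 gives the smaller sum of squares (38
versus 50), while the centroid at 2 gives the smaller mean distance
(8/3 versus 10/3), so the two measures can disagree about which
centroid is better.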

