Fri Jul 23 17:53:55 CDT 2010
On Fri, Jul 23, 2010 at 5:27 PM, Lutz Maibaum <email@example.com> wrote:
> On Jul 23, 2010, at 2:55 PM, Benjamin Root wrote:
> > On Fri, Jul 23, 2010 at 4:18 PM, Lutz Maibaum <firstname.lastname@example.org>
> >> Actually, it's not entirely clear to me anymore what the bug is. According
> to the k-means Wikipedia page, the objective function that the algorithm
> tries to minimize is the total intra-cluster variance (the sum of squares of
> distances of data points from cluster centroids). However, the two steps of
> the iteration (assignment to centroids, and centroid update) use regular
> distances and means. Is this not what the current code is doing?
> > Which is why I have been saying that there is no bug here because the
> code is technically correct. A mean of regular distances is a sum of
> squared distances that has been divided. The only reason why the current
> code is not returning the correct answer for the given example is that it
> never tries 3 as a centroid value. This is a different issue.
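The two steps described above (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched in NumPy as follows. This is a hypothetical helper, `lloyd_kmeans`, written only for illustration and not scipy's implementation; it assumes a fixed iteration count and leaves an empty cluster's centroid where it is. Both steps use plain distances and plain means, yet the quantity neither step can increase is the total sum of squared distances, which the helper returns as `sse`.

```python
import numpy as np

def lloyd_kmeans(data, centroids, n_iter=20):
    """Hypothetical sketch of Lloyd's k-means iteration (not scipy's code).

    data: observations, shape (n,) or (n, d).
    centroids: initial guesses, shape (k,) or (k, d).
    Returns (centroids, labels, sse), where sse is the total
    within-cluster sum of squared distances.
    """
    data = np.asarray(data, dtype=float)
    centroids = np.asarray(centroids, dtype=float).copy()
    if data.ndim == 1:  # treat 1-D input as n observations of one feature
        data = data[:, None]
        centroids = centroids[:, None]
    for _ in range(n_iter):
        # Assignment step: nearest centroid by plain Euclidean distance.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the plain mean of its members.
        for j in range(len(centroids)):
            members = data[labels == j]
            if len(members):  # empty cluster: keep the old centroid
                centroids[j] = members.mean(axis=0)
    # Final assignment and objective: sum of squared distances.
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    sse = float(np.sum(dists[np.arange(len(data)), labels] ** 2))
    return centroids, labels, sse
```

For `[1, 2, 3, 4, 10]` with a single initial centroid, the first update already lands on the mean, 4.0, with a sum of squared distances of 50.0.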
> I apologize if I am being obtuse, but why do you think the current code
> does not return the correct answer?
> >>> import numpy as np
> >>> from scipy import cluster
> >>> v = np.array([1,2,3,4,10],dtype=float)
> >>> cluster.vq.kmeans(v, 1)
> (array([ 4.]), 2.3999999999999999)
> >>> np.sum([abs(x-4)**2 for x in v])
> >>> np.sum([abs(x-3)**2 for x in v])
> The centroid 4 minimizes the sum of squared distances, which is what kmeans
> is supposed to find.
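To see why 4 rather than 3 comes out, and where the reported 2.4 comes from, both criteria discussed in this thread can be evaluated for the two candidate centroids. A small sketch using plain NumPy; the helper names `sum_sq_dist` and `mean_dist` are hypothetical:

```python
import numpy as np

v = np.array([1, 2, 3, 4, 10], dtype=float)

def sum_sq_dist(data, c):
    """Sum of squared distances to a single centroid c (the k-means objective)."""
    return float(np.sum((data - c) ** 2))

def mean_dist(data, c):
    """Mean of plain, unsquared distances to a single centroid c."""
    return float(np.mean(np.abs(data - c)))

for c in (3.0, 4.0):
    print(f"c={c}: sum of squares={sum_sq_dist(v, c)}, mean distance={mean_dist(v, c)}")
```

The loop prints a sum of squares of 55.0 and a mean distance of 2.2 for c=3.0, and 50.0 and 2.4 for c=4.0. So 4 (the mean of `v`) minimizes the squared-distance objective, while 3 (the median) minimizes the mean unsquared distance; the 2.3999999999999999 in the session above matches that mean unsquared distance for centroid 4, 12/5.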
Right, sorry, I forgot that we already figured that out. So, there is no
bug in this respect.
More information about the SciPy-User