[SciPy-User] kmeans
Benjamin Root
ben.root@ou....
Fri Jul 23 20:56:53 CDT 2010
On Fri, Jul 23, 2010 at 7:53 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Fri, Jul 23, 2010 at 5:46 PM, Benjamin Root <ben.root@ou.edu> wrote:
> > On Fri, Jul 23, 2010 at 6:48 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> >>
> >> On Fri, Jul 23, 2010 at 4:00 PM, Benjamin Root <ben.root@ou.edu> wrote:
> >>
> >> > The stopping condition uses the change in the distortion, not a
> >> > non-squared distance. The distortion is already a sum of squares. The
> >> > only place that a non-squared distance is used is in _py_vq_1d(), which
> >> > appears to be very old code and raises an error at its very first
> >> > statement.
> >>
> >> That's good news.
> >>
> >> Another place that a non-squared distance is used is the return value:
> >>
> >> >> import numpy as np
> >> >> from scipy import cluster
> >> >> v = np.array([1,2,3,4,10],dtype=float)
> >> >> cluster.vq.kmeans(v, 1)
> >> (array([ 4.]), 2.3999999999999999)
> >>
> >> >> np.sqrt(np.dot(v-4, v-4) / 5.0)
> >> 3.1622776601683795 # Nope, not returned
> >> >> np.absolute(v - 4).mean()
> >> 2.3999999999999999 # Yep, this one is returned
> >>
> >> Is that a code bug or a doc bug?
> >
> > Well, see, that's just the thing... the doc says that it returns the
> > distortion, which is what it does, but obviously this distortion was an
> > MAE and not an RMSE. The problem is that I have gone back and forth over
> > the code, including the Cython version, and I can't find any place where
> > this is happening.
> >
> > Does anybody know of any good code tracing tools? I used trace once, but
> > it wasn't very user-friendly...
>
> I think I see it! Yes, the squared distance is calculated. But before
> it is summed or averaged, the square root is taken. That turns the
> squared distance into just distance.
>
Are you talking about the sqrt in py_vq()? That doesn't get called in the
given example... However, you are right that the distances being returned
are square-rooted before the return. It is happening in the C code, though,
and I just don't know where...
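
To make the order-of-operations point concrete, here is a minimal NumPy-only
sketch using the same five values as the example above. It does not call
scipy.cluster.vq at all; the centroid is simply assumed to be the mean of the
data (the k=1 case), and the variable names are mine, not anything from the
library source.

```python
import numpy as np

# Same data as the example in this thread.
v = np.array([1, 2, 3, 4, 10], dtype=float)

# Assumption: with a single cluster, the centroid is just the mean (4.0).
centroid = v.mean()

# Squared distance of each observation to the centroid.
sq_dist = (v - centroid) ** 2

# Square root taken per observation, then averaged:
# this is a mean absolute distance.
mean_of_roots = np.sqrt(sq_dist).mean()

# Averaged first, then square-rooted:
# this is a root-mean-square distance.
root_of_mean = np.sqrt(sq_dist.mean())

print(mean_of_roots)
print(root_of_mean)
```

The first quantity matches the 2.4 that kmeans returned in the example, and
the second matches the 3.16 that it did not return, which is consistent with
the square root being applied to each distance before the averaging step.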
Ben Root