[SciPy-User] kmeans

Benjamin Root ben.root@ou....
Fri Jul 23 12:36:49 CDT 2010


On Fri, Jul 23, 2010 at 12:27 PM, David Cournapeau <cournape@gmail.com> wrote:

> On Sat, Jul 24, 2010 at 2:19 AM, Benjamin Root <ben.root@ou.edu> wrote:
>
> >
> > Examining further, I see that SciPy's implementation is fairly simplistic
> > and has some issues.  In the given example, the reason why 3 is never
> > returned is not because of the use of the distortion metric, but rather
> > because the kmeans function never sees the distance for using 3.  As a
> > matter of fact, the actual code that does the convergence is in vq and
> > py_vq (vector quantization) and it tries to minimize the sum of squared
> > errors.  kmeans just keeps on retrying the convergence with random guesses
> > to see if different convergences occur.
>
> As one of the maintainers of kmeans, I would be the first to admit the
> code is basic, for good and bad. Something more elaborate for
> clustering may indeed be useful, as long as the interface stays
> simple.
>
> More complex needs should turn to scikits.learn or more specialized
> packages.
>
> cheers,
>
> David
>

I agree. kmeans does not need to get very complicated, because k-means (the
general concept) is not well suited to very complicated situations.

As a thought, one way to help the current implementation would be to
ensure that each random guess is unique.  Currently, several iterations are
wasted repeating guesses that have already been tried.  Is there a
way to do sampling without replacement in numpy.random?
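
To illustrate the sampling question, here is a minimal sketch. It assumes a
NumPy recent enough to provide np.random.default_rng, which is newer than this
thread. The helper name unique_initial_guesses and its parameters are
hypothetical and not part of scipy.cluster.vq. It draws k distinct row indices
per guess with replace=False and skips any index set it has already produced.

```python
import numpy as np


def unique_initial_guesses(obs, k, n_guesses, seed=None):
    """Yield up to n_guesses distinct sets of k rows of obs.

    Hypothetical helper for illustration only. Each yielded array has
    shape (k, n_features) and could serve as an initial set of centroids.
    """
    rng = np.random.default_rng(seed)
    n_obs = obs.shape[0]
    seen = set()                    # index sets already handed out
    attempts = 0
    max_attempts = 10 * n_guesses   # give up if few distinct sets exist

    while len(seen) < n_guesses and attempts < max_attempts:
        attempts += 1
        # replace=False means no row index is repeated within one guess.
        idx = rng.choice(n_obs, size=k, replace=False)
        key = tuple(sorted(idx.tolist()))
        if key in seen:
            continue                # same set of rows as an earlier guess
        seen.add(key)
        yield obs[np.sort(idx)]


obs = np.arange(12.0).reshape(6, 2)
guesses = list(unique_initial_guesses(obs, k=2, n_guesses=5, seed=0))
```

Each yielded array could then be passed as the initial codebook, since the
k_or_guess argument of scipy.cluster.vq.kmeans accepts an array of initial
centroids. On an older NumPy, np.random.permutation(n_obs)[:k] gives a similar
draw without replacement.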

Ben Root