[SciPy-User] kmeans (Re: Mailing list?)

David Warde-Farley dwf@cs.toronto....
Mon Nov 23 12:30:57 CST 2009


On 23-Nov-09, at 2:18 AM, Simon Friedberger wrote:

> Thanks for your explanation. I agree with your arguments, but couldn't
> it have the opposite effect: weighting features that should have less
> discriminative power more heavily, just because they have a small
> variance? I'm not sure about it, but I will check out the book you
> reference. I've had it lying around for a while anyway.

It could, but typically when you're employing k-means you have little
reason to believe any of the variables have more explanatory power
than any of the others, so treating them "equally" is the simplest,
most reasonable thing to do. It will indeed inflate the relative
influence of low-variance features.
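
For concreteness, here's roughly what that looks like with
scipy.cluster.vq (a made-up two-feature example, not anything from
your data): whiten() rescales each column to unit variance, so both
features then contribute equally to the Euclidean distances k-means
uses.

    import numpy as np
    from scipy.cluster.vq import whiten, kmeans, vq

    # toy data: 100 points, two features on very different scales
    rng = np.random.RandomState(0)
    data = np.column_stack([rng.normal(0, 100.0, 100),  # large variance
                            rng.normal(0, 0.1, 100)])   # small variance

    # whiten() divides each column by its standard deviation
    wdata = whiten(data)
    codebook, distortion = kmeans(wdata, 3)
    assignments, _ = vq(wdata, codebook)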

You also use the word "discriminative", which makes me think you're  
trying to do some sort of classification. Note that k-means can't take  
into account any label information and is thus ill-suited to  
classification, though it is sometimes used for this.
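
If you do want to press it into service that way, the usual trick is
to cluster the training data first and only afterwards attach a
majority-vote label to each cluster; a rough sketch (hypothetical
helper name, integer class labels and non-empty clusters assumed):

    import numpy as np
    from scipy.cluster.vq import vq

    def majority_labels(codebook, train_data, train_labels):
        # assign each training point to its nearest centroid, then label
        # each cluster by majority vote; the labels play no role in the
        # clustering itself, only in this post-hoc step
        assignments, _ = vq(train_data, codebook)
        return np.array(
            [np.bincount(train_labels[assignments == k]).argmax()
             for k in range(len(codebook))])

    # to classify new points: nearest centroid, then that cluster's label
    # cluster_labels = majority_labels(codebook, wdata, y_train)
    # predicted = cluster_labels[vq(new_points, codebook)[0]]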

> On the question of inverting the transformation: is this
> functionality built-in? I can't find anything in the docs.

It isn't, but maybe it should be. It would involve rethinking the
cluster module a bit (which I've been planning to do as a way of
expanding it, but oh, the time, where does it go?...).
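
In the meantime it's easy enough to do by hand, since whiten() just
divides each column by its standard deviation; keep those around and
you can map the codebook back into the original units (continuing the
made-up example above):

    # undo the whitening manually: multiply each centroid coordinate by
    # the per-feature standard deviation that whiten() divided out
    std_devs = data.std(axis=0)
    codebook_in_original_units = codebook * std_devs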

David

