[SciPy-user] Kmeans help and C source

Costas Malamas costasm at hotmail.com
Mon Jan 7 17:50:48 CST 2002

Hi all,

I need to use a modified K-means algorithm for a project and I was delighted 
to discover that SciPy includes a python wrapper for a kmeans() function.

However, I am not quite following what kmeans() does (I am new to this
clustering business, so this may be a stupid newbie question). The docs I
have read tell me that k-means should partition a dataset into k clusters, so
I expect vq.kmeans(dataset, 2) to return my dataset split up into two
"equivalent" datasets. However I feed my data into vq.kmeans(), this doesn't
happen (e.g. I feed it a 5x4 dataset and I get back two 5x1 vectors). My
guess is that either vq.kmeans() does something different (I confess I don't
understand the docstring, since its observation/codebook terminology has no
parallel in the docs I've read), or I am doing something wrong. Any pointers?
Even some documentation on the algorithm would be a great help.
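To check my own understanding, here is the textbook algorithm (Lloyd's
iteration) as I understand it from my reading, written as a plain-Python
sketch. The names lloyd_kmeans, assign and sq_dist are my own inventions and
not SciPy's API. I am also assuming (not certain) that the docstring's
"codebook" means the list of k centroids and that each "observation" is one
row of the dataset.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(obs, codebook):
    """Label each observation with the index of its nearest centroid."""
    return [min(range(len(codebook)), key=lambda j: sq_dist(row, codebook[j]))
            for row in obs]


def lloyd_kmeans(obs, k, iters=20, seed=0):
    """Hypothetical plain-Python sketch of Lloyd's k-means (not SciPy's code).

    obs is a list of M observations (rows), each a list of N numbers.
    Returns (codebook, labels): k centroids of length N, plus one
    centroid index per observation.
    """
    rng = random.Random(seed)
    # Initialise the codebook with k distinct rows picked at random.
    codebook = [list(row) for row in rng.sample(obs, k)]
    for _ in range(iters):
        # Assignment step: nearest centroid for every row.
        labels = assign(obs, codebook)
        # Update step: move each centroid to the mean of its member rows.
        new_codebook = []
        for j in range(k):
            members = [row for row, lab in zip(obs, labels) if lab == j]
            if members:
                new_codebook.append([sum(col) / len(members)
                                     for col in zip(*members)])
            else:
                new_codebook.append(codebook[j])  # keep an empty cluster's centroid
        if new_codebook == codebook:
            break  # centroids stopped moving, so assignments cannot change
        codebook = new_codebook
    # Final assignment against the final codebook.
    return codebook, assign(obs, codebook)


# A 5x4 dataset: 5 observations (rows) with 4 features each.
data = [
    [1.0, 1.0, 1.0, 1.0],
    [1.2, 0.9, 1.1, 1.0],
    [0.8, 1.1, 0.9, 1.0],
    [9.0, 9.0, 9.0, 9.0],
    [9.2, 8.8, 9.1, 9.0],
]
codebook, labels = lloyd_kmeans(data, 2)
print(len(codebook), len(codebook[0]))  # prints "2 4": 2 centroids, 4 features each
print(labels)  # one cluster index per row
# "Splitting" the dataset is a separate grouping step driven by the labels.
clusters = [[row for row, lab in zip(data, labels) if lab == j] for j in range(2)]
```

If that reading is right, a 5x4 dataset with k=2 gives back two centroids of
length 4 plus one label per row, rather than two split-up datasets; the split
would then be a separate grouping step by label. A pure-Python version like
this would also be easy to hack into the modified k-means I need, though much
slower than compiled code.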

Secondly, as I mentioned above, I need a modified kmeans.  However, I see no 
C/Fortran code in the src tarball or CVS that seems related to kmeans.  Is 
the base code available?  If so, is it hackable by a SWIG newbie? (I am 
aware of SWIG, but I have never used it for anything serious).

Any and all info will be greatly appreciated :-) And thanks for SciPy!

Costas Malamas

