[Scipy-svn] r4194 - trunk/scipy/cluster

scipy-svn@scip... scipy-svn@scip...
Sun Apr 27 08:29:08 CDT 2008


Author: damian.eads
Date: 2008-04-27 08:29:06 -0500 (Sun, 27 Apr 2008)
New Revision: 4194

Modified:
   trunk/scipy/cluster/vq.py
Log:
Tightened the language of the kmeans docstring more.

Modified: trunk/scipy/cluster/vq.py
===================================================================
--- trunk/scipy/cluster/vq.py	2008-04-27 13:18:37 UTC (rev 4193)
+++ trunk/scipy/cluster/vq.py	2008-04-27 13:29:06 UTC (rev 4194)
@@ -18,7 +18,7 @@
     step of the k-means algorithm refines the choices of centroids to
     reduce distortion. The change in distortion is often used as a
     stopping criterion: when the change is lower than a threshold, the
-    k-means algorithm is not making progress and terminates.
+    k-means algorithm is not making sufficient progress and terminates.
 
     Since vector quantization is a natural application for k-means,
     information theory terminology is often used.  The centroid index
@@ -391,31 +391,34 @@
     return code_book, avg_dist[-1]
 
 def kmeans(obs, k_or_guess, iter=20, thresh=1e-5):
-    """Performs k-means on a set of observations for a specified number of
-       iterations. This yields a code book mapping centroids to codes
+    """Performs k-means on a set of observation vectors forming k
+       clusters. This yields a code book mapping centroids to codes
        and vice versa. The k-means algorithm adjusts the centroids
-       until the change in distortion caused by quantizing the
-       observation is less than some threshold.
+       until sufficient progress cannot be made, i.e. the change
+       in distortion since the last iteration is less than some
+       threshold.
 
     :Parameters:
         obs : ndarray
-            Each row of the M by N array is an observation.  The columns are the
-            "features" seen during each observation.  The features must be
-            whitened first with the whiten function.
+            Each row of the M by N array is an observation vector. The
+            columns are the features seen during each observation.
+            The features must be whitened first with the whiten
+            function.
+
         k_or_guess : int or ndarray
-            The number of centroids to generate. One code will be assigned
-            to each centroid, and it will be the row index in the code_book
-            matrix generated.
+            The number of centroids to generate. One code will be
+            assigned to each centroid, and it will be the row index in
+            the code_book matrix generated.
 
-            The initial k centroids will be chosen by randomly
-            selecting observations from the observation
-            matrix. Alternatively, passing a k by N array specifies
-            the initial values of the k means.
+            The initial k centroids are chosen by randomly selecting
+            observations from the observation matrix. Alternatively,
+            passing a k by N array specifies the initial values of the
+            k centroids.
 
         iter : int
             The number of times to run k-means, returning the codebook
             with the lowest distortion. This argument is ignored if
-            initial mean values are specified with an array for the
+            initial centroids are specified with an array for the
             k_or_guess parameter. This parameter does not represent the
             number of iterations of the k-means algorithm.
 
@@ -436,8 +439,9 @@
            centroids generated.
 
     :SeeAlso:
-        - kmeans2: similar function, but with more options for initialization,
-          and returns label of each observation
+        - kmeans2: a different implementation of k-means clustering
+          with more methods for generating initial centroids but without
+          using the distortion change threshold as a stopping criterion.
         - whiten: must be called prior to passing an observation matrix
           to kmeans.
 
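The stopping criterion described in the docstring above (stop once the
change in distortion since the last iteration falls below a threshold) can
be illustrated with a minimal pure-Python sketch. This is not the code in
vq.py. The names kmeans_sketch and distortion, the seed argument, and the
choice of mean Euclidean distance as the distortion measure are assumptions
made for illustration, and whitening is left out.

```python
import math
import random


def distortion(obs, centroids):
    """Mean Euclidean distance from each observation to its nearest centroid."""
    return sum(min(math.dist(o, c) for c in centroids) for o in obs) / len(obs)


def kmeans_sketch(obs, k_or_guess, thresh=1e-5, seed=None):
    """Illustrative k-means loop; NOT the scipy.cluster.vq implementation.

    obs is a list of equal-length numeric tuples. k_or_guess is either an
    int (pick that many observations at random as initial centroids) or a
    list of initial centroids, mirroring the docstring's description.
    """
    if isinstance(k_or_guess, int):
        rng = random.Random(seed)
        centroids = [list(o) for o in rng.sample(obs, k_or_guess)]
    else:
        centroids = [list(c) for c in k_or_guess]
    k = len(centroids)

    prev = distortion(obs, centroids)
    while True:
        # Assignment step: give each observation the code (row index)
        # of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for o in obs:
            code = min(range(k), key=lambda i: math.dist(o, centroids[i]))
            clusters[code].append(o)

        # Update step: move each centroid to the mean of its members.
        # A centroid with no members is left where it is.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]

        # Stopping criterion: terminate when the improvement in distortion
        # since the last iteration is below the threshold.
        cur = distortion(obs, centroids)
        if prev - cur < thresh:
            return centroids, cur
        prev = cur
```

Passing a list of initial centroids makes the result deterministic. With an
integer k the outcome depends on the random starting points, and the loop
can settle on a poor local optimum, which is the reason the real kmeans
function runs several times (the iter parameter) and keeps the code book
with the lowest distortion.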


