# [Scipy-svn] r5170 - in trunk/scipy: cluster spatial

scipy-svn@scip... scipy-svn@scip...
Sat Nov 22 14:18:16 CST 2008

Author: damian.eads
Date: 2008-11-22 14:18:12 -0600 (Sat, 22 Nov 2008)
New Revision: 5170

Modified:
trunk/scipy/cluster/hierarchy.py
trunk/scipy/spatial/distance.py
Log:
Word-smithing some hierarchy and spatial documentation.

Modified: trunk/scipy/cluster/hierarchy.py
===================================================================
--- trunk/scipy/cluster/hierarchy.py	2008-11-22 19:09:35 UTC (rev 5169)
+++ trunk/scipy/cluster/hierarchy.py	2008-11-22 20:18:12 UTC (rev 5170)
@@ -331,6 +331,20 @@
Performs centroid/UPGMC linkage. See linkage for more
information on the return structure and algorithm.

+    The following are common calling conventions:
+
+    1. Z = centroid(y)
+
+       Performs centroid/UPGMC linkage on the condensed distance
+       matrix y.  See linkage for more information on the return
+       structure and algorithm.
+
+    2. Z = centroid(X)
+
+       Performs centroid/UPGMC linkage on the observation matrix X
+       using Euclidean distance as the distance metric. See linkage
+
:Parameters:
Q : ndarray
A condensed or redundant distance matrix. A condensed
@@ -346,21 +360,6 @@
the linkage function documentation for more information
on its structure.

-    Calling Conventions
-    -------------------
-
-    1. Z = centroid(y)
-
-       Performs centroid/UPGMC linkage on the condensed distance
-       matrix y.  See linkage for more information on the return
-       structure and algorithm.
-
-    2. Z = centroid(X)
-
-       Performs centroid/UPGMC linkage on the observation matrix X
-       using Euclidean distance as the distance metric. See linkage
-
:SeeAlso:
"""
@@ -371,18 +370,8 @@
Performs median/WPGMC linkage. See linkage for more
information on the return structure and algorithm.

-    :Parameters:
-        Q : ndarray
-            A condensed or redundant distance matrix. A condensed
-            distance matrix is a flat array containing the upper
-            triangular of the distance matrix. This is the form that
-            pdist returns. Alternatively, a collection of
-            m observation vectors in n dimensions may be passed as
-            a m by n array.
+    The following are common calling conventions:

-    Calling Conventions
-    -------------------
-
1. Z = median(y)

Performs median/WPGMC linkage on the condensed distance matrix
@@ -395,6 +384,19 @@
using Euclidean distance as the distance metric. See linkage

+    :Parameters:
+        Q : ndarray
+            A condensed or redundant distance matrix. A condensed
+            distance matrix is a flat array containing the upper
+            triangular of the distance matrix. This is the form that
+            pdist returns. Alternatively, a collection of
+            m observation vectors in n dimensions may be passed as
+            a m by n array.
+
+    :Returns:
+       - Z : ndarray
+           The hierarchical clustering encoded as a linkage matrix.
+
:SeeAlso:
"""
@@ -406,18 +408,8 @@
and algorithm.

-    :Parameters:
-        Q : ndarray
-            A condensed or redundant distance matrix. A condensed
-            distance matrix is a flat array containing the upper
-            triangular of the distance matrix. This is the form that
-            pdist returns. Alternatively, a collection of
-            m observation vectors in n dimensions may be passed as
-            a m by n array.
+    The following are common calling conventions:

-    Calling Conventions
-    -------------------
-
1. Z = ward(y)
Performs Ward's linkage on the condensed distance matrix Z. See
@@ -428,6 +420,19 @@
Euclidean distance as the distance metric. See linkage for more
information on the return structure and algorithm.

+    :Parameters:
+        Q : ndarray
+            A condensed or redundant distance matrix. A condensed
+            distance matrix is a flat array containing the upper
+            triangular of the distance matrix. This is the form that
+            pdist returns. Alternatively, a collection of
+            m observation vectors in n dimensions may be passed as
+            a m by n array.
+
+    :Returns:
+       - Z : ndarray
+           The hierarchical clustering encoded as a linkage matrix.
+
:SeeAlso:
"""
@@ -476,111 +481,109 @@
combined to form cluster :math:u. Let :math:v be any
remaining cluster in the forest that is not :math:u.

-    :Parameters:
-       Q : ndarray
-           A condensed or redundant distance matrix. A condensed
-           distance matrix is a flat array containing the upper
-           triangular of the distance matrix. This is the form that
-           pdist returns. Alternatively, a collection of
-           :math:m observation vectors in n dimensions may be passed as
-           a :math:m by :math:n array.
-       method : string
-           The linkage algorithm to use. See the Linkage Methods
-           section below for full descriptions.
-       metric : string
-           The distance metric to use. See the distance.pdist
-           function for a list of valid distance metrics.
-
-    ---------------
-
The following are methods for calculating the distance between the
newly formed cluster :math:u and each :math:v.

-    * method=single assigns
+     * method=single assigns

-      .. math:
-         d(u,v) = \min(dist(u[i],v[j]))
+       .. math::
+          d(u,v) = \min(dist(u[i],v[j]))

-      for all points :math:i in cluster :math:u and
-      :math:j in cluster :math:v. This is also known as the
-      Nearest Point Algorithm.
+       for all points :math:i in cluster :math:u and
+       :math:j in cluster :math:v. This is also known as the
+       Nearest Point Algorithm.

-    * method=complete assigns
+     * method=complete assigns

-      .. math:
-         d(u, v) = \max(dist(u[i],v[j]))
+       .. math::
+          d(u, v) = \max(dist(u[i],v[j]))

-      for all points :math:i in cluster u and :math:j in
-      cluster :math:v. This is also known by the Farthest Point
-      Algorithm or Voor Hees Algorithm.
+       for all points :math:i in cluster u and :math:j in
+       cluster :math:v. This is also known by the Farthest Point
+       Algorithm or Voor Hees Algorithm.

-    * method=average assigns
+     * method=average assigns

-      .. math:
-         d(u,v) = \sum_{ij} \frac{d(u[i], v[j])}
-                                 {(|u|*|v|)
+       .. math::
+          d(u,v) = \sum_{ij} \frac{d(u[i], v[j])}
+                                  {(|u|*|v|)

-      for all points :math:i and :math:j where :math:|u|
-      and :math:|v| are the cardinalities of clusters :math:u
-      and :math:v, respectively. This is also called the UPGMA
-      algorithm. This is called UPGMA.
+       for all points :math:i and :math:j where :math:|u|
+       and :math:|v| are the cardinalities of clusters :math:u
+       and :math:v, respectively. This is also called the UPGMA
+       algorithm. This is called UPGMA.

-    * method='weighted' assigns
+     * method='weighted' assigns

-      .. math:
-         d(u,v) = (dist(s,v) + dist(t,v))/2
+       .. math::
+          d(u,v) = (dist(s,v) + dist(t,v))/2

-      where cluster u was formed with cluster s and t and v
-      is a remaining cluster in the forest. (also called WPGMA)
+       where cluster u was formed with cluster s and t and v
+       is a remaining cluster in the forest. (also called WPGMA)

+     * method='centroid' assigns

-    * method='centroid' assigns
+       .. math::
+          dist(s,t) = euclid(c_s, c_t)

-      .. math:
-         dist(s,t) = euclid(c_s, c_t)
+       where :math:c_s and :math:c_t are the centroids of
+       clusters :math:s and :math:t, respectively. When two
+       clusters :math:s and :math:t are combined into a new
+       cluster :math:u, the new centroid is computed over all the
+       original objects in clusters :math:s and :math:t. The
+       distance then becomes the Euclidean distance between the
+       centroid of :math:u and the centroid of a remaining cluster
+       :math:v in the forest. This is also known as the UPGMC
+       algorithm.

-      where :math:c_s and :math:c_t are the centroids of
-      clusters :math:s and :math:t, respectively. When two
-      clusters :math:s and :math:t are combined into a new
-      cluster :math:u, the new centroid is computed over all the
-      original objects in clusters :math:s and :math:t. The
-      distance then becomes the Euclidean distance between the
-      centroid of :math:u and the centroid of a remaining cluster
-      :math:v in the forest. This is also known as the UPGMC
-      algorithm.
+     * method='median' assigns math:$d(s,t)$ like the centroid
+       method. When two clusters s and t are combined into a new
+       cluster :math:u, the average of centroids s and t give the
+       new centroid :math:u. This is also known as the WPGMC
+       algorithm.

-    * method='median' assigns math:$d(s,t)$ like the centroid
-      method. When two clusters s and t are combined into a new
-      cluster :math:u, the average of centroids s and t give the
-      new centroid :math:u. This is also known as the WPGMC
-      algorithm.
+     * method='ward' uses the Ward variance minimization algorithm.
+       The new entry :math:d(u,v) is computed as follows,

-    * method='ward' uses the Ward variance minimization algorithm.
-      The new entry :math:d(u,v) is computed as follows,
+       .. math::

-      .. math:
+          d(u,v) = \sqrt{\frac{|v|+|s|}
+                              {T}d(v,s)^2
+                       + \frac{|v|+|t|}
+                              {T}d(v,t)^2
+                       + \frac{|v|}
+                              {T}d(s,t)^2}

-         d(u,v) = \sqrt{\frac{|v|+|s|}
-                             {T}d(v,s)^2
-                      + \frac{|v|+|t|}
-                             {T}d(v,t)^2
-                      + \frac{|v|}
-                             {T}d(s,t)^2}
+       where :math:u is the newly joined cluster consisting of
+       clusters :math:s and :math:t, :math:v is an unused
+       cluster in the forest, :math:T=|v|+|s|+|t|, and
+       :math:|*| is the cardinality of its argument. This is also
+       known as the incremental algorithm.

-      where :math:u is the newly joined cluster consisting of
-      clusters :math:s and :math:t, :math:v is an unused
-      cluster in the forest, :math:T=|v|+|s|+|t|, and
-      :math:|*| is the cardinality of its argument. This is also
-      known as the incremental algorithm.
+    Warning: When the minimum distance pair in the forest is chosen, there may
+    be two or more pairs with the same minimum distance. This
+    implementation may chose a different minimum than the MATLAB(TM)
+    version.

-   Warning
-   -------
+    :Parameters:
+       - Q : ndarray
+           A condensed or redundant distance matrix. A condensed
+           distance matrix is a flat array containing the upper
+           triangular of the distance matrix. This is the form that
+           pdist returns. Alternatively, a collection of
+           :math:m observation vectors in n dimensions may be passed as
+           a :math:m by :math:n array.
+       - method : string
+           The linkage algorithm to use. See the Linkage Methods
+           section below for full descriptions.
+       - metric : string
+           The distance metric to use. See the distance.pdist
+           function for a list of valid distance metrics.

-   When the minimum distance pair in the forest is chosen, there may
-   be two or more pairs with the same minimum distance. This
-   implementation may chose a different minimum than the MATLAB(TM)
-   version.
+   :Returns:
+
+       - Z : ndarray
+           The hierarchical clustering encoded as a linkage matrix.
"""
if not isinstance(method, str):
raise TypeError("Argument 'method' must be a string.")
@@ -788,6 +791,10 @@
the ClusterNode object is a leaf node, its count must be 1, and its
distance is meaningless but set to 0.

+    Note: This function is provided for the convenience of the library
+    user. ClusterNodes are not used as input to any of the functions in this
+    library.
+
:Parameters:

- Z : ndarray
@@ -807,9 +814,6 @@
- L : list
The pre-order traversal.

-    Note: This function is provided for the convenience of the library
-    user. ClusterNodes are not used as input to any of the functions in this
-    library.
"""

Z = np.asarray(Z, order='c')
@@ -945,6 +949,9 @@
r"""
Calculates inconsistency statistics on a linkage.

+    Note: This function behaves similarly to the MATLAB(TM)
+    inconsistent function.
+
:Parameters:
- d : int
The number of links up to d levels below each
@@ -971,9 +978,6 @@

\frac{\mathtt{Z[i,2]}-\mathtt{R[i,0]}}
{R[i,1]}.
-
-    This function behaves similarly to the MATLAB(TM) inconsistent
-    function.
"""
Z = np.asarray(Z, order='c')

Modified: trunk/scipy/spatial/distance.py
===================================================================
--- trunk/scipy/spatial/distance.py	2008-11-22 19:09:35 UTC (rev 5169)
+++ trunk/scipy/spatial/distance.py	2008-11-22 20:18:12 UTC (rev 5170)
@@ -283,10 +283,10 @@
Computes the Cosine distance between two n-vectors u and v, which
is defined as

-      .. math::
+    .. math::

-         \frac{1-uv^T}
-              {||u||_2 ||v||_2}.
+       \frac{1-uv^T}
+            {||u||_2 ||v||_2}.

:Parameters:
u : ndarray
@@ -341,7 +341,7 @@
u and v. If u and v are boolean vectors, the Hamming
distance is

-    .. math:
+    .. math::

\frac{c_{01} + c_{10}}{n}

@@ -398,7 +398,7 @@
Computes the Kulsinski dissimilarity between two boolean n-vectors
u and v, which is defined as

-    .. math:
+    .. math::

\frac{c_{TF} + c_{FT} - c_{TT} + n}
{c_{FT} + c_{TF} + n}
@@ -453,7 +453,7 @@
Computes the Manhattan distance between two n-vectors u and v,
which is defined as

-    .. math:
+    .. math::

\sum_i {u_i-v_i}.

@@ -476,7 +476,8 @@
Computes the Mahalanobis distance between two n-vectors u and v,
which is defiend as

-    .. math:
+    .. math::
+
(u-v)V^{-1}(u-v)^T

where VI is the inverse covariance matrix :math:V^{-1}.
@@ -501,7 +502,8 @@
Computes the Chebyshev distance between two n-vectors u and v,
which is defined as

-    .. math:
+    .. math::
+
\max_i {|u_i-v_i|}.

:Parameters:
@@ -523,7 +525,7 @@
Computes the Bray-Curtis distance between two n-vectors u and
v, which is defined as

-    .. math:
+    .. math::

\sum{|u_i-v_i|} / \sum{|u_i+v_i|}.

@@ -546,7 +548,7 @@
Computes the Canberra distance between two n-vectors u and v,
which is defined as

-    .. math:
+    .. math::

\frac{\sum_i {|u_i-v_i|}}
{\sum_i {|u_i|+|v_i|}}.
@@ -610,7 +612,7 @@
which is defined as

-    .. math:
+    .. math::

\frac{R}
\frac{c_{TT} + c_{FF} + \frac{R}{2}}
@@ -639,7 +641,7 @@
Computes the Matching dissimilarity between two boolean n-vectors
u and v, which is defined as

-    .. math:
+    .. math::

\frac{c_{TF} + c_{FT}}{n}

@@ -667,7 +669,7 @@
Computes the Dice dissimilarity between two boolean n-vectors
u and v, which is

-    .. math:
+    .. math::

\frac{c_{TF} + c_{FT}}
{2c_{TT} + c_{FT} + c_{TF}}
@@ -700,7 +702,7 @@
Computes the Rogers-Tanimoto dissimilarity between two boolean
n-vectors u and v, which is defined as

-    .. math:
+    .. math::
\frac{R}
{c_{TT} + c_{FF} + R}

@@ -729,7 +731,7 @@
Computes the Russell-Rao dissimilarity between two boolean n-vectors
u and v, which is defined as

-    .. math:
+    .. math::

\frac{n - c_{TT}}
{n}
@@ -761,7 +763,7 @@
Computes the Sokal-Michener dissimilarity between two boolean vectors
u and v, which is defined as

-    .. math:
+    .. math::

\frac{2R}
{S + 2R}
@@ -797,7 +799,7 @@
Computes the Sokal-Sneath dissimilarity between two boolean vectors
u and v,

-    .. math:
+    .. math::

\frac{2R}
{c_{TT} + 2R}
@@ -838,33 +840,8 @@
this entry or to convert the condensed distance matrix to a
redundant square matrix.

-    :Parameters:
-       X : ndarray
-           An m by n array of m original observations in an
-           n-dimensional space.
-       metric : string or function
-           The distance metric to use. The distance function can
-           be 'braycurtis', 'canberra', 'chebyshev', 'cityblock',
-           'correlation', 'cosine', 'dice', 'euclidean', 'hamming',
-           'jaccard', 'kulsinski', 'mahalanobis', 'matching',
-           'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean',
-           'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
-       w : ndarray
-           The weight vector (for weighted Minkowski).
-       p : double
-           The p-norm to apply (for Minkowski, weighted and unweighted)
-       V : ndarray
-           The variance vector (for standardized Euclidean).
-       VI : ndarray
-           The inverse of the covariance matrix (for Mahalanobis).
+    The following are common calling conventions.

-    :Returns:
-       Y : ndarray
-           A condensed distance matrix.
-
-    Calling Conventions
-    -------------------
-
1. Y = pdist(X, 'euclidean')

Computes the distance between m points using Euclidean distance
@@ -886,9 +863,9 @@
Computes the standardized Euclidean distance. The standardized
Euclidean distance between two n-vectors u and v is

-       .. math:
+       .. math::

-          sqrt(\sum {(u_i-v_i)^2 / V[x_i]}).
+          \sqrt{\sum {(u_i-v_i)^2 / V[x_i]}}.

V is the variance vector; V[i] is the variance computed over all
the i'th components of the points. If not passed, it is
@@ -903,7 +880,7 @@

Computes the cosine distance between vectors u and v,

-       .. math:
+       .. math::

\frac{1 - uv^T}
{{|u|}_2 {|v|}_2}
@@ -914,7 +891,7 @@

Computes the correlation distance between vectors u and v. This is

-       .. math:
+       .. math::

\frac{1 - (u - \bar{u})(v - \bar{v})^T}
{{|(u - \bar{u})|}{|(v - \bar{v})|}^T}
@@ -942,7 +919,7 @@
maximum norm-1 distance between their respective elements. More
precisely, the distance is given by

-       .. math:
+       .. math::

d(u,v) = max_i {|u_i-v_i|}.

@@ -951,7 +928,7 @@
Computes the Canberra distance between the points. The
Canberra distance between two points u and v is

-       .. math:
+       .. math::

d(u,v) = \sum_u {|u_i-v_i|}
{|u_i|+|v_i|}
@@ -963,7 +940,7 @@
Bray-Curtis distance between two points u and v is

-       .. math:
+       .. math::

d(u,v) = \frac{\sum_i {u_i-v_i}}
{\sum_i {u_i+v_i}}
@@ -1043,6 +1020,31 @@

dm = pdist(X, 'sokalsneath')

+    :Parameters:
+       X : ndarray
+           An m by n array of m original observations in an
+           n-dimensional space.
+       metric : string or function
+           The distance metric to use. The distance function can
+           be 'braycurtis', 'canberra', 'chebyshev', 'cityblock',
+           'correlation', 'cosine', 'dice', 'euclidean', 'hamming',
+           'jaccard', 'kulsinski', 'mahalanobis', 'matching',
+           'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean',
+           'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
+       w : ndarray
+           The weight vector (for weighted Minkowski).
+       p : double
+           The p-norm to apply (for Minkowski, weighted and unweighted)
+       V : ndarray
+           The variance vector (for standardized Euclidean).
+       VI : ndarray
+           The inverse of the covariance matrix (for Mahalanobis).
+
+    :Returns:
+       Y : ndarray
+           A condensed distance matrix.
+
+
"""

@@ -1592,9 +1594,9 @@
Computes the standardized Euclidean distance. The standardized
Euclidean distance between two n-vectors u and v is

-       .. math:
+       .. math::

-          sqrt(\sum {(u_i-v_i)^2 / V[x_i]}).
+          \sqrt{\sum {(u_i-v_i)^2 / V[x_i]}}.

V is the variance vector; V[i] is the variance computed over all
the i'th components of the points. If not passed, it is
@@ -1609,7 +1611,7 @@

Computes the cosine distance between vectors u and v,

-       .. math:
+       .. math::

\frac{1 - uv^T}
{{|u|}_2 {|v|}_2}
@@ -1620,7 +1622,7 @@

Computes the correlation distance between vectors u and v. This is

-       .. math:
+       .. math::

\frac{1 - (u - n{|u|}_1){(v - n{|v|}_1)}^T}
{{|(u - n{|u|}_1)|}_2 {|(v - n{|v|}_1)|}^T}
@@ -1650,7 +1652,7 @@
maximum norm-1 distance between their respective elements. More
precisely, the distance is given by

-       .. math:
+       .. math::

d(u,v) = max_i {|u_i-v_i|}.

@@ -1659,7 +1661,7 @@
Computes the Canberra distance between the points. The
Canberra distance between two points u and v is

-       .. math:
+       .. math::

d(u,v) = \sum_u {|u_i-v_i|}
{|u_i|+|v_i|}
@@ -1671,7 +1673,7 @@
Bray-Curtis distance between two points u and v is

-       .. math:
+       .. math::

d(u,v) = \frac{\sum_i {u_i-v_i}}
{\sum_i {u_i+v_i}}