[Scipysvn] r5170  in trunk/scipy: cluster spatial
scipysvn@scip...
scipysvn@scip...
Sat Nov 22 14:18:16 CST 2008
Author: damian.eads
Date: 20081122 14:18:12 0600 (Sat, 22 Nov 2008)
New Revision: 5170
Modified:
trunk/scipy/cluster/hierarchy.py
trunk/scipy/spatial/distance.py
Log:
Wordsmithing some hierarchy and spatial documentation.
Modified: trunk/scipy/cluster/hierarchy.py
===================================================================
 trunk/scipy/cluster/hierarchy.py 20081122 19:09:35 UTC (rev 5169)
+++ trunk/scipy/cluster/hierarchy.py 20081122 20:18:12 UTC (rev 5170)
@@ 331,6 +331,20 @@
Performs centroid/UPGMC linkage. See ``linkage`` for more
information on the return structure and algorithm.
+ The following are common calling conventions:
+
+ 1. Z = centroid(y)
+
+ Performs centroid/UPGMC linkage on the condensed distance
+ matrix ``y``. See ``linkage`` for more information on the return
+ structure and algorithm.
+
+ 2. Z = centroid(X)
+
+ Performs centroid/UPGMC linkage on the observation matrix ``X``
+ using Euclidean distance as the distance metric. See ``linkage``
+ for more information on the return structure and algorithm.
+
:Parameters:
Q : ndarray
A condensed or redundant distance matrix. A condensed
@@ 346,21 +360,6 @@
the ``linkage`` function documentation for more information
on its structure.
 Calling Conventions
 

 1. Z = centroid(y)

 Performs centroid/UPGMC linkage on the condensed distance
 matrix ``y``. See ``linkage`` for more information on the return
 structure and algorithm.

 2. Z = centroid(X)

 Performs centroid/UPGMC linkage on the observation matrix ``X``
 using Euclidean distance as the distance metric. See ``linkage``
 for more information on the return structure and algorithm.

:SeeAlso:
 linkage: for advanced creation of hierarchical clusterings.
"""
@@ 371,18 +370,8 @@
Performs median/WPGMC linkage. See ``linkage`` for more
information on the return structure and algorithm.
 :Parameters:
 Q : ndarray
 A condensed or redundant distance matrix. A condensed
 distance matrix is a flat array containing the upper
 triangular of the distance matrix. This is the form that
 ``pdist`` returns. Alternatively, a collection of
 m observation vectors in n dimensions may be passed as
 a m by n array.
+ The following are common calling conventions:
 Calling Conventions
 

1. Z = median(y)
Performs median/WPGMC linkage on the condensed distance matrix
@@ 395,6 +384,19 @@
using Euclidean distance as the distance metric. See linkage
for more information on the return structure and algorithm.
+ :Parameters:
+ Q : ndarray
+ A condensed or redundant distance matrix. A condensed
+ distance matrix is a flat array containing the upper
+ triangular of the distance matrix. This is the form that
+ ``pdist`` returns. Alternatively, a collection of
+ m observation vectors in n dimensions may be passed as
+ a m by n array.
+
+ :Returns:
+  Z : ndarray
+ The hierarchical clustering encoded as a linkage matrix.
+
:SeeAlso:
 linkage: for advanced creation of hierarchical clusterings.
"""
@@ 406,18 +408,8 @@
matrix. See linkage for more information on the return structure
and algorithm.
 :Parameters:
 Q : ndarray
 A condensed or redundant distance matrix. A condensed
 distance matrix is a flat array containing the upper
 triangular of the distance matrix. This is the form that
 ``pdist`` returns. Alternatively, a collection of
 m observation vectors in n dimensions may be passed as
 a m by n array.
+ The following are common calling conventions:
 Calling Conventions
 

1. Z = ward(y)
Performs Ward's linkage on the condensed distance matrix Z. See
linkage for more information on the return structure and
@@ 428,6 +420,19 @@
Euclidean distance as the distance metric. See linkage for more
information on the return structure and algorithm.
+ :Parameters:
+ Q : ndarray
+ A condensed or redundant distance matrix. A condensed
+ distance matrix is a flat array containing the upper
+ triangular of the distance matrix. This is the form that
+ ``pdist`` returns. Alternatively, a collection of
+ m observation vectors in n dimensions may be passed as
+ a m by n array.
+
+ :Returns:
+  Z : ndarray
+ The hierarchical clustering encoded as a linkage matrix.
+
:SeeAlso:
 linkage: for advanced creation of hierarchical clusterings.
"""
@@ 476,111 +481,109 @@
combined to form cluster :math:`u`. Let :math:`v` be any
remaining cluster in the forest that is not :math:`u`.
 :Parameters:
 Q : ndarray
 A condensed or redundant distance matrix. A condensed
 distance matrix is a flat array containing the upper
 triangular of the distance matrix. This is the form that
 ``pdist`` returns. Alternatively, a collection of
 :math:`m` observation vectors in n dimensions may be passed as
 a :math:`m` by :math:`n` array.
 method : string
 The linkage algorithm to use. See the ``Linkage Methods``
 section below for full descriptions.
 metric : string
 The distance metric to use. See the ``distance.pdist``
 function for a list of valid distance metrics.

 Linkage Methods
 

The following are methods for calculating the distance between the
newly formed cluster :math:`u` and each :math:`v`.
 * method=``single`` assigns
+ * method=``single`` assigns
 .. math:
 d(u,v) = \min(dist(u[i],v[j]))
+ .. math::
+ d(u,v) = \min(dist(u[i],v[j]))
 for all points :math:`i` in cluster :math:`u` and
 :math:`j` in cluster :math:`v`. This is also known as the
 Nearest Point Algorithm.
+ for all points :math:`i` in cluster :math:`u` and
+ :math:`j` in cluster :math:`v`. This is also known as the
+ Nearest Point Algorithm.
 * method=``complete`` assigns
+ * method=``complete`` assigns
 .. math:
 d(u, v) = \max(dist(u[i],v[j]))
+ .. math::
+ d(u, v) = \max(dist(u[i],v[j]))
 for all points :math:`i` in cluster u and :math:`j` in
 cluster :math:`v`. This is also known by the Farthest Point
 Algorithm or Voor Hees Algorithm.
+ for all points :math:`i` in cluster u and :math:`j` in
+ cluster :math:`v`. This is also known by the Farthest Point
+ Algorithm or Voor Hees Algorithm.
 * method=``average`` assigns
+ * method=``average`` assigns
 .. math:
 d(u,v) = \sum_{ij} \frac{d(u[i], v[j])}
 {(u*v)
+ .. math::
+ d(u,v) = \sum_{ij} \frac{d(u[i], v[j])}
+ {(u*v)
 for all points :math:`i` and :math:`j` where :math:`u`
 and :math:`v` are the cardinalities of clusters :math:`u`
 and :math:`v`, respectively. This is also called the UPGMA
 algorithm. This is called UPGMA.
+ for all points :math:`i` and :math:`j` where :math:`u`
+ and :math:`v` are the cardinalities of clusters :math:`u`
+ and :math:`v`, respectively. This is also called the UPGMA
+ algorithm. This is called UPGMA.
 * method='weighted' assigns
+ * method='weighted' assigns
 .. math:
 d(u,v) = (dist(s,v) + dist(t,v))/2
+ .. math::
+ d(u,v) = (dist(s,v) + dist(t,v))/2
 where cluster u was formed with cluster s and t and v
 is a remaining cluster in the forest. (also called WPGMA)
+ where cluster u was formed with cluster s and t and v
+ is a remaining cluster in the forest. (also called WPGMA)
+ * method='centroid' assigns
 * method='centroid' assigns
+ .. math::
+ dist(s,t) = euclid(c_s, c_t)
 .. math:
 dist(s,t) = euclid(c_s, c_t)
+ where :math:`c_s` and :math:`c_t` are the centroids of
+ clusters :math:`s` and :math:`t`, respectively. When two
+ clusters :math:`s` and :math:`t` are combined into a new
+ cluster :math:`u`, the new centroid is computed over all the
+ original objects in clusters :math:`s` and :math:`t`. The
+ distance then becomes the Euclidean distance between the
+ centroid of :math:`u` and the centroid of a remaining cluster
+ :math:`v` in the forest. This is also known as the UPGMC
+ algorithm.
 where :math:`c_s` and :math:`c_t` are the centroids of
 clusters :math:`s` and :math:`t`, respectively. When two
 clusters :math:`s` and :math:`t` are combined into a new
 cluster :math:`u`, the new centroid is computed over all the
 original objects in clusters :math:`s` and :math:`t`. The
 distance then becomes the Euclidean distance between the
 centroid of :math:`u` and the centroid of a remaining cluster
 :math:`v` in the forest. This is also known as the UPGMC
 algorithm.
+ * method='median' assigns math:`$d(s,t)$` like the ``centroid``
+ method. When two clusters s and t are combined into a new
+ cluster :math:`u`, the average of centroids s and t give the
+ new centroid :math:`u`. This is also known as the WPGMC
+ algorithm.
 * method='median' assigns math:`$d(s,t)$` like the ``centroid``
 method. When two clusters s and t are combined into a new
 cluster :math:`u`, the average of centroids s and t give the
 new centroid :math:`u`. This is also known as the WPGMC
 algorithm.
+ * method='ward' uses the Ward variance minimization algorithm.
+ The new entry :math:`d(u,v)` is computed as follows,
 * method='ward' uses the Ward variance minimization algorithm.
 The new entry :math:`d(u,v)` is computed as follows,
+ .. math::
 .. math:
+ d(u,v) = \sqrt{\frac{v+s}
+ {T}d(v,s)^2
+ + \frac{v+t}
+ {T}d(v,t)^2
+ + \frac{v}
+ {T}d(s,t)^2}
 d(u,v) = \sqrt{\frac{v+s}
 {T}d(v,s)^2
 + \frac{v+t}
 {T}d(v,t)^2
 + \frac{v}
 {T}d(s,t)^2}
+ where :math:`u` is the newly joined cluster consisting of
+ clusters :math:`s` and :math:`t`, :math:`v` is an unused
+ cluster in the forest, :math:`T=v+s+t`, and
+ :math:`*` is the cardinality of its argument. This is also
+ known as the incremental algorithm.
 where :math:`u` is the newly joined cluster consisting of
 clusters :math:`s` and :math:`t`, :math:`v` is an unused
 cluster in the forest, :math:`T=v+s+t`, and
 :math:`*` is the cardinality of its argument. This is also
 known as the incremental algorithm.
+ Warning: When the minimum distance pair in the forest is chosen, there may
+ be two or more pairs with the same minimum distance. This
+ implementation may chose a different minimum than the MATLAB(TM)
+ version.
 Warning
 
+ :Parameters:
+  Q : ndarray
+ A condensed or redundant distance matrix. A condensed
+ distance matrix is a flat array containing the upper
+ triangular of the distance matrix. This is the form that
+ ``pdist`` returns. Alternatively, a collection of
+ :math:`m` observation vectors in n dimensions may be passed as
+ a :math:`m` by :math:`n` array.
+  method : string
+ The linkage algorithm to use. See the ``Linkage Methods``
+ section below for full descriptions.
+  metric : string
+ The distance metric to use. See the ``distance.pdist``
+ function for a list of valid distance metrics.
 When the minimum distance pair in the forest is chosen, there may
 be two or more pairs with the same minimum distance. This
 implementation may chose a different minimum than the MATLAB(TM)
 version.
+ :Returns:
+
+  Z : ndarray
+ The hierarchical clustering encoded as a linkage matrix.
"""
if not isinstance(method, str):
raise TypeError("Argument 'method' must be a string.")
@@ 788,6 +791,10 @@
the ClusterNode object is a leaf node, its count must be 1, and its
distance is meaningless but set to 0.
+ Note: This function is provided for the convenience of the library
+ user. ClusterNodes are not used as input to any of the functions in this
+ library.
+
:Parameters:
 Z : ndarray
@@ 807,9 +814,6 @@
 L : list
The preorder traversal.
 Note: This function is provided for the convenience of the library
 user. ClusterNodes are not used as input to any of the functions in this
 library.
"""
Z = np.asarray(Z, order='c')
@@ 945,6 +949,9 @@
r"""
Calculates inconsistency statistics on a linkage.
+ Note: This function behaves similarly to the MATLAB(TM)
+ inconsistent function.
+
:Parameters:
 d : int
The number of links up to ``d`` levels below each
@@ 971,9 +978,6 @@
\frac{\mathtt{Z[i,2]}\mathtt{R[i,0]}}
{R[i,1]}.

 This function behaves similarly to the MATLAB(TM) inconsistent
 function.
"""
Z = np.asarray(Z, order='c')
Modified: trunk/scipy/spatial/distance.py
===================================================================
 trunk/scipy/spatial/distance.py 20081122 19:09:35 UTC (rev 5169)
+++ trunk/scipy/spatial/distance.py 20081122 20:18:12 UTC (rev 5170)
@@ 283,10 +283,10 @@
Computes the Cosine distance between two nvectors u and v, which
is defined as
 .. math::
+ .. math::
 \frac{1uv^T}
 {u_2 v_2}.
+ \frac{1uv^T}
+ {u_2 v_2}.
:Parameters:
u : ndarray
@@ 341,7 +341,7 @@
``u`` and ``v``. If ``u`` and ``v`` are boolean vectors, the Hamming
distance is
 .. math:
+ .. math::
\frac{c_{01} + c_{10}}{n}
@@ 398,7 +398,7 @@
Computes the Kulsinski dissimilarity between two boolean nvectors
u and v, which is defined as
 .. math:
+ .. math::
\frac{c_{TF} + c_{FT}  c_{TT} + n}
{c_{FT} + c_{TF} + n}
@@ 453,7 +453,7 @@
Computes the Manhattan distance between two nvectors u and v,
which is defined as
 .. math:
+ .. math::
\sum_i {u_iv_i}.
@@ 476,7 +476,8 @@
Computes the Mahalanobis distance between two nvectors ``u`` and ``v``,
which is defiend as
 .. math:
+ .. math::
+
(uv)V^{1}(uv)^T
where ``VI`` is the inverse covariance matrix :math:`V^{1}`.
@@ 501,7 +502,8 @@
Computes the Chebyshev distance between two nvectors u and v,
which is defined as
 .. math:
+ .. math::
+
\max_i {u_iv_i}.
:Parameters:
@@ 523,7 +525,7 @@
Computes the BrayCurtis distance between two nvectors ``u`` and
``v``, which is defined as
 .. math:
+ .. math::
\sum{u_iv_i} / \sum{u_i+v_i}.
@@ 546,7 +548,7 @@
Computes the Canberra distance between two nvectors u and v,
which is defined as
 .. math:
+ .. math::
\frac{\sum_i {u_iv_i}}
{\sum_i {u_i+v_i}}.
@@ 610,7 +612,7 @@
which is defined as
 .. math:
+ .. math::
\frac{R}
\frac{c_{TT} + c_{FF} + \frac{R}{2}}
@@ 639,7 +641,7 @@
Computes the Matching dissimilarity between two boolean nvectors
u and v, which is defined as
 .. math:
+ .. math::
\frac{c_{TF} + c_{FT}}{n}
@@ 667,7 +669,7 @@
Computes the Dice dissimilarity between two boolean nvectors
``u`` and ``v``, which is
 .. math:
+ .. math::
\frac{c_{TF} + c_{FT}}
{2c_{TT} + c_{FT} + c_{TF}}
@@ 700,7 +702,7 @@
Computes the RogersTanimoto dissimilarity between two boolean
nvectors ``u`` and ``v``, which is defined as
 .. math:
+ .. math::
\frac{R}
{c_{TT} + c_{FF} + R}
@@ 729,7 +731,7 @@
Computes the RussellRao dissimilarity between two boolean nvectors
``u`` and ``v``, which is defined as
 .. math:
+ .. math::
\frac{n  c_{TT}}
{n}
@@ 761,7 +763,7 @@
Computes the SokalMichener dissimilarity between two boolean vectors
``u`` and ``v``, which is defined as
 .. math:
+ .. math::
\frac{2R}
{S + 2R}
@@ 797,7 +799,7 @@
Computes the SokalSneath dissimilarity between two boolean vectors
``u`` and ``v``,
 .. math:
+ .. math::
\frac{2R}
{c_{TT} + 2R}
@@ 838,33 +840,8 @@
this entry or to convert the condensed distance matrix to a
redundant square matrix.
 :Parameters:
 X : ndarray
 An m by n array of m original observations in an
 ndimensional space.
 metric : string or function
 The distance metric to use. The distance function can
 be 'braycurtis', 'canberra', 'chebyshev', 'cityblock',
 'correlation', 'cosine', 'dice', 'euclidean', 'hamming',
 'jaccard', 'kulsinski', 'mahalanobis', 'matching',
 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean',
 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
 w : ndarray
 The weight vector (for weighted Minkowski).
 p : double
 The pnorm to apply (for Minkowski, weighted and unweighted)
 V : ndarray
 The variance vector (for standardized Euclidean).
 VI : ndarray
 The inverse of the covariance matrix (for Mahalanobis).
+ The following are common calling conventions.
 :Returns:
 Y : ndarray
 A condensed distance matrix.

 Calling Conventions
 

1. ``Y = pdist(X, 'euclidean')``
Computes the distance between m points using Euclidean distance
@@ 886,9 +863,9 @@
Computes the standardized Euclidean distance. The standardized
Euclidean distance between two nvectors ``u`` and ``v`` is
 .. math:
+ .. math::
 sqrt(\sum {(u_iv_i)^2 / V[x_i]}).
+ \sqrt{\sum {(u_iv_i)^2 / V[x_i]}}.
V is the variance vector; V[i] is the variance computed over all
the i'th components of the points. If not passed, it is
@@ 903,7 +880,7 @@
Computes the cosine distance between vectors u and v,
 .. math:
+ .. math::
\frac{1  uv^T}
{{u}_2 {v}_2}
@@ 914,7 +891,7 @@
Computes the correlation distance between vectors u and v. This is
 .. math:
+ .. math::
\frac{1  (u  \bar{u})(v  \bar{v})^T}
{{(u  \bar{u})}{(v  \bar{v})}^T}
@@ 942,7 +919,7 @@
maximum norm1 distance between their respective elements. More
precisely, the distance is given by
 .. math:
+ .. math::
d(u,v) = max_i {u_iv_i}.
@@ 951,7 +928,7 @@
Computes the Canberra distance between the points. The
Canberra distance between two points ``u`` and ``v`` is
 .. math:
+ .. math::
d(u,v) = \sum_u {u_iv_i}
{u_i+v_i}
@@ 963,7 +940,7 @@
BrayCurtis distance between two points ``u`` and ``v`` is
 .. math:
+ .. math::
d(u,v) = \frac{\sum_i {u_iv_i}}
{\sum_i {u_i+v_i}}
@@ 1043,6 +1020,31 @@
dm = pdist(X, 'sokalsneath')
+ :Parameters:
+ X : ndarray
+ An m by n array of m original observations in an
+ ndimensional space.
+ metric : string or function
+ The distance metric to use. The distance function can
+ be 'braycurtis', 'canberra', 'chebyshev', 'cityblock',
+ 'correlation', 'cosine', 'dice', 'euclidean', 'hamming',
+ 'jaccard', 'kulsinski', 'mahalanobis', 'matching',
+ 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean',
+ 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
+ w : ndarray
+ The weight vector (for weighted Minkowski).
+ p : double
+ The pnorm to apply (for Minkowski, weighted and unweighted)
+ V : ndarray
+ The variance vector (for standardized Euclidean).
+ VI : ndarray
+ The inverse of the covariance matrix (for Mahalanobis).
+
+ :Returns:
+ Y : ndarray
+ A condensed distance matrix.
+
+
"""
@@ 1592,9 +1594,9 @@
Computes the standardized Euclidean distance. The standardized
Euclidean distance between two nvectors ``u`` and ``v`` is
 .. math:
+ .. math::
 sqrt(\sum {(u_iv_i)^2 / V[x_i]}).
+ \sqrt{\sum {(u_iv_i)^2 / V[x_i]}}.
V is the variance vector; V[i] is the variance computed over all
the i'th components of the points. If not passed, it is
@@ 1609,7 +1611,7 @@
Computes the cosine distance between vectors u and v,
 .. math:
+ .. math::
\frac{1  uv^T}
{{u}_2 {v}_2}
@@ 1620,7 +1622,7 @@
Computes the correlation distance between vectors u and v. This is
 .. math:
+ .. math::
\frac{1  (u  n{u}_1){(v  n{v}_1)}^T}
{{(u  n{u}_1)}_2 {(v  n{v}_1)}^T}
@@ 1650,7 +1652,7 @@
maximum norm1 distance between their respective elements. More
precisely, the distance is given by
 .. math:
+ .. math::
d(u,v) = max_i {u_iv_i}.
@@ 1659,7 +1661,7 @@
Computes the Canberra distance between the points. The
Canberra distance between two points ``u`` and ``v`` is
 .. math:
+ .. math::
d(u,v) = \sum_u {u_iv_i}
{u_i+v_i}
@@ 1671,7 +1673,7 @@
BrayCurtis distance between two points ``u`` and ``v`` is
 .. math:
+ .. math::
d(u,v) = \frac{\sum_i {u_iv_i}}
{\sum_i {u_i+v_i}}
More information about the Scipysvn
mailing list