[Scipy-tickets] [SciPy] #612: New, "better" cluster package

SciPy scipy-tickets@scipy....
Mon Feb 25 18:41:07 CST 2008


#612: New, "better" cluster package
---------------------------+------------------------------------------------
 Reporter:  rspringuel     |       Owner:  somebody
     Type:  enhancement    |      Status:  new     
 Priority:  normal         |   Milestone:  0.7     
Component:  scipy.cluster  |     Version:          
 Severity:  normal         |    Keywords:          
---------------------------+------------------------------------------------
 The current cluster package for scipy implements only k-means clustering
 with a Euclidean distance metric and centroids computed as the mean of the
 members of each cluster.

 Pycluster implements eight different distance metrics, three different
 centroid methods (mean, median, and medoid) and four different clustering
 algorithms (k-means, agglomerative hierarchical, SOM, and PCA), but it
 contains elements whose licenses are not compatible with a BSD-style
 license and so cannot be incorporated into scipy.

 Using Pycluster as a model, I have written a clustering package from
 scratch that duplicates most of its functionality and expands on it in
 certain areas as well.

 I have also endeavored to design the package to make future expansions easy
 and straightforward.

 To date what I have written supports the following:

 Distances:
 euclidean
 normalized euclidean
 city block (aka manhattan)
 normalized city block
 hamming (aka simple matching coefficient)
 pearson
 absolute pearson
 uncentered pearson
 arccosine of pearson
 absolute uncentered pearson
 spearman
 kendall
 modified simple matching coefficient of Rogers and Tanimoto
 modified simple matching coefficient of Sokal and Sneath
 jaccard coefficient
 modified jaccard coefficient of Dice
 modified jaccard coefficient of Sokal and Sneath
 general Minkowski metric
 Chebyshev distance
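
 For illustration only (this is not code from the submitted package), a
 minimal pure-Python sketch of a few of the distances listed above. All
 function names are hypothetical. Two conventions are assumed here: hamming
 is the fraction of differing positions, and the pearson distance is 1 - r,
 where r is the Pearson correlation coefficient.

```python
import math


def minkowski(u, v, p=2.0):
    """General Minkowski distance of order p (p=1: city block, p=2: Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def euclidean(u, v):
    return minkowski(u, v, 2.0)


def city_block(u, v):
    return minkowski(u, v, 1.0)


def chebyshev(u, v):
    """Largest coordinate-wise difference (Minkowski limit as p -> infinity)."""
    return max(abs(a - b) for a, b in zip(u, v))


def hamming(u, v):
    """Fraction of positions at which u and v differ (assumed convention)."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


def pearson(u, v):
    """1 - r, with r the Pearson correlation (assumed convention).

    Undefined (division by zero) if either vector is constant.
    """
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 1.0 - cov / (su * sv)
```

 Every function takes two equal-length sequences and returns a float, so
 any of them can be passed to code that expects a generic distance(u, v).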

 Centroids:
 arithmetic mean
 median
 absolute mean
 geometric mean
 harmonic mean
 quadratic mean
 medoid (using any of the above distances)
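
 As a rough sketch of what several of these centroid methods compute (again
 with hypothetical names, not the package's actual interface): each
 mean-type centroid is taken coordinate by coordinate over the members of a
 cluster, while the medoid is the member with the smallest total distance
 to all other members. The geometric and harmonic means below assume
 strictly positive coordinates.

```python
import math
from statistics import median


def mean_centroid(points):
    """Arithmetic mean of each coordinate."""
    return [sum(col) / len(col) for col in zip(*points)]


def median_centroid(points):
    """Median of each coordinate."""
    return [median(col) for col in zip(*points)]


def geometric_mean_centroid(points):
    """Geometric mean of each coordinate (assumes values > 0)."""
    return [math.exp(sum(math.log(x) for x in col) / len(col))
            for col in zip(*points)]


def harmonic_mean_centroid(points):
    """Harmonic mean of each coordinate (assumes values > 0)."""
    return [len(col) / sum(1.0 / x for x in col) for col in zip(*points)]


def quadratic_mean_centroid(points):
    """Quadratic mean (root mean square) of each coordinate."""
    return [math.sqrt(sum(x * x for x in col) / len(col))
            for col in zip(*points)]


def euclidean(u, v):
    """Example distance for use with medoid()."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def medoid(points, distance=euclidean):
    """Member of `points` minimizing the summed distance to all members."""
    return min(points, key=lambda c: sum(distance(c, p) for p in points))
```

 Unlike the mean-type centroids, the medoid is always one of the original
 data points, which is why it can be combined with any distance function.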

 Clustering algorithms:
 k-means
 c-means (fuzzy clustering)
 agglomerative hierarchical
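
 To show how a distance and a centroid method can be combined, here is a
 minimal k-means sketch in pure Python. It is not the submitted
 implementation: the function name, its parameters, and the random
 initialization from the data points are all assumptions made for the
 example.

```python
import math
import random


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_centroid(points):
    return [sum(col) / len(col) for col in zip(*points)]


def kmeans(data, k, distance=euclidean, centroid=mean_centroid,
           max_iter=100, seed=0):
    """Return (labels, centers) for a basic k-means run.

    `distance` and `centroid` are pluggable, so other metrics or centroid
    methods can be swapped in without changing the loop.
    """
    rng = random.Random(seed)
    # Initialize the centers with k distinct data points.
    centers = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: distance(p, centers[j]))
                      for p in data]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: recompute each center from its current members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = centroid(members)
    return labels, centers
```

 For example, kmeans(data, 2) on two well-separated groups of points
 should give every point in one group the same label and the two groups
 different labels; which group receives label 0 depends on the
 initialization.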


 Additional distances, centroid methods, and clustering algorithms may be
 added as my work requires them.

 Since I have written all of this code from scratch, I control the license
 to it and have elected to release it under a BSD-style license so that it
 can be incorporated into scipy.

 What I have written greatly expands on the functionality of the current
 cluster package, but it is written entirely in Python and so may be
 slower than what is currently present where the functionality overlaps
 (hence the quotation marks around "better" in the title of this ticket).

 Note: All of the centroid methods are based on my statistical functions
 submitted in ticket #604 and so the code would have to be revised should
 those stats functions not be incorporated into scipy.

-- 
Ticket URL: <http://scipy.org/scipy/scipy/ticket/612>
SciPy <http://www.scipy.org/>
SciPy is open-source software for mathematics, science, and engineering.