[SciPy-user] "clustergrams"/hierarchical clustering heat maps
Sun Feb 15 00:18:53 CST 2009
Sorry. I did not see your message until now. Several people have
already inquired about heatmaps. I've been meaning to eventually
implement support for them but since I don't work with microarray data
and I'm in the midst of trying to get a paper out, it has fallen onto
the back burner. As a first step, I'd need to implement support for
missing attributes since this seems to be common with microarray data.
As far as I know, a heatmap illustrates clustering along two axes:
observation vectors and attributes. For example, suppose we're
clustering patients by their genes. There is one observation vector
for each patient, and one vector element per gene. Clustering
observation vectors is the typical case, which is used to identify
groups of similar patients. Clustering attributes (across observation
vectors) is less typical but would be used to identify groups of
similar genes. The heatmap just illustrates the vectors; the color
encodes the intensity of each element.
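To make the two kinds of clustering concrete, here is a minimal sketch using scipy.cluster.hierarchy.linkage. The data matrix, its size, and the choice of average linkage are assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
# Hypothetical data: 6 patients (rows) by 4 genes (columns).
X = rng.normal(size=(6, 4))

# Cluster observation vectors (patients): one leaf per row of X.
Z_rows = linkage(X, method="average", metric="euclidean")

# Cluster attributes (genes): transpose so that each gene becomes a row.
Z_cols = linkage(X.T, method="average", metric="euclidean")

# A linkage matrix has one row per merge, i.e. n - 1 rows for n leaves.
print(Z_rows.shape)
print(Z_cols.shape)
```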
When clustering along a single dimension (observation vectors), a
dendrogram is drawn along the vertical axis, and the i'th row is the
observation vector corresponding to the i'th leaf node. No sorting
along the attribute dimension is needed. When clustering along two
dimensions, there is also a dendrogram along the horizontal axis, and
the attributes must be reordered so that the j'th column corresponds
to the j'th leaf node.
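The reordering described above could be sketched as follows. It assumes the leaf order is taken from scipy.cluster.hierarchy.leaves_list; the random data and the average linkage method are placeholders.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # hypothetical 6 observations x 4 attributes

# Leaf ids in the order the leaves appear along each dendrogram.
row_order = leaves_list(linkage(X, method="average"))
col_order = leaves_list(linkage(X.T, method="average"))

# Row i of X_sorted is the observation at the i'th leaf node, and
# column j is the attribute at the j'th leaf node.
X_sorted = X[np.ix_(row_order, col_order)]
```

For clustering along a single dimension, X[row_order] alone is enough and the columns keep their original order.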
This is my first time describing heat maps so I apologize if this
description is terse. Does it make some sense?
As for how someone would implement this, it seems like it'd be pretty
simple. There is a helper function called _plot_dendrogram that takes
in a collection of raw dendrogram lines to be rendered on the plot.
First, plot the heatmap (sorting the attributes so that the columns
correspond to the ids of the leaf nodes); this can be done with
imshow. Second, for the first dendrogram, call _plot_dendrogram but
provide it with a shifting parameter so that the dendrogram lines are
rendered to the left of the image. Third, call _plot_dendrogram again
with a shifting parameter, but this time shift the lines downward for
the attribute clustering dendrogram.
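A rough sketch of that layout follows. It is not the approach described above: instead of passing shift parameters to the private _plot_dendrogram helper, it draws each dendrogram on its own matplotlib axes through the public dendrogram function, using its ax and orientation arguments, which assumes a reasonably recent SciPy. The data, figure size, and axes rectangles are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # hypothetical 6 observations x 4 attributes

Z_rows = linkage(X, method="average")    # observation clustering
Z_cols = linkage(X.T, method="average")  # attribute clustering

fig = plt.figure(figsize=(6, 6))
# Axes rectangles are [left, bottom, width, height] in figure fractions.
ax_rows = fig.add_axes([0.05, 0.27, 0.20, 0.68])  # left of the image
ax_cols = fig.add_axes([0.27, 0.05, 0.68, 0.20])  # below the image
ax_heat = fig.add_axes([0.27, 0.27, 0.68, 0.68])

d_rows = dendrogram(Z_rows, orientation="left", ax=ax_rows, no_labels=True)
d_cols = dendrogram(Z_cols, orientation="bottom", ax=ax_cols, no_labels=True)

# Reorder rows and columns to follow the leaf order of each dendrogram.
X_sorted = X[np.ix_(d_rows["leaves"], d_cols["leaves"])]

# origin="lower" puts row 0 at the bottom, which is where the
# left-oriented dendrogram draws its first leaf.
ax_heat.imshow(X_sorted, aspect="auto", origin="lower",
               interpolation="nearest")
ax_heat.set_xticks([])
ax_heat.set_yticks([])
```

Calling fig.savefig with a file name of your choice writes the result to disk. Keeping the three axes adjacent and equally sized along the shared edge is what lines each leaf up with its row or column.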
I want to get to this soon but no promises. Sorry.
On Mon, Feb 2, 2009 at 11:12 PM, David Warde-Farley <email@example.com> wrote:
> Hi all,
> I was recently asked to cluster some data and I know from experience
> that people use these heat maps to look for patterns in multivariate
> data, often with a dendrogram off to the side. This involves sorting
> the rows and columns in a certain fashion, the details of which are
> somewhat fuzzy to me (and, truthfully, I'm happy with it staying that
> way for now).
> I notice that dendrogram plotting is available in
> scipy.cluster.hierarchy, and was wondering if something for
> producing the associated sorted heat maps is available anywhere
> (within SciPy or otherwise).
> Many thanks,
Damian Eads                                  Ph.D. Student
Jack Baskin School of Engineering, UCSC      E2-489
1156 High Street                             Machine Learning Lab
Santa Cruz, CA 95064                         http://www.soe.ucsc.edu/~eads