[SciPy-user] "clustergrams"/hierarchical clustering heat maps
Wed Feb 18 20:06:54 CST 2009
On 15-Feb-09, at 1:18 AM, Damian Eads wrote:
> Hi David,
> Sorry. I did not see your message until now. Several people have
> already inquired about heatmaps. I've been meaning to eventually
> implement support for them but since I don't work with microarray data
> and I'm in the midst of trying to get a paper out, it has fallen onto
> the back burner.
Not a problem, I know how it is.
> As a first step, I'd need to implement support for
> missing attributes since this seems to be common with microarray data.
It can be, though as far as I know, a common strategy with microarrays
is to just impute missing values in one way or another.
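To make "impute in one way or another" concrete, here is a minimal sketch of the simplest variant: replacing each missing entry with the mean of its attribute (column). The function name `impute_column_means` and the toy matrix are made up for illustration, and marking missing values with NaN is an assumption; real microarray pipelines often use more elaborate schemes.

```python
import numpy as np

def impute_column_means(X):
    """Return a copy of X with NaNs replaced by the mean of the observed
    values in the same column (attribute). Hypothetical helper."""
    X = np.array(X, dtype=float)          # np.array copies, so the input is untouched
    col_means = np.nanmean(X, axis=0)     # per-column mean, ignoring NaNs
    rows, cols = np.where(np.isnan(X))    # positions of the missing entries
    X[rows, cols] = col_means[cols]
    return X

# Toy data: 3 observation vectors, 3 attributes, NaN marks a missing value.
data = np.array([[1.0, 2.0, np.nan],
                 [3.0, np.nan, 6.0],
                 [5.0, 4.0, 9.0]])
filled = impute_column_means(data)
```

After this step the matrix has no missing entries, so it can be passed to the clustering routines unchanged.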
> As far as I know, a heatmap illustrates clustering along two axes:
> observation vectors and attributes. For example, suppose we're
> clustering patients by their genes. There is one observation vector
> for each patient, and one vector element per gene. Clustering
> observation vectors is the typical case, which is used to identify
> groups of similar patients. Clustering attributes (across observation
> vectors) is less typical but would be used to identify groups of
> similar genes.
> The heatmap just illustrates the vectors; the color is the intensity.
> When clustering along a single dimension (observation vectors), no
> sorting is necessary, and a dendrogram is drawn along the vertical
> axis. The i'th row is just the observation vector corresponding to the
> i'th leaf node. No sorting along the attribute dimension is needed.
> Along two dimensions, there is a dendrogram along the horizontal axis.
> Now the attributes must be reordered so that the j'th column
> corresponds to the j'th leaf node.
> This is my first time describing heat maps so I apologize if this
> description is terse. Does it make some sense?
That corresponds with my understanding as well, though I'm not certain
that 'no sorting is needed' holds if we're just clustering along one
dimension. Do you mean that the order is completely specified
by the dendrogram? Because that would make sense.
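As a small illustration of the order being "completely specified by the dendrogram", the sketch below reads the leaf order off a linkage with `scipy.cluster.hierarchy.leaves_list` and uses it to reorder the rows. The toy data, the "average" linkage method and the variable names are assumptions chosen for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

# Toy data: two well-separated groups of observation vectors,
# interleaved so that the input order does not reflect the grouping.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(3, 4)),
               rng.normal(5.0, 0.1, size=(3, 4))])
X = X[[0, 3, 1, 4, 2, 5]]

Z = linkage(X, method="average")   # hierarchical clustering of the rows
row_order = leaves_list(Z)         # leaf ids, in dendrogram order
X_sorted = X[row_order]            # i'th row now matches the i'th leaf
```

The rows of `X_sorted` are what would be drawn in the heat map, so in that sense the data is reordered, but the order comes from the tree rather than from a separate sort.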
As far as I know there is also some heuristic for laying out both axes
so that patterns are easier to see (there are arbitrary ordering choices
to be made, e.g. which branch to put on the left and which on the
right). My advisor mentioned the name of the algorithm once, but I'd
have to ask him again.
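For illustration only, one heuristic of this kind is optimal leaf ordering, which flips branches so that adjacent leaves are as close together as possible. It may or may not be the algorithm referred to above. The sketch assumes a SciPy version that provides `scipy.cluster.hierarchy.optimal_leaf_ordering`; the random data and the helper `adjacent_distance_sum` are made up for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list, optimal_leaf_ordering
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 5))      # toy data: 8 observation vectors

D = pdist(X)                     # condensed Euclidean distance matrix
Z = linkage(D, method="average")
Z_opt = optimal_leaf_ordering(Z, D)   # same tree, branches possibly flipped

def adjacent_distance_sum(order, X):
    """Sum of Euclidean distances between neighbouring rows in `order`
    (hypothetical helper used only to compare the two layouts)."""
    steps = np.diff(X[order], axis=0)
    return float(np.sum(np.linalg.norm(steps, axis=1)))

default_cost = adjacent_distance_sum(leaves_list(Z), X)
optimal_cost = adjacent_distance_sum(leaves_list(Z_opt), X)
```

Both linkages describe the same clustering; only the left/right choice at each merge differs, which is exactly the arbitrary part of the layout.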
> As far as how someone implements this, it seems like it'd be pretty
> simple. There is a helper function called _plot_dendrogram that takes
> in a collection of raw dendrogram lines to be rendered on the plot.
> First, plot the heatmap (sorting the attributes so that the columns
> correspond to the ids of the leaf nodes); this can be done with
> imshow. Second, for the first dendrogram, call _plot_dendrogram but
> provide it with a shifting parameter so that the dendrogram lines are
> rendered to the left of the image. Third, call _plot_dendrogram again,
> provide a shifting parameter, but instead shift the lines downward for
> the attribute clustering dendrogram.
Sounds as though the "completely specified" bit above is what you
meant. And it sounds as though the existing interface should be
sufficient to get something going.
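A rough sketch of the three steps, for anyone who wants to try it. Note the substitution: rather than calling the private `_plot_dendrogram` helper with a shift, this version uses the public `dendrogram` function with its `ax` and `orientation` arguments and positions each piece with a separate matplotlib axes. The layout numbers, the random data and the "average" linkage are all assumptions, and the non-interactive "Agg" backend is chosen only so the example runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                 # non-interactive backend for the example
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(2)
X = rng.normal(size=(12, 6))          # toy data: 12 observations, 6 attributes

Z_rows = linkage(X, method="average")     # cluster observation vectors
Z_cols = linkage(X.T, method="average")   # cluster attributes

fig = plt.figure(figsize=(6, 6))
# [left, bottom, width, height] in figure coordinates (arbitrary layout)
ax_left = fig.add_axes([0.05, 0.05, 0.20, 0.70])
ax_top = fig.add_axes([0.27, 0.77, 0.68, 0.18])
ax_heat = fig.add_axes([0.27, 0.05, 0.68, 0.70])

# Dendrograms to the left of and above the image.
d_rows = dendrogram(Z_rows, orientation="left", ax=ax_left, no_labels=True)
d_cols = dendrogram(Z_cols, orientation="top", ax=ax_top, no_labels=True)
ax_left.set_axis_off()
ax_top.set_axis_off()

# Reorder rows and columns so they line up with the dendrogram leaves.
row_order = d_rows["leaves"]
col_order = d_cols["leaves"]
ax_heat.imshow(X[np.ix_(row_order, col_order)], aspect="auto",
               origin="lower", interpolation="nearest")
ax_heat.set_xticks([])
ax_heat.set_yticks([])
```

`origin="lower"` is used because the left-oriented dendrogram places its first leaf at the bottom of the axis. Calling `fig.savefig(...)` with a filename of your choice writes the figure to disk.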
> I want to get to this soon but no promises. Sorry.
If I don't beat you to it. :)