[SciPy-user] Looking for a way to cluster data

Gary Ruben gruben@bigpond.net...
Fri May 15 21:10:46 CDT 2009


Hi Damian,

Thanks for taking the time to reply. I ended up with a solution for now 
that doesn't use scipy.cluster and I won't have the time to revisit 
this, but I think that with the information you provided, I could 
probably have used the dendrogram function and not taken a graph-theory 
approach.

Gary

Damian Eads wrote:
> Hi Gary,
> 
> On Sat, Apr 25, 2009 at 8:18 PM, Gary Ruben <gruben@bigpond.net.au> wrote:
>> Hi all,
>>
>> I'm looking for some advice on how to order data points so that I can
>> visualise them. I've been looking at scipy.cluster for this purpose but
>> I'm not sure whether it is suitable so I thought I'd see whether anyone
>> had suggestions for a simpler way to order the coordinates.
> 
> With the dendrogram function, the order in which nodes appear from
> left to right can be changed with the distance_sort or count_sort
> parameters.
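
A minimal sketch of the leaf-ordering options mentioned above, assuming a recent SciPy. The random points, the "single" linkage method and the variable names are illustrative assumptions, not anything from Gary's data. Passing no_plot=True asks dendrogram for its computed data only, so nothing is drawn.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Illustrative data only: 8 random points in 3D (an assumption, not real data).
rng = np.random.default_rng(0)
points = rng.random((8, 3))

# Hierarchical clustering; "single" is just one possible choice of method.
Z = linkage(points, method="single")

# no_plot=True returns the layout dictionary without rendering a figure.
default = dendrogram(Z, no_plot=True)
by_count = dendrogram(Z, no_plot=True, count_sort="ascending")
by_dist = dendrogram(Z, no_plot=True, distance_sort="ascending")

# "leaves" lists the original observation indices from left to right.
leaf_order = by_count["leaves"]
ordered_points = points[leaf_order]
```

Each returned dictionary has a "leaves" entry, so comparing default["leaves"], by_count["leaves"] and by_dist["leaves"] shows how the two options reorder the same tree.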
> 
>> I have a binary 3D array containing 1's that form a shape in a 3D volume
>> against a background of 0's - they form a skeleton of a connected,
>> branched structure. Furthermore, the points are all 26-connected to each
>> other, i.e. there are no gaps in the skeleton. The longest chains may be
>> 1000's of points long.
>> It would be nice to visualise these using the mayavi mlab plot3d
>> function, which draws tubes and which requires ordered coordinates as
>> input, so I need to get ordered coordinate lists that traverse the
>> points along the branches of the skeleton. It would also be nice to
>> preferentially cluster long chains since then I can cull very short
>> chains from the visualisation.
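
For the ordering problem described above, one possible graph-style walk is sketched below. It is not the solution Gary ended up with, which is not shown in this thread. It assumes a one-voxel-thick chain with no branches; the names OFFSETS, neighbours and order_chain are hypothetical helpers invented for this sketch. A branched skeleton would need the walk restarted at each junction.

```python
import itertools
import numpy as np

# The 26 neighbour offsets of a voxel: every step in a 3x3x3 block
# except the zero step.
OFFSETS = [d for d in itertools.product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def neighbours(voxel, voxels):
    """Return the 26-connected neighbours of `voxel` that are set in `voxels`."""
    x, y, z = voxel
    candidates = ((x + dx, y + dy, z + dz) for dx, dy, dz in OFFSETS)
    return [c for c in candidates if c in voxels]


def order_chain(volume):
    """Order the voxels of one thin, unbranched chain from end to end."""
    voxels = {tuple(int(c) for c in p) for p in np.argwhere(volume)}
    # An endpoint of a thin chain has exactly one neighbour.
    ends = sorted(v for v in voxels if len(neighbours(v, voxels)) == 1)
    if not ends:
        raise ValueError("no endpoint found; is the chain a closed loop?")
    path = [ends[0]]
    visited = {ends[0]}
    while True:
        unvisited = [n for n in neighbours(path[-1], voxels) if n not in visited]
        if not unvisited:
            break
        # Assumption: no branching, so there is only one way forward.
        path.append(unvisited[0])
        visited.add(unvisited[0])
    return np.array(path)


# Tiny made-up example: four voxels forming a diagonal chain.
volume = np.zeros((5, 5, 5), dtype=np.uint8)
for voxel in [(0, 0, 0), (1, 1, 0), (2, 1, 1), (3, 2, 1)]:
    volume[voxel] = 1
path = order_chain(volume)
```

The columns path[:, 0], path[:, 1] and path[:, 2] are then ordered x, y, z arrays of the kind plot3d expects, and len(path) gives a simple way to cull very short chains.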
>>
>> scipy.cluster seems to be able to cluster the points but I'm not sure
>> how to get the x,y,z coordinates of the original points out of its
>> linkage data. This may not be possible.
> 
> The rows of the linkage matrix are the clusters and the first two
> columns of the linkage matrix are the indices of the left and right
> node, respectively. If the index is less than the number of points
> clustered (i < N), it's a leaf node (original point/singleton
> cluster); otherwise (i >= N) it's a non-singleton cluster. Note that
> there are always N-1 non-singleton clusters, so the linkage matrix
> will always have N-1 rows.
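
The indexing rule above can be sketched as follows to get back from a cluster to the x,y,z coordinates of its original points. The sample points and the helper name cluster_members are assumptions made for illustration. The sketch also relies on row k of the linkage matrix describing cluster N + k, which is how SciPy numbers the non-singleton clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def cluster_members(Z, n_points):
    """Map each cluster index to the original point indices it contains."""
    # Indices below n_points are leaves: each contains only itself.
    members = {i: [i] for i in range(n_points)}
    for row in range(Z.shape[0]):
        left, right = int(Z[row, 0]), int(Z[row, 1])
        # Row `row` of Z defines the non-singleton cluster n_points + row.
        members[n_points + row] = members[left] + members[right]
    return members


# Made-up points: two well-separated groups in 3D.
points = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
                   [10, 10, 10], [11, 10, 10]], dtype=float)
Z = linkage(points, method="single")

members = cluster_members(Z, len(points))

# The last row is the root cluster, which contains every point.
root = len(points) + Z.shape[0] - 1
root_coords = points[members[root]]

# The two children of the root, as coordinate arrays.
left_coords = points[members[int(Z[-1, 0])]]
right_coords = points[members[int(Z[-1, 1])]]
```

Indexing the original array with members[i] recovers the coordinates for any cluster i, so the linkage matrix never needs to store coordinates itself.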
> 
> 
>> Maybe the scipy.spatial module
>> is a better match to my problem.
> 
> I haven't had the chance to read this part of the discussion but I
> hope my answer to your question helps.
> 
> Cheers,
> 
> Damian

