[SciPy-user] 2D clustering question

Hazen Babcock hbabcock@mac....
Mon May 4 18:06:07 CDT 2009


I've been using scipy.cluster.hierarchy.fclusterdata() to cluster groups 
of points based on their x and y position. This works well for data sets 
without too many points, but seems to get pretty slow as the number 
of points gets into the high thousands (e.g. 6000+). Does anyone know of 
a more specialized clustering algorithm that might be able to handle 
even larger numbers of points, i.e. up to 10e6 or so? The points are 
spread out over 0 - 200 or so in X and Y and I'm clustering with a 0.5 
cutoff. One approach is to break the data set down into smaller sections 
based on X,Y coordinate, but perhaps something like this already exists?
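
A minimal sketch of the sectioning idea, not an existing SciPy routine. It 
assumes the 0.5 cutoff is a plain Euclidean distance threshold with single 
linkage (the default method of fclusterdata), in which case the clusters are 
just the connected groups of points whose neighbours lie within the cutoff. 
The name grid_cluster and its parameters are hypothetical. Points are binned 
into square cells one cutoff wide, so each point is only compared against 
points in its own cell and the 8 surrounding cells, which avoids the full 
pairwise distance matrix.

```python
from collections import defaultdict
from math import floor


def grid_cluster(points, cutoff=0.5):
    """Return one integer label per point; points closer than `cutoff`
    (directly or through a chain of neighbours) share a label.

    Hypothetical helper for illustration, not part of SciPy.
    """
    # Union-find structure over point indices.
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Bin the points into square cells with side length `cutoff`.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(floor(x / cutoff), floor(y / cutoff))].append(idx)

    cutoff_sq = cutoff * cutoff
    for (cx, cy), members in cells.items():
        # Two points within `cutoff` must sit in the same or adjacent cells.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    xj, yj = points[j]
                    for i in members:
                        if i < j:  # visit each pair only once
                            xi, yi = points[i]
                            if (xi - xj) ** 2 + (yi - yj) ** 2 <= cutoff_sq:
                                ri, rj = find(i), find(j)
                                if ri != rj:
                                    parent[rj] = ri

    # Map union-find roots to consecutive labels 0, 1, 2, ...
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


pts = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.0), (5.0, 5.0), (5.2, 5.0), (100.0, 100.0)]
labels = grid_cluster(pts, cutoff=0.5)
```

Here the first three points chain together even though the outer two are 
0.6 apart, which is the single-linkage behaviour. A pure-Python loop like 
this would still be slow for millions of points; the same neighbour search 
could probably be done faster with scipy.spatial.cKDTree.query_pairs, with 
the resulting pairs grouped into connected components.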

