[SciPy-user] Scipy for Cluster Detection

Lorenzo Isella lorenzo.isella@gmail....
Thu Dec 27 10:44:45 CST 2007

Many thanks for the useful answers.
I should have added that I have in mind to deal with system with at most 
a few thousands of particles, each of them labeled by 3 coordinates.
The particle positions are read from a text file, then I can get the 
relative distances e.g. with some Fortran routine I import with f2py if 
speed is a problem.
If this is manageable, then I could really end up calculating directly 
the relative distances between all the particles and then do as 
suggested in the following.
I may post again early in January, however (I do not have enough time to 
test all of this straight away).

> Message: 3
> Date: Thu, 27 Dec 2007 01:24:26 -0500
> From: David Warde-Farley <dwf@cs.toronto.edu>
> Subject: Re: [SciPy-user] Scipy for Cluster Detection
> To: SciPy Users List <scipy-user@scipy.org>
> Message-ID: <C8CF9489-1457-46AF-B453-62A338B422B5@cs.toronto.edu>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
> On 26-Dec-07, at 7:28 PM, Lorenzo Isella wrote:
>> Dear All,
>> I hope this will not sound too off-topic.
>> Without going into details, I am simulating the behavior of a set of
>> particles interacting with a short-ranged, strongly binding  
>> potential. I
>> start with a number N of particles which undergo collisions with each
>> other and, in doing so, they stick giving rise to clusters.
>> The code which does that is not written in Python, but it returns the
>> positions of these particles in space.
>> As time goes on, more and more particles coagulate around one of these
>> clusters. Eventually, they will all end up in the same cluster.
>> Two particles are considered to be bounded if their distance falls  
>> below
>> a certain threshold d.
>> A cluster is nothing else than a group of particles all directly or
>> indirectly bound together.
> This doesn't sound like what would typically be considered a
> Do you have code that produces the (N^2 - N)/2 distances between pairs  
> of particles? How large is N? If it's small enough, turning it into a  
> NumPy array should be straightforward, and then thresholding at d is  
> one line of code. Then what you end up with is a boolean matrix that  
> where the (i,j) entry denotes that i and j are in a cluster together.  
> This can be thought of as a graph or network where the nodes are  
> particles and connections / links / edges exist between those  
> particles that are in a cluster together.
> Once you have this your problem is what graph theorists would call  
> identifying the "connected components", which is a well-studied  
> problem and can be done fairly easily, Google "finding connected  
> components" or pick up any algorithms textbook at your local library  
> such as the Cormen et al "Introduction to Algorithms".
> I'm not in a position to comment on whether there's something like  
> this already in SciPy, as I really don't know. However, finding  
> connected components of a graph (i.e. your clusters) is a very well  
> studied problem in the computer science (namely graph theory)  
> literature, and the algorithms are simple enough that less than 20  
> lines of Python should do the trick provided the distance matrix is  
> manageable.
> Regards,
> David

More information about the SciPy-user mailing list