[SciPy-user] Scipy for Cluster Detection
Thu Dec 27 10:44:45 CST 2007
Many thanks for the useful answers.
I should have added that I have in mind to deal with system with at most
a few thousands of particles, each of them labeled by 3 coordinates.
The particle positions are read from a text file, then I can get the
relative distances e.g. with some Fortran routine I import with f2py if
speed is a problem.
If this is manageable, then I could really end up calculating directly
the relative distances between all the particles and then do as
suggested in the following.
I may post again early in January, however (I do not have enough time to
test all of this straight away).
> Message: 3
> Date: Thu, 27 Dec 2007 01:24:26 -0500
> From: David Warde-Farley <email@example.com>
> Subject: Re: [SciPy-user] Scipy for Cluster Detection
> To: SciPy Users List <firstname.lastname@example.org>
> Message-ID: <C8CF9489-1457-46AF-B453-62A338B422B5@cs.toronto.edu>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
> On 26-Dec-07, at 7:28 PM, Lorenzo Isella wrote:
>> Dear All,
>> I hope this will not sound too off-topic.
>> Without going into details, I am simulating the behavior of a set of
>> particles interacting with a short-ranged, strongly binding
>> potential. I
>> start with a number N of particles which undergo collisions with each
>> other and, in doing so, they stick giving rise to clusters.
>> The code which does that is not written in Python, but it returns the
>> positions of these particles in space.
>> As time goes on, more and more particles coagulate around one of these
>> clusters. Eventually, they will all end up in the same cluster.
>> Two particles are considered to be bounded if their distance falls
>> a certain threshold d.
>> A cluster is nothing else than a group of particles all directly or
>> indirectly bound together.
> This doesn't sound like what would typically be considered a
> Do you have code that produces the (N^2 - N)/2 distances between pairs
> of particles? How large is N? If it's small enough, turning it into a
> NumPy array should be straightforward, and then thresholding at d is
> one line of code. Then what you end up with is a boolean matrix that
> where the (i,j) entry denotes that i and j are in a cluster together.
> This can be thought of as a graph or network where the nodes are
> particles and connections / links / edges exist between those
> particles that are in a cluster together.
> Once you have this your problem is what graph theorists would call
> identifying the "connected components", which is a well-studied
> problem and can be done fairly easily, Google "finding connected
> components" or pick up any algorithms textbook at your local library
> such as the Cormen et al "Introduction to Algorithms".
> I'm not in a position to comment on whether there's something like
> this already in SciPy, as I really don't know. However, finding
> connected components of a graph (i.e. your clusters) is a very well
> studied problem in the computer science (namely graph theory)
> literature, and the algorithms are simple enough that less than 20
> lines of Python should do the trick provided the distance matrix is
More information about the SciPy-user