# [SciPy-user] Scipy for Cluster Detection

Stefan van der Walt stefan@sun.ac...
Sat Dec 29 10:43:25 CST 2007

```On Thu, Dec 27, 2007 at 01:24:26AM -0500, David Warde-Farley wrote:
> Once you have this your problem is what graph theorists would call
> identifying the "connected components", which is a well-studied
> problem and can be done fairly easily, Google "finding connected
> components" or pick up any algorithms textbook at your local library
> such as the Cormen et al "Introduction to Algorithms".

I have implemented the O(N) algorithm described in

Christophe Fiorio and Jens Gustedt,
"Two linear time Union-Find strategies for image processing",
Theoretical Computer Science 154 (1996), pp. 165-181.

and

Kensheng Wu, Ekow Otoo and Arie Shoshani,
"Optimizing connected component labeling algorithms",
Paper LBNL-56864, 2005,
Lawrence Berkeley National Laboratory
(University of California),
http://repositories.cdlib.org/lbnl/LBNL-56864.

The code can be found at

http://mentat.za.net/source/connected_components.tar.bz2

Run 'scons' to compile (I can send a setup.py if necessary).

Typical usage:

x = N.array([[0, 0, 3, 2, 1, 9],
[0, 1, 1, 9, 2, 9],
[0, 0, 1, 9, 9, 9],
[3, 1, 1, 5, 3, 0]])

labels = connected_components.linear(x)

Result:

[[0, 0, 1, 2, 3, 4],
[0, 5, 5, 4, 6, 4],
[0, 0, 5, 4, 4, 4],
[7, 5, 5, 8, 9, 10]])

The given source searches for neighbours that have the same values,
but this can easily be modified to look for values within a certain
difference range.

Regards
Stéfan
```