[SciPy-user] Scipy for Cluster Detection
David Warde-Farley
dwf@cs.toronto....
Thu Dec 27 00:36:53 CST 2007
On 27-Dec-07, at 1:24 AM, David Warde-Farley wrote:
>
> This doesn't sound like what would typically be considered a
This should read, "This doesn't sound like what would typically be
considered a clustering problem." What I meant is, "clustering"
usually means finding a natural way of grouping the data (either
hierarchically or non-hierarchically) where you only have some notion
of similarity or distance to work with.
]
In your problem, your similarity metric is essentially binary ("are
you less than d away from each other", yes or no) and so it reduces to
a rather graph-theoretic problem.
That said, if your number of particles is particularly large, Nathan's
suggestion of a kd-tree solution sounds like it's on the right track.
"Locality sensitive hashing" comes to mind, which is basically a fast
way of finding the nearest neighbour of a given point -- if you knew
this then you'd be able to tell whether any point at all was within d,
since if the nearest neighbour isn't then none of them are.
Take a look at http://web.mit.edu/andoni/www/LSH/index.html
Cheers,
David
More information about the SciPy-user
mailing list