[Numpy-discussion] finding close together points.

Lou Pecora lou_boog2000@yahoo....
Thu Nov 12 12:40:53 CST 2009



----- Original Message ----
From: Christopher Barker <Chris.Barker@noaa.gov>
To: Discussion of Numerical Python <numpy-discussion@scipy.org>
Sent: Thu, November 12, 2009 12:37:37 PM
Subject: Re: [Numpy-discussion] finding close together points.

Lou Pecora wrote:
> a KD tree for 2D nearest neighbor seems like over kill.  You
> might want to try the simple approach of using boxes of points to
> narrow things down by sorting on the first component.

yeah, we'll probably do something like that if we have to write the code 
ourselves. At the moment, we're using geohash:

http://pypi.python.org/pypi/Geohash/

(this is for points on the earth)

and it's working OK. I was just hoping kdtree would work out of the box!

> where for a
> static data set it can match KD trees in speed

Why does it have to be static -- it doesn't look hard to insert/remove 
points.

--- Lou answers ----------------------


It doesn't have to be static, but if I remember correctly, when you add points you have to resort or do a "smart" insert (which may require a lot of data "shifting").  Not hard, but the original paper claimed that if there are a lot of changes to the data set this becomes more expensive than the kd-tree.  If you can get the original paper I mentioned, it will have more info.  It's pretty clearly written and contains some nice comparisons to kd-trees for finding near neighbors.  

Bottom line is that you can still try it with changing data.  See what the run times are like.  I've used the approach for up to eight dimensions with about 10^5 data points.  It worked nicely.  I remember looking for a kd-tree library that would work out of the box (C or C++), but everything I found was not as simple as I would have liked.  The slice searching was a nice solution.  But maybe I just didn't know where to look or I'm not understanding some of the libraries I did find.

Let us know how it goes.

-- Lou Pecora


      


More information about the NumPy-Discussion mailing list