[SciPy-dev] [Numpy-discussion] Proposal: scipy.spatial

Prabhu Ramachandran prabhu@aero.iitb.ac...
Sat Oct 4 07:37:37 CDT 2008


Nathan Bell wrote:
> On Wed, Oct 1, 2008 at 12:53 PM, Prabhu Ramachandran
> <prabhu@aero.iitb.ac.in> wrote:
>> I thought this was a discussion on a good API for such an algorithm
>> (sorry if I misunderstood!).  Given that context, if this is a general
>> purpose algorithm I'd rather not assume that the number of results is
>> typically going to be small.
> However, you *can* assume log(k) of a k-nearest neighbor query is
> going to be small.


>> If there is no need for sorting and there are a large number of nearest
>> neighbor queries (in my case there are O(N) nearest neighbor queries),
>> this quickly adds up to an unnecessary O(N) inefficiency that I'd like
>> to avoid.
> Although you might not sort a k-nearest neighbor query, but as Anne
> has said, you end up needing a sorted structure (like a priority
> queue) to perform the query itself anyway.

In the context of a kd-tree, yes, but not in other contexts like the "in 
sphere" query you mention below.

> The only case where not sorting makes sense is an "in sphere" query
> where there's no need to prioritize the results.  In this case, I'd
> still argue that the overhead of sorting isn't likely to matter.

Well, in the case of the kd-tree I understand that this is not likely to 
matter.  I was under the mistaken impression that this was to be a 
common API.  I re-read the thread and understand now that this is a 
first cut for just the KDTree.

Personally, I have a need for finding the exact nearest neighbors within 
an epsilon ball (in 1, 2, and 3D) of *all* the nodes.  Typically epislon 
is known (or at least bounded) a-priori.  In this context I typically 
build a binary/quad/oct tree and limit the size of the leaves to around 
epsilon. I avoid tree traversals by iterating over the leaves in order 
to get to the points they contain -- note that the target point is 
always contained in the tree which is why I can do this.   Anyway, in 
this context sorting the results ends up in O(N klog(k)) operations if 
you have k neighbors on the average.  This is expensive.

In any case, I am not sure there is a general need for this kind of data 
structure.  I need it a whole lot.  If there is a general need I'll be 
glad to try and extract any of my code and contribute it to scipy.spatial.


More information about the Scipy-dev mailing list