[SciPy-Dev] Cover trees for nearest neighbors in general metric space
Sat Mar 10 16:18:53 CST 2012
On Sat, Mar 10, 2012 at 10:26 PM, Patrick Varilly <firstname.lastname@example.org> wrote:
> On Sat, Mar 10, 2012 at 8:00 PM, Ralf Gommers <email@example.com> wrote:
>> On Fri, Mar 9, 2012 at 2:23 AM, Patrick Varilly <firstname.lastname@example.org> wrote:
>>> Dear all,
>>> Following up from the conversation with Emanuele Olivetti and Jake
>>> VanderPlas, I've implemented a drop-in replacement for
>>> scipy.spatial.kdtree that uses cover trees instead of kd-trees to answer
>>> nearest neighbor queries in a general metric space. To me, this is useful
>>> for finding nearby particles in a 3D periodic box in the context of
>>> molecular simulations, but I'm sure it's more generally useful. It
>>> addresses the same problem that Jake's BallTrees code addresses in
>>> scikit-learn, but I've done my best to reproduce the API of
>>> scipy.spatial.kdtree in order to make this code mostly painless to use. In
>>> particular, kd-tree's useful function for finding all the points in one
>>> tree that are neighbors of every point in another tree (ironically,
>>> "query_ball_tree") is also implemented here for cover trees. I modified
>>> kdtree's extensive unit test to use cover trees, and the code passes it.
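For context on the periodic-box use case: the natural metric there is the minimum-image distance, where each coordinate difference wraps around the box. The sketch below shows what a `query_ball_tree`-style call computes under that metric. It is a brute-force O(n * m) reference, not a cover tree, and the helper names `periodic_distance` and `brute_force_ball_tree` are hypothetical. The only thing borrowed from SciPy is the output shape: `scipy.spatial.KDTree.query_ball_tree` returns, for each point of one tree, a list of indices of its neighbors in the other.

```python
import math
from typing import List, Sequence


def periodic_distance(a: Sequence[float], b: Sequence[float], box: float) -> float:
    """Minimum-image Euclidean distance in a cubic periodic box of side `box`."""
    total = 0.0
    for ai, bi in zip(a, b):
        d = abs(ai - bi) % box
        # Along each axis the shortest wrapped displacement is at most box / 2.
        d = min(d, box - d)
        total += d * d
    return math.sqrt(total)


def brute_force_ball_tree(
    points: Sequence[Sequence[float]],
    others: Sequence[Sequence[float]],
    r: float,
    box: float,
) -> List[List[int]]:
    """For each point in `points`, list indices of `others` within distance r.

    Same output shape as scipy.spatial.KDTree.query_ball_tree, but computed by
    brute force under the periodic metric (hypothetical helper, O(n * m)).
    """
    return [
        [j for j, q in enumerate(others) if periodic_distance(p, q, box) <= r]
        for p in points
    ]


# Two particles near opposite faces of a 10 x 10 x 10 box are only 1.0 apart.
a = [(0.5, 0.5, 0.5)]
b = [(9.5, 0.5, 0.5), (5.0, 5.0, 5.0)]
print(brute_force_ball_tree(a, b, r=1.5, box=10.0))  # → [[0]]
```

The minimum-image distance satisfies the triangle inequality, which is all a general-metric structure such as a cover tree requires; a cover tree would return the same lists while examining only part of `others`.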
>> Out of interest, are you planning to propose this for inclusion in scipy
>> or scikit-learn once it's done, or keep it as a standalone package?
> Eventually, I'd like to propose it for inclusion in scipy, since this
> functionality is not exclusive to machine learning. I'm using it for
> molecular simulations, and didn't even know that nearest-neighbor queries
> were useful in machine learning! I would never have thought of looking in
> scikit-learn for this. But I would like to address the two outstanding
> issues (vectorized distances and Cython implementation) before proposing it
> for inclusion. In the same vein, and for the same reasons, I don't know why
> BallTree is in scikit-learn and not in scipy.
Resistance to adding more C++ code IIRC.
>>> Finally, in reading carefully through kdtree.py, I found a clear bug in
>>> the code, whereby the "eps" parameter for approximate queries doesn't get
>>> forwarded from the externally visible function "query" to the internal
>>> function "__query" that does the work.
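To make the role of `eps` concrete, here is a small pure-Python sketch of a metric-index search that uses `(1 + eps)` pruning. It is not SciPy's code, and the class `BucketedTree` and its methods are hypothetical. Points are grouped into buckets, each with a center and a covering radius; the triangle inequality gives a lower bound on the distance from the query to anything in a bucket, and a bucket is skipped once that bound exceeds `best / (1 + eps)`. This yields the same kind of guarantee SciPy documents for `KDTree.query`: the returned distance is at most `(1 + eps)` times the true nearest distance. The public `query` has to pass `eps` through to the worker; if it is dropped there, as in the bug described above, every query silently becomes exact.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class BucketedTree:
    """Hypothetical two-level metric index: fixed-size buckets with covering radii."""

    def __init__(self, data: Sequence[Point], metric: Metric = euclidean, bucket_size: int = 4):
        self.data = list(data)
        self.metric = metric
        # Each bucket is (center, covering radius, indices of its points).
        self.buckets: List[Tuple[Point, float, List[int]]] = []
        for start in range(0, len(self.data), bucket_size):
            idx = list(range(start, min(start + bucket_size, len(self.data))))
            center = self.data[idx[0]]
            radius = max(metric(center, self.data[i]) for i in idx)
            self.buckets.append((center, radius, idx))

    def query(self, x: Point, eps: float = 0.0) -> Tuple[float, int]:
        # Public entry point: eps must be forwarded to the worker.
        return self._query(x, eps)

    def _query(self, x: Point, eps: float) -> Tuple[float, int]:
        best_d, best_i = math.inf, -1
        # Lower bound per bucket via the triangle inequality:
        # d(x, p) >= d(x, center) - radius for every p in the bucket.
        bounds = [
            (max(0.0, self.metric(x, center) - radius), idx)
            for center, radius, idx in self.buckets
        ]
        # Visit the most promising buckets first so pruning kicks in early.
        bounds.sort(key=lambda t: t[0])
        for lower, idx in bounds:
            # Skip when even the closest possible point cannot beat best / (1 + eps).
            # Bounds are sorted, so every remaining bucket can be skipped too.
            if lower * (1.0 + eps) > best_d:
                break
            for i in idx:
                d = self.metric(x, self.data[i])
                if d < best_d:
                    best_d, best_i = d, i
        return best_d, best_i


tree = BucketedTree([(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)], bucket_size=2)
d, i = tree.query((0.9, 0.1))
print(i)  # → 1
```

With `eps=0` the search is exact; with `eps > 0` it may stop early and return a neighbor up to `(1 + eps)` times farther than the true nearest one, trading accuracy for fewer distance evaluations.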
>> Could you open a ticket for that with a little more detail? Or if you
>> feel like it, a pull request would be even better :)
> I sent in a pull request for it. It's the first time I've done this, so
> apologies in advance if I somehow screwed it up.
No apologies necessary - it all looks good. Thanks for doing that.