[SciPy-Dev] Cover trees for nearest neighbors in general metric space

Jacob VanderPlas vanderplas@astro.washington....
Sat Mar 10 16:50:17 CST 2012


Hi,
I should add to the discussion here the work I've been doing lately on 
Ball Tree.  You can find my progress in github.com/jakevdp/pyDistances.  
The goal of the project is to allow all the metrics from 
scipy.spatial.distances to be used with Ball Tree.

To this end, I've implemented all the metrics in C/cython, and written a 
"DistMetrics" class wrapper that exposes python hooks into these 
functions through a pdist/cdist method.  The speed is comparable to 
scipy.spatial.distance.pdist / cdist for c-ordered arrays, and I'm 
nearly ready to push a big set of commits which will allow these methods 
to support csr-format sparse arrays as well.

Once I have this machinery place, I plan to extend the BallTree class to 
work for both sparse and dense inputs, and support any of the distance 
metrics currently in scipy.spatial.

I mainly had scikit-learn in mind for these enhancements, but it may fit 
in scipy.spatial as well.  Thoughts?
    Jake
 
Gael Varoquaux wrote:
> On Sat, Mar 10, 2012 at 11:18:53PM +0100, Ralf Gommers wrote:
>   
>>> Eventually, I'd like to propose it for inclusion in scipy, since this
>>> functionality is not exclusive to machine learning.  I'm using it for
>>> molecular simulations, and didn't even know that nearest-neighbor queries
>>> were useful in machine learning!  I would never have though of looking in
>>> scikit-learn for this.  But I would like to address the two outstanding
>>> issues (vectorized distances and Cython implementation) before proposing it
>>> for inclusion.  On that same vein, I don't know why BallTree is in
>>> scikit-learn and not scipy, for the same reasons.
>>>       
>
>   
>> Resistance to adding more C++ code IIRC.
>>     
>
> It's been rewritten in Cython:
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neighbors/ball_tree.pyx
>
> If we can find the resources, would scipy be interested in the new
> version of the code?
>
> Gaël
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>   


More information about the SciPy-Dev mailing list