[SciPy-Dev] Cover trees for nearest neighbors in general metric space
Sat Mar 10 16:50:17 CST 2012
I should add to the discussion here the work I've been doing lately on
Ball Tree. You can find my progress in github.com/jakevdp/pyDistances.
The goal of the project is to allow all the metrics from
scipy.spatial.distances to be used with Ball Tree.
To this end, I've implemented all the metrics in C/cython, and written a
"DistMetrics" class wrapper that exposes python hooks into these
functions through a pdist/cdist method. The speed is comparable to
scipy.spatial.distance.pdist / cdist for c-ordered arrays, and I'm
nearly ready to push a big set of commits which will allow these methods
to support csr-format sparse arrays as well.
Once I have this machinery place, I plan to extend the BallTree class to
work for both sparse and dense inputs, and support any of the distance
metrics currently in scipy.spatial.
I mainly had scikit-learn in mind for these enhancements, but it may fit
in scipy.spatial as well. Thoughts?
Gael Varoquaux wrote:
> On Sat, Mar 10, 2012 at 11:18:53PM +0100, Ralf Gommers wrote:
>>> Eventually, I'd like to propose it for inclusion in scipy, since this
>>> functionality is not exclusive to machine learning. I'm using it for
>>> molecular simulations, and didn't even know that nearest-neighbor queries
>>> were useful in machine learning! I would never have though of looking in
>>> scikit-learn for this. But I would like to address the two outstanding
>>> issues (vectorized distances and Cython implementation) before proposing it
>>> for inclusion. On that same vein, I don't know why BallTree is in
>>> scikit-learn and not scipy, for the same reasons.
>> Resistance to adding more C++ code IIRC.
> It's been rewritten in Cython:
> If we can find the resources, would scipy be interested in the new
> version of the code?
> SciPy-Dev mailing list
More information about the SciPy-Dev