[SciPy-dev] Ball Tree code (ticket 1048)
Mon Nov 16 19:49:21 CST 2009
2009/11/16 Jake VanderPlas <firstname.lastname@example.org>:
> I wrote to the list about this a couple weeks ago, but there hasn't
> been any activity. At the link below, there is a tarball with a
> distutils setup script, and a simple script that compares the
> performance of BallTree with scipy.spatial.cKDTree. This is my first
> submission to scipy - I'd love some feedback if you get a chance!
I took a quick look at the code. I have a few comments:
* The code is specific to Euclidean distances.
* The code is specific to R**n.
* There's no approximate searching.
* The way the tree is constructed is very nearly a kd-tree: you
subdivide the dimension with the largest span. The only difference
between this and a kd-tree is that rather than store a cut plane at
each node, you store a ball containing the children of that node.
In principle, ball trees (and similar structures) can work on
arbitrary metric spaces. I don't know how important that feature is to
anyone; maybe anyone who needs such a thing needs a smarter algorithm
than ball trees (e.g. cover trees).
Given the last point above, I don't really understand why your code is
faster than cKDTree. Is the difference in how the tree is constructed
(I think cKDTree uses a different heuristic to find the cut plane) or
in the fact that you keep track of a ball containing each node? If the
former, then it might make more sense to add an option of a different
heuristic to cKDTree. If the latter, it might make more sense to use a
different heuristic in the ball tree code - the traditional one is
partitioning the data into inside/outside a ball at each level of the
tree, which might allow faster construction, and faster search
(because the balls might shrink faster as you walk down the tree).
Overall, it's hard to argue with something that works and is faster in
concrete cases. But I'm not sure how general the code is.
> Scipy-dev mailing list
More information about the Scipy-dev