[SciPy-dev] scipy.distance

Anne Archibald aarchiba@physics.mcgill...
Mon Nov 3 02:36:17 CST 2008


2008/11/3 Charles R Harris <charlesr.harris@gmail.com>:

> I usually want a complete list of points in some neighborhood. I looked
> through your cython code and I think the loops can be improved a bit to make
> better use of low level C code.

The C implementation doesn't currently do this at all. It'd be a good
addition, though I think you'd have to use object arrays of lists,
which have always made me faintly queasy. You'd want a whole separate
tree-traversal routine here, with short-circuit branches for both
all-in-the-neighborhood and all-outside-the-neighborhood. Since kdtree
construction is rather fast, does it perhaps make sense to write a
two-tree version?

> I'm working on some cover tree code with an eye to future inclusion unless
> someone beats me to it.

That would be great. I'm afraid the benefits of a cython
implementation are going to be somewhat limited by the fact that users
are going to want to supply their own distance functions. But it's
probably worth allowing user distance functions to accept a
short-circuiting distance argument: d(A,B,r) returns the distance if
it's less than r, but if the true distance is greater than r it
returns an arbitrary value rather than r.  Even for Minkowski
distances this can be a win in high dimension, and for more
sophisticated distance measures it could save even more time.

Anne


More information about the Scipy-dev mailing list