[Numpy-discussion] Proposal: scipy.spatial

Anne Archibald peridot.faceted@gmail....
Thu Oct 2 20:57:08 CDT 2008

2008/10/2 David Bolme <bolme1234@comcast.net>:
> It may be useful to have an interface that handles both cases:
> similarity and dissimilarity.  Often I have seen "Nearest Neighbor"
> algorithms that look for maximum similarity instead of minimum
> distance.  In my field (biometrics) we often deal with very
> specialized distance or similarity measures.  I would like to see
> support for user defined distance and similarity functions.  It should
> be easy to implement by passing a function object to the KNN class.  I
> am not sure if kd-trees or other fast algorithms are compatible with
> similarities or non-euclidian norms, however I would be willing to
> implement an exhaustive search KNN that would support user defined
> functions.

kd-trees can only work for distance measures which have certain
special properties (in particular, you have to be able to bound them
based on coordinate differences). This is just fine for all the
Minkowski p-norms (so in particular, Euclidean distance,
maximum-coordinate-difference, and Manhattan distance) and in fact the
current implementation already supports all of these. I don't think
that correlation can be made into such a distance measure - the
neighborhoods are the wrong shape. In fact the basic space is
projective n-1 space rather than affine n-space, so I think you're
going to need some very different algorithm. If you make a metric
space out of it - define d(A,B) to be the angle between A and B - then
cover trees can serve as a spatial data structure for nearest-neighbor
search. Cover trees may be worth implementing, as they're a very
generic data structure, suitable for (among other things)
low-dimensional data in high-dimensional spaces.


More information about the Numpy-discussion mailing list