[Numpy-discussion] Proposal: scipy.spatial
Thu Oct 9 15:28:34 CDT 2008
I have written up a basic nearest neighbor algorithm. It does a brute
force search, so it will be slower than kd-trees as the number of
points gets large. It should, however, work well for high dimensional
data. I have also added the option for user defined distance measures.
The user can set a default "p". If "p" is a float it has the same
meaning as in Anne's code; "p" can also be a function that computes a
distance matrix, or a measure can be selected using one of the
strings: "Manhattan", "Euclidean", or "Correlation".
The interface is similar to Anne's code and in many cases can be used
as a drop-in replacement. I have posted the code to my own project
because I have a short-term need and I do not have access to the scipy
repository. Feel free to include the code with scipy under the scipy
license.
I did find a typo in your documentation:
"trie" should be "tree" in "... kd-tree is a binary trie, each of whose ..."
Also, I found the use of "k" in the documentation somewhat confusing,
as it denotes both the dimensionality of the data points in the
kd-tree and the number of neighbors for k-nearest neighbors.
>> I believe that with mean subtracted and unit length vectors, a
>> Euclidean knn algorithm will produce the same result as if the
>> vectors were compared using correlation. I am not sure if kd-trees
>> will perform well on the normalized vectors as they have a very
>> specific geometry. If my math checks out it may be worth adding
>> Pearson's correlation as a default option or as a separate class.
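The math does check out: for mean-subtracted, unit-length vectors u
and v, ||u - v||^2 = 2 - 2 u.v = 2(1 - r), where r is the Pearson
correlation, so ranking neighbors by Euclidean distance on the
normalized vectors is the same as ranking by correlation. A quick
numerical check (my own snippet, not from the posted code):

```python
import numpy as np

def normalize(x):
    # subtract the mean and scale to unit length
    x = x - x.mean()
    return x / np.linalg.norm(x)

rng = np.random.default_rng(0)
u = rng.standard_normal(10)
v = rng.standard_normal(10)
un, vn = normalize(u), normalize(v)

r = np.corrcoef(u, v)[0, 1]        # Pearson correlation
d2 = np.sum((un - vn) ** 2)        # squared Euclidean distance
# identity: ||un - vn||^2 == 2 * (1 - r)
assert np.isclose(d2, 2.0 * (1.0 - r))
```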
> Actually it's probably easier if the user just does the
> normalization.
I agree. I think we should keep your class as-is and maybe create a
class that wraps the kd-tree and handles the normalization for
correlation. I would also like to look at cover trees, but that will
have to wait a couple of months until I have more time.
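Such a wrapper could look roughly like this (a minimal sketch: the
class name and interface are hypothetical, and it assumes Anne's
KDTree as found in scipy.spatial):

```python
import numpy as np
from scipy.spatial import KDTree

class CorrelationKDTree:
    """Hypothetical wrapper: normalize rows so that Euclidean kNN on the
    kd-tree matches kNN under Pearson correlation."""

    def __init__(self, data):
        self._data = self._normalize(np.asarray(data, dtype=float))
        self._tree = KDTree(self._data)

    @staticmethod
    def _normalize(x):
        # mean-subtract and scale each row to unit length
        x = x - x.mean(axis=-1, keepdims=True)
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    def query(self, q, k=1):
        # normalize the query the same way, then defer to the kd-tree
        return self._tree.query(self._normalize(np.atleast_2d(q)), k=k)
```

Whether the kd-tree partitions the normalized points (which all lie on
a sphere) efficiently is exactly the open question raised above; the
wrapper only guarantees correctness, not speed.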