[SciPy-dev] [Numpy-discussion] Proposal: scipy.spatial
Charles R Harris
charlesr.harris@gmail....
Mon Oct 6 16:03:56 CDT 2008
On Sun, Oct 5, 2008 at 2:56 AM, Anne Archibald <peridot.faceted@gmail.com>wrote:
> 2008/10/4 Prabhu Ramachandran <prabhu@aero.iitb.ac.in>:
> > Anne Archibald wrote:
> >> 2008/10/4 Prabhu Ramachandran <prabhu@aero.iitb.ac.in>:
> >>> In any case, I am not sure there is a general need for this kind of
> data
> >>> structure. I need it a whole lot. If there is a general need I'll be
> >>> glad to try and extract any of my code and contribute it to
> scipy.spatial.
> >>
> >> There does seem to be a general need for a two-tree query that
> >> extracts (or just counts) all items of one tree within a ball centered
> >> at each item of another tree. You're right, too, that "in a ball"
> >> queries often don't naturally produce sorted results, or even known
> >> distances. But I think this has to be a different query. In fact, even
> >> for single-point queries it may make sense to have a separate query
> >> function for in-a-ball queries.
> >
> > True and another reason for a different approach would be performance.
> > According to Lee and Wong (1970, Acta Informatica) the worst case
> > performance for partial region searches on k-d trees is O(k N^{1-1/k}).
> > So if the dimensionality is small and the region size is know we could
> > do a lot better with other specialized algorithms for these cases.
>
> I have added several in-a-ball queries to the kd-tree implementation
> in the spatial branch. Please let me know what you think. The API is a
> bit ad-hoc, since I can see a number of different ways one might want
> to use the code. In particular, some query functions do not currently
> return the actual distance, since one of the shortcuts is to include
> whole branches of the tree if their bounding box lies inside the ball
> of interest. I suspect that for many algorithms of interest it will be
> necessary to write a custom tree-walking algorithm. I'm not sure
> whether it makes sense to try to provide functions to support this.
>
I'm wondering if this might be a good place to add an equivalence relation
also. An equivalence relation is a good way to do clustering based on
distances, i.e., a.join(i,j) where i and j are objects creates a cluster
that is the union of those containing i and j respectively. All the
clustered elements can then be recovered using something like a.elements(i).
Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://projects.scipy.org/pipermail/scipy-dev/attachments/20081006/a41148ff/attachment.html
More information about the Scipy-dev
mailing list