[SciPy-Dev] Faster implementation of cluster.hierarchy
Charles R Harris
Wed Oct 12 09:17:37 CDT 2011
On Wed, Oct 12, 2011 at 5:12 AM, Conrad Lee <firstname.lastname@example.org> wrote:
> A mathematician at Stanford named Daniel Müllner recently came up with a
> package that implements the hierarchical clustering methods found in
> scipy.cluster.hierarchy. His implementation is in C++, but includes a
> Python API that uses the same interface as scipy.cluster.hierarchy.
> Müllner has posted benchmarks as well as algorithmic explanations of why
> his implementation is faster in a paper on arXiv <http://arxiv.org/abs/1109.2378>.
> He also has a webpage that describes the package here: <http://math.stanford.edu/%7Emuellner/fastcluster.html>.
> Because the results of the benchmarks look good, I am interested in getting
> the scikit-learn package to use this implementation for the hierarchical
> clustering provided by that package. Rather than integrate the code in
> scikit-learn, it seems more appropriate to integrate it upstream in
> scipy.cluster.hierarchy. Is there anyone who is interested in this
> integration? I am inexperienced with integrating C++ code and Python code,
> and also with how things work in the scipy project, so I'm not sure how to
> Note: Although Müllner's code is currently under a GPL license, he has
> stated to me in e-mail that he would be willing to put it under the BSD-2
> license if somebody put in the time to integrate it into scipy.
Not my area, but I think it is a good thing to encourage such contributions.
If the new code preserves the interface, comes with tests and documentation,
and performs better, then I am all in favor of getting it in. I believe
there is already a fair amount of C++ in scipy, so that shouldn't be a
problem and there are probably folks who can give you advice on how to