[SciPy-dev] Another GSoC idea

David Warde-Farley dwf@cs.toronto....
Sat Mar 21 00:01:46 CDT 2009


I've been fiddling with ideas for GSoC related to SciPy and I wanted  
to run this by people on the list.

David C. and others often complain that C and Fortran code is
an order of magnitude harder to maintain than Python/Cython code.
Would there be interest in a proposal that includes rewriting
Damian Eads' excellent scipy.spatial.distance and scipy.cluster.vq in
Cython?

I've already been scoping this out, as I wanted to add output
matrix functionality to scipy.spatial.distance.pdist and
scipy.spatial.distance.cdist, which would make scenarios where
distances are recomputed frequently (as in some sort of tracking
application) much less memory-intensive.
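
A rough sketch of the output-matrix idea, assuming a hypothetical
helper named cdist_into (not an existing SciPy function) and plain
Python lists standing in for arrays. A real version would presumably
write into a preallocated NumPy array instead; the point is only that
the caller allocates the buffer once and reuses it on every call.

```python
import math


def cdist_into(xa, xb, out):
    """Write Euclidean distances between rows of xa and xb into `out`.

    Hypothetical helper for illustration only. `out` is a preallocated
    len(xa) x len(xb) list of lists that is overwritten in place.
    """
    if len(out) != len(xa) or any(len(row) != len(xb) for row in out):
        raise ValueError("out must have shape len(xa) x len(xb)")
    for i, a in enumerate(xa):
        row = out[i]
        for j, b in enumerate(xb):
            row[j] = math.dist(a, b)  # overwrite, no new matrix allocated
    return out


# Allocate the distance buffer once, then reuse it for every frame.
targets = [(0.0, 0.0), (3.0, 4.0)]
buf = [[0.0] * len(targets) for _ in range(2)]
frames = [
    [(0.0, 0.0), (3.0, 0.0)],
    [(6.0, 8.0), (3.0, 4.0)],
]
for points in frames:
    result = cdist_into(points, targets, buf)
```

After the loop, buf holds the distances for the last frame only, and
no second matrix was ever created, which is where the memory saving
in a tracking loop would come from.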

Also at the back of my mind has been implementing some of the tricks
found in the literature for speeding up k-means: optimized versions
that take advantage of the triangle inequality, for instance, and
"online" k-means, by which I mean updating the means with the
contribution of each data point sequentially as opposed to considering
them all at once. I'd also like to see the addition of exemplar-based
methods such as k-centers and the relatively new affinity propagation.
There is a reference implementation of the latter, but it would be
unsuitable for direct translation from MATLAB due to licensing, so I'd
be proposing a clean-room implementation.
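
To illustrate what "online" k-means means here, a minimal pure-Python
sketch of the sequential update follows. The function name
online_kmeans and the choice to give the initial guesses zero weight
(so the first point assigned to a cluster replaces its guess) are
assumptions made for this example, not anything in scipy.cluster.vq.

```python
import math


def online_kmeans(points, initial_means):
    """One sequential pass: each point updates only its nearest mean."""
    means = [list(m) for m in initial_means]
    counts = [0] * len(means)  # initial guesses carry no weight
    for x in points:
        # Find the index of the nearest current mean.
        k = min(range(len(means)), key=lambda i: math.dist(x, means[i]))
        counts[k] += 1
        # Running-average step: move mean k toward x by 1/counts[k].
        rate = 1.0 / counts[k]
        means[k] = [m + rate * (xi - m) for m, xi in zip(means[k], x)]
    return means, counts


data = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 12.0)]
means, counts = online_kmeans(data, [(1.0, 1.0), (9.0, 9.0)])
```

Because each mean is a running average of the points assigned to it
so far, the data can be streamed through once without ever holding
the full assignment in memory, unlike the batch update.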

Any feedback or additional suggestions would be welcome.

Thanks,

David

