[SciPy-user] scipy data mining ?

David Cournapeau david at ar.media.kyoto-u.ac.jp
Wed Jan 24 22:55:55 CST 2007

Karl Young wrote:
> Thanks for the suggestions but, yes, I feel that it's the set of core 
> machine learning algorithms that is the important piece. There are some 
> implementations of specific algorithms around but it seems like a lot of 
> work to include a significant fraction of recently developed machine 
> learning algorithms in a single package with a consistent interface (as 
> the Weka guys have done). Since the (exploratory) tendency in data 
> mining is to not trust any single algorithm for all data analysis it's 
> important to have a range available. Given that the Orange developers 
> have already started something like that it seemed reasonable to explore 
> some kind of integration (I could just use it anyway and hack whatever I 
> need to use it with numpy array utilities but some kind of moderated 
> integration seemed more consistent with the scipy philosophy).
I don't know much about data mining; I use some algorithms from pattern 
recognition, but I am not really into the exploratory part, which I 
think is important in data mining.

Right now, scipy has an implementation of SVM (scipy.sandbox.svm), EM 
for finite mixtures of Gaussians (scipy.sandbox.pyem), and some others 
which I don't really know (models, maxent), all of which are more or 
less relevant for machine learning. There are also some tools around for 
other things like Kalman filtering and MCMC, which are not part of scipy 
(at least to my knowledge). Some kind of unification would be nice, but 
it would require some work, and people interested in doing it.
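
To give an idea of what "EM for a finite mixture of Gaussians" means in 
practice, here is a minimal sketch for the 1-D case. It is plain Python 
and only an illustration: it is not the scipy.sandbox.pyem code and does 
not follow its interface. The function names (gaussian_pdf, em_gmm_1d), 
the synthetic data and the starting values are all made up for the 
example.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given parameters.

    Hypothetical helper for illustration; returns new lists and leaves the
    caller's initial parameters untouched.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    n = len(data)
    for _ in range(n_iter):
        # E step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # crude guard against a collapsing component
    return means, variances, weights


# Synthetic data (an assumption for the example): two well-separated clusters.
rng = random.Random(0)
data = [rng.gauss(-2.0, 0.5) for _ in range(300)]
data += [rng.gauss(3.0, 1.0) for _ in range(300)]

means, variances, weights = em_gmm_1d(data, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
print(means, variances, weights)
```

With clusters this well separated, the estimated means should end up 
close to the ones used to generate the data. A real implementation would 
work in the log domain, handle several dimensions, and check convergence 
instead of running a fixed number of iterations.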

One thing you could try is using scipy with R, which should have a much 
more complete set of models, and which can be used from a python session 
(at least I remember having seen that somewhere; I have never done it 
myself).

It looks like Orange uses PyQt for the graphical part, which is GPL 
(because Qt has been GPL since at least 4.0 on all supported platforms, 
including Windows); this would make the graphical part of Orange quite 
difficult to relicense without some recoding. But I don't know much 
about license "mixing", and the mix of python, modules, and libraries 
with different licenses makes it quite difficult for me to really 
understand what is possible or 


