[SciPy-user] scipy data mining ?
david at ar.media.kyoto-u.ac.jp
Wed Jan 24 22:55:55 CST 2007
Karl Young wrote:
> Thanks for the suggestions but, yes, I feel that it's the set of core
> machine learning algorithms that is the important piece. There are some
> implementations of specific algorithms around but it seems like a lot of
> work to include a significant fraction of recently developed machine
> learning algorithms in a single package with a consistent interface (as
> the Weka guys have done). Since the (exploratory) tendency in data
> mining is to not trust any single algorithm for all data analysis it's
> important to have a range available. Given that the Orange developers
> have already started something like that it seemed reasonable to explore
> some kind of integration (I could just use it anyway and hack whatever I
> need to use it with numpy array utilities but some kind of moderated
> integration seemed more consistent with the scipy philosophy).
I don't know much about data mining. I use some pattern recognition
algorithms, but I am not really into the exploratory part, which I think
is important in data mining.
Right now, scipy has an implementation of SVM (scipy.sandbox.svm), EM
for finite mixtures of Gaussians (scipy.sandbox.pyem), and some other
packages which I don't really know (models, maxent), all of which are more
or less relevant for machine learning. There are also some tools around for
other things like Kalman filtering and MCMC, which are not part of scipy (at
least to my knowledge). Some kind of unification would be nice, but this
would require some work, and people interested in doing it.
One thing you could try is using scipy with R, which should have much
more complete models, and which can be used from a python session (at
least I remember having seen that somewhere, I have never done it myself).
It looks like Orange uses PyQt for the graphical part, which is GPL
(because Qt has been GPL since at least 4.0 on all supported platforms,
including Windows); this would make the graphical part of Orange quite
difficult to relicense without some recoding. But I don't know much about
license "mixing", and the mix of Python, modules, and libraries with
different licenses
makes it quite difficult for me to really understand what is possible or