[SciPy-user] scipy data mining ?

Karl Young Karl.Young at ucsf.edu
Thu Jan 25 12:51:07 CST 2007


>Right now, scipy has an implementation of SVM (scipy.sandbox.svm), EM 
>for finite mixtures of Gaussians (scipy.sandbox.pyem), and some others 
>that I don't really know (models, maxent), all of which are more or less 
>relevant for machine learning. There are also some tools around for 
>other stuff like Kalman filtering and MCMC, which are not part (at least 
>to my knowledge) of scipy. Some kind of unification would be nice, but 
>this would require some work, and people interested in it.
>  
>
Yes, it's a little strange that just using the algorithms scattered 
around isn't sufficient, but working with a package like Weka gets one 
used to a certain style that is useful.  At first I do the unthinkable 
and just run a data set (using different subsets, in a way that Weka 
makes easy and automatable) through a zillion methods (which the 
unified interface makes easy), essentially as black boxes. The object 
is then to explore more carefully why particular algorithms and subsets 
of the data (hopefully) show some structure, and not to succumb to the 
temptation to cherry-pick the best result and publish (but nobody ever 
does that...).
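
To make the "zillion methods as black boxes" style concrete, here is a 
minimal sketch in plain Python. It is not Weka's or scipy's API: the 
class names, the fit/predict convention, the sweep() helper and the toy 
data are all made up for illustration. The only point is that once every 
learner answers to the same two calls, looping over methods and data 
subsets is trivial.

```python
# Hypothetical sketch: every learner exposes the same fit/predict pair, so a
# data set can be pushed through all of them without special-casing any one.
from collections import Counter


class MajorityClass:
    """Baseline: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestNeighbour:
    """1-nearest-neighbour using squared Euclidean distance."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            self.y_[min(range(len(self.X_)), key=lambda i: dist(row, self.X_[i]))]
            for row in X
        ]


def accuracy(learner, X_train, y_train, X_test, y_test):
    """Fit on the training rows and return the fraction of test rows it gets right."""
    predicted = learner.fit(X_train, y_train).predict(X_test)
    return sum(p == t for p, t in zip(predicted, y_test)) / len(y_test)


def sweep(learners, feature_subsets, X_train, y_train, X_test, y_test):
    """Score every (learner, feature subset) pair on held-out data."""
    results = {}
    for subset_name, columns in feature_subsets.items():
        def pick(X):
            return [[row[c] for c in columns] for row in X]

        for learner_name, make_learner in learners.items():
            results[(learner_name, subset_name)] = accuracy(
                make_learner(), pick(X_train), y_train, pick(X_test), y_test
            )
    return results


# Toy data (invented): column 0 separates the classes, column 1 is noise.
X_train = [[0.0, 5.0], [0.2, 1.0], [1.0, 4.8], [0.9, 1.2], [0.1, 3.0]]
y_train = ["a", "a", "b", "b", "a"]
X_test = [[0.1, 1.15], [0.95, 4.95]]
y_test = ["a", "b"]

results = sweep(
    {"majority": MajorityClass, "1-nn": NearestNeighbour},
    {"col0": [0], "col1": [1], "both": [0, 1]},
    X_train, y_train, X_test, y_test,
)
for key in sorted(results):
    print(key, results[key])
```

The table of scores is where the real work starts: the interesting 
question is why a given learner does well on a given subset, which is 
also why the scores come from held-out rows rather than the training set.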

>One thing you could try is using scipy with R, which should have much 
>more complete models, and which can be used from a python session (at 
>least I remember having seen that somewhere, I have never done it myself).
>  
>
I do use R and rpy in code that also does a lot of scipy processing but, 
interestingly, even R doesn't have any packages that I'm aware of that 
pull together as many algorithms as Weka, or even Orange, behind a 
unified interface (i.e. in R it would presumably amount to some package 
that would allow one to easily hack a data frame and pass that off to a 
single command with an argument specifying the machine learning 
algorithm and optional parameters).

>It looks like Orange uses pyqt for the graphical part, which is GPL 
>(because Qt has been, since at least 4.0, on all supported platforms 
>including Windows); this would make the graphical part of Orange quite 
>difficult to relicense without some recoding. But I don't know much 
>about license "mixing", and the mix of python, modules, and libraries 
>with different licenses makes it quite difficult for me to really 
>understand what is possible or 
>
>  
>

After forwarding some of this discussion to the Orange developers they 
seemed amenable to exploring scipy and integration possibilities, 
including changing licenses, but were understandably reluctant to commit 
to anything that might require a lot of effort on their part. Given the 
complication added by Qt licensing, a possible route might be to see if 
the compute engine could be effectively decoupled from the UI and 
licensed separately.

Karl Young
Center for Imaging of Neurodegenerative Diseases, UCSF         
VA Medical Center (114M)
4150 Clement Street
San Francisco, CA 94121              
Phone:  (415) 221-4810 x3114  lab                          
FAX:    (415) 668-2864
Email:  karl young at ucsf edu



