[SciPy-dev] [GSoC 2008]Machine learning package in SciPy

Anton Slesarev slesarev.anton@gmail....
Tue Mar 11 14:19:19 CDT 2008


>
> David Cournapeau is maintaining the learn scikit. This is the main place
> where machine learning code will be put.
> For instance, there are classifiers (SVMs with libsvm) and, in the near
> future, there will be the most commonly used manifold learning techniques.
>
> I didn't understand what you meant by "you want to see common which
> features was selected by different tools".


I mean that if we have a standard format for different classifiers, we can
compare their results and see the intersection of the features that were
selected. If we use different tools, we need to make exhausting
conversions between their different formats.
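
As a minimal sketch of that comparison, assuming every tool reports its
selected features as a collection of integer feature indices (the tool names
and index values below are made up for illustration, and common_features is a
hypothetical helper, not part of any existing package):

```python
# Hypothetical data: feature indices selected by three different tools.
selected = {
    "tool_a": {0, 3, 7, 12},
    "tool_b": {3, 7, 9},
    "tool_c": {1, 3, 7, 12},
}


def common_features(selections):
    """Return the set of features that every tool selected."""
    sets = [set(s) for s in selections.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


common = common_features(selected)
print(sorted(common))  # [3, 7]
```

With a shared representation like this, the comparison is a one-liner; without
it, each tool's output has to be converted first.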


>
> Sparse matrix support must be made at the C level for libsvm, you would
> have to ask Albert who wrapped libsvm.

I see. I think it would be a good idea to write parsers for different data
formats.
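
For illustration, here is a rough sketch of such a parser for the sparse text
format that libsvm reads, where each line is "label index:value index:value
...". The function names are hypothetical, and the dict-per-row output is only
an assumption about what a common in-memory format could look like:

```python
def parse_sparse_line(line):
    """Parse one 'label idx:val idx:val ...' line into (label, {idx: val})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def parse_sparse_text(text):
    """Parse a whole document, skipping blank lines."""
    return [parse_sparse_line(l) for l in text.splitlines() if l.strip()]


rows = parse_sparse_text("1 3:0.5 10:2\n-1 1:1.5\n")
print(rows)
```

Only the non-zero entries are stored, which is what matters for text
classification data.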

>
> For the manifold learning code, techniques that can support sparse
> matrices support them (for instance Laplacian Eigenmaps).
>
> Matthieu
>
> 2008/3/11, Anton Slesarev <slesarev.anton@gmail.com>:
> >
> > Hi all,
> >
> > it might be a good idea to have a machine learning (ML) package in SciPy.
> > As I understand, there is some ML code in SciKits, but it is in a raw state?
> >
> > There are a lot of machine learning projects, each with its own data format,
> > set of classifiers, feature selection algorithms and benchmarks. But if
> > you want to compare your own algorithm with others, you have to convert
> > your data to the input format of every tool you want to use and, after
> > training, convert the output of each tool to a single
> > format in order to compare results (for example you want to see
> > common which features was selected by different tools).
> >
> > I am currently analyzing different ML approaches for the special case of
> > text classification. I couldn't find an ML framework appropriate for my
> > task. I have two simple requirements for such a framework: it should support
> > a sparse data format and have at least an SVM classifier. For example, Orange [1]
> > is a very good data mining project but has poor sparse format support. PyML
> > [2] has all the needed features, but there are problems with installation on
> > different platforms and the code design is not perfect.
> >
> > I believe that creating a framework into which scientists can conveniently
> > integrate their algorithms is a very useful challenge.
> > Scientists often talk about standard machine learning software [3], and maybe
> > SciPy would be an appropriate platform for developing such software.
> >
> > I can write a detailed proposal, but first I want to see whether anyone is
> > interested. Any wishes and recommendations?
> >
> > 1. Orange http://magix.fri.uni-lj.si/orange/
> > 2. PyML http://pyml.sourceforge.net/
> > 3. The Need for Open Source Software in Machine Learning
> > http://www.jmlr.org/papers/volume8/sonnenburg07a/sonnenburg07a.pdf
> >
> > --
> > Anton Slesarev
> > _______________________________________________
> > Scipy-dev mailing list
> > Scipy-dev@scipy.org
> > http://projects.scipy.org/mailman/listinfo/scipy-dev
> >
> >
>
>
> --
> French PhD student
> Website : http://matthieu-brucher.developpez.com/
> Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
> LinkedIn : http://www.linkedin.com/in/matthieubrucher
>
>


-- 
Anton Slesarev