[SciPy-dev] SciPy improvements

David Cournapeau david@ar.media.kyoto-u.ac...
Fri Apr 13 00:57:31 CDT 2007


Bill Baxter wrote:
> On 4/13/07, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
>> Bill Baxter wrote:
>>> On 4/13/07, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
>>> I would be interested in joining a dev list on this or something like
>>> that (or open dev blog? or wiki?) if you start such a thing.  I assume
>>> you have to have discussions with your mentor anyway.  If possible
>>> it'd be nice to peek in on those conversations.
>>>
>> There is nothing started yet, and some things need to be fixed with my
>> mentor before things get started, but as Robert said, most if not all
>> discussion related to it would happen here and follow the usual scipy
>> process (scipy SVN, Trac, etc...).
>
> Great then.
>
> The project page mentions SVM.  In addition to SVM I'm interested in
> things like PPCA, kernel PCA, RBF networks, gaussian processes and
> GPLVM.  Are you going to try to go in the direction of a modular
> structure with reusable bits for all kernel methods, or is the
> plan to target SVM specifically?
The plan is really about unifying and improving the existing toolboxes,
and providing a higher level API (which would end up in scikits for
various reasons). Depending on the time left, I will add some algorithms
later. Of course, the goal is that other people will also jump in to add
new algorithms (I, for example, will add some recent advances for mixture
models, like ensemble learning, outside of the SoC if necessary).
>
> The basic components of this stuff (like RBFs) also make for good
> scattered data interpolation schemes.  I hear questions every so often
> on the list about good ways to do that, so making the tools for the
> machine learning toolkit easy to use for people who just want to
> interpolate data would be nice.
>
> Going in a slightly different direction, meshfree methods for solving
> partial differential equations also build on tools like RBF and moving
> least squares interpolation.  So for that reason too, it would be nice
> to have a reusable api layer for those things.
>
> You mention also that you're planning to unify row vec vs. column vec
> conventions.  Just wanted to put my vote in for row vectors!  For a
> number of reasons
> 1) It seems to be the more common usage in machine learning literature
> 2) with Numpy's default C-contiguous data it puts individual vectors
> in contiguous memory.
> 3) it's easier to print something that's Nx5 than 5xN
> 4) "for vec in lotsofvecs:" works with row vectors, but needs a
> transpose for column vectors.
> 5) accessing a vector becomes just data[i] instead of data[:,i] which
> makes it easier to go back and forth between a python list of vectors
> and a numpy 2d array of vectors.
I have not given it a lot of thought yet; what matters most is that all
algorithms follow the same convention. Nevertheless, my experience so
far in numpy is similar to yours with regard to ML algorithms (except
for point 2: depending on the algorithm, you need contiguous access along
one dimension or the other, and my impression is that in numpy this
matters a lot performance-wise, at least much more than in matlab).
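
To make points 2, 4 and 5 concrete, here is a small sketch of the row
vector convention in numpy. The array and the variable names are made up
for illustration (4 samples of dimension 3, one sample per row); nothing
here is an existing toolbox API.

```python
import numpy as np

# Hypothetical data set: N = 4 samples of dimension d = 3, one sample per row.
data = np.arange(12, dtype=float).reshape(4, 3)

# Point 5: a single vector is data[i]; with column vectors it would be data[:, i].
first = data[0]

# Point 4: iterating over a 2d array yields its rows, so no transpose is needed.
norms = [float(np.dot(vec, vec)) for vec in data]

# Point 2: with the default C order, each row is contiguous in memory...
row_contiguous = data[0].flags['C_CONTIGUOUS']
# ...whereas a column is a strided view, which is what an algorithm working
# along the other dimension would end up accessing.
col_contiguous = data[:, 0].flags['C_CONTIGUOUS']

# Going back and forth between a python list of vectors and a 2d array.
as_list = list(data)
round_trip = np.array(as_list)

print(first, norms, row_contiguous, col_contiguous)
```

An algorithm that mostly walks along one feature across all samples
would see the strided case, which is why the best layout depends on the
algorithm.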

David

