[SciPy-dev] SciPy improvements
Fri Apr 13 00:57:31 CDT 2007
Bill Baxter wrote:
> On 4/13/07, David Cournapeau <email@example.com> wrote:
>> Bill Baxter wrote:
>>> On 4/13/07, David Cournapeau <firstname.lastname@example.org> wrote:
>>> I would be interested in joining a dev list on this or something like
>>> that (or open dev blog? or wiki?) if you start such a thing. I assume
>>> you have to have discussions with your mentor anyway. If possible
>>> it'd be nice to peek in on those conversations.
>> There is nothing started yet, and some things need to be fixed with my
>> mentor before things get started, but as Robert said, most if not all
>> discussion related to it would happen here and follow the usual scipy
>> process (scipy SVN, Trac, etc...).
> Great then.
> The project page mentions SVM. In addition to SVM I'm interested in
> things like PPCA, kernel PCA, RBF networks, Gaussian processes and
> GPLVM. Are you going to try to go in the direction of a modular
> structure with reusable bits for all kernel methods, or is the
> plan to target SVM specifically?
The plan is really about unifying and improving existing toolboxes, and
providing a higher-level API (which would end up in scikits for various
reasons). Depending on the time left, I will add some algorithms later.
Of course, the goal is that other people will also jump in to add new
algorithms (I, for example, will add some recent advances for mixture
models, like ensemble learning, outside of the SoC if necessary).
> The basic components of this stuff (like RBFs) also make for good
> scattered data interpolation schemes. I hear questions every so often
> on the list about good ways to do that, so making the tools for the
> machine learning toolkit easy to use for people who just want to
> interpolate data would be nice.
> Going in a slightly different direction, meshfree methods for solving
> partial differential equations also build on tools like RBF and moving
> least squares interpolation. So for that reason too, it would be nice
> to have a reusable API layer for those things.
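
To make the scattered-data interpolation idea concrete, here is a
minimal sketch in plain NumPy. It is only an illustration under stated
assumptions: the function names (gaussian_kernel, rbf_fit, rbf_eval),
the Gaussian basis function and the shape parameter eps are all
hypothetical choices, not an existing scipy or scikits API.

```python
import numpy as np

def gaussian_kernel(a, b, eps=1.0):
    """Gaussian RBF between two point sets stored one point per row.

    a has shape (n, d), b has shape (m, d); the result has shape (n, m).
    """
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-(eps ** 2) * sq_dist)

def rbf_fit(centers, values, eps=1.0):
    """Solve K w = values for the RBF weights (hypothetical helper)."""
    K = gaussian_kernel(centers, centers, eps)
    return np.linalg.solve(K, values)

def rbf_eval(x, centers, weights, eps=1.0):
    """Evaluate the fitted interpolant at the rows of x."""
    return gaussian_kernel(x, centers, eps) @ weights

# Scattered 2-d sample sites, one site per row.
rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, size=(20, 2))
values = np.sin(centers[:, 0]) + centers[:, 1] ** 2

weights = rbf_fit(centers, values, eps=3.0)
# An interpolant reproduces the data at the sample sites.
fitted = rbf_eval(centers, centers, weights, eps=3.0)
# Evaluate at new, arbitrary locations.
queries = np.array([[0.0, 0.0], [0.5, -0.5]])
predicted = rbf_eval(queries, centers, weights, eps=3.0)
```

The same kernel matrix is the building block of the kernel methods
mentioned above, which is the argument for keeping it in a reusable
layer.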
> You mention also that you're planning to unify row vec vs. column vec
> conventions. Just wanted to put my vote in for row vectors! For a
> number of reasons:
> 1) It seems to be the more common usage in machine learning literature
> 2) with NumPy's default C-contiguous data it puts individual vectors
> in contiguous memory.
> 3) it's easier to print something that's Nx5 than 5xN
> 4) "for vec in lotsofvecs:" works with row vectors, but needs a
> transpose for column vectors.
> 5) accessing a vector becomes just data[i] instead of data[:,i] which
> makes it easier to go back and forth between a python list of vectors
> and a numpy 2d array of vectors.
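
Points 4 and 5 above can be checked with a short NumPy sketch. The
variable names are made up for the illustration; only standard NumPy
calls are used.

```python
import numpy as np

# A plain Python list of two 3-d vectors.
vectors = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# Row convention: one vector per row, shape (n_vectors, dim).
data = np.array(vectors)

# Iterating over a 2d array yields its rows, i.e. the vectors themselves.
norms = [float(np.linalg.norm(vec)) for vec in data]

# The i-th vector is data[i], exactly as with the Python list.
second = data[1]

# Column convention: one vector per column, shape (dim, n_vectors).
cols = data.T
second_col = cols[:, 1]                                   # needs a column slice
norms_col = [float(np.linalg.norm(vec)) for vec in cols.T]  # needs a transpose

# Converting back gives the original list of vectors.
roundtrip = data.tolist()
```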
I have not given it a lot of thought yet; what matters most is that
all algorithms follow the same convention. Nevertheless, my experience
so far with NumPy is similar to yours with regard to ML algorithms
(except for point 2: depending on the algorithm, you need contiguous
access along one dimension or the other, and my impression is that in
NumPy this matters a lot performance-wise, at least much more than in
MATLAB).
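
As a rough illustration of the contiguity point, the sketch below
inspects the memory layout of row and column slices. The array shape
and names are arbitrary assumptions; it shows only the layout, not
timings, and says nothing about which layout a given algorithm should
prefer.

```python
import numpy as np

# Default C order: 1000 vectors of dimension 5, one per row.
data = np.zeros((1000, 5))
item = data.itemsize

row = data[0]       # one vector: 5 adjacent elements in memory
col = data[:, 0]    # one coordinate across all vectors: strided view

row_contiguous = row.flags['C_CONTIGUOUS']   # True
col_contiguous = col.flags['C_CONTIGUOUS']   # False
col_stride = col.strides[0]                  # 5 * item bytes between elements

# Fortran order reverses the situation: columns become contiguous.
data_f = np.asfortranarray(data)
f_col_contiguous = data_f[:, 0].flags['C_CONTIGUOUS']   # True
f_row_contiguous = data_f[0].flags['C_CONTIGUOUS']      # False

# An algorithm that sweeps along the strided axis can request a
# contiguous copy once, up front.
col_copy = np.ascontiguousarray(col)
```

So the layout that suits "one vector per row" access is not
automatically the one that suits an algorithm working coordinate by
coordinate.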