[SciPy-dev] SciPy Foundation

Gael Varoquaux gael.varoquaux@normalesup....
Sat Aug 1 17:52:16 CDT 2009


On Sat, Aug 01, 2009 at 07:49:21PM +0200, Dag Sverre Seljebotn wrote:
> As you say it is indeed the whole stack that is important. Still, part of
> what you write seems to be an effort to do what many are doing already:
> - EPD
> - Sage (currently maths focused, but it does bundle SciPy and integrating
> it better would )
> - SPD (Sage without some of the math libs)
> - Python(x,y)

> These all bundle SciPy, but also sets up the whole stack, and can focus on
> the whole picture.

> Are you saying that you just want to do it better than these, through a
> foundation? Wouldn't it be better to direct any funding through one of
> these existing candidates?

> This post I've written on the Sage list is very related and is about SciPy
> vs. Sage:
> http://groups.google.com/group/sage-devel/msg/78e2a2032042d35b

I am jumping into this discussion (something that I have been trying to
avoid, because such discussions are very hard to drive to a useful
point). I'll try to write a clear, to-the-point e-mail, however, as the
previous discussion you are pointing to does not reflect my needs.

On the various use cases and users
===================================

I think that the discussion on the Sage mailing list, and a few points of
the last e-mails I have seen on this mailing list, miss a very important
point for many users of the scipy stack that I see around me:

We want a tool, or a set of tools, to build our own entry points. We want
more than an IDE like Matlab or Mathematica. We want to be able to use the
tools separately: to do data mining on server logs, to build custom
applications for, e.g., medical image analysis, or to control a physics
experiment (there are a lot of talks at the scipy conference this year on
this). Most of the scipy users are "even more applied than applied math"
(golly, this sounds almost dirty ;> ).

Building a reusable stack is why we need the tools to be broken up into
separate features. Scipy as a community and an umbrella project may benefit
from an IDE, like Matlab, or a web interface like the amazing one Sage has,
but we don't want to bundle these features with the core numerical tools
of scipy.

Now this might actually concern only a fraction of users. Many users
(including me) mostly use the scipy tool stack as a Matlab/Mathematica
replacement. However, these users are not the main code contributors. If
somebody develops an algorithm he wants to ship or to share, chances are
he wants it to be bound not to a heavy platform, but rather to a light core
(hey, numpy is even shipped by default on Mac OS X and many Linux
distributions nowadays).


An integrated environment as an entry point
==============================================

Besides building a good set of tools and their documentation, we need to
address two separate issues to make life easier for users: building an
integrated environment (what I call an entry point) and building
distributions. It is tempting to do both at the same time. However, I
think that if we collapse the two problems, we are going in the wrong
direction: I want to be able to reuse the underlying technology of the
integrated environment, for instance to build an astronomy-specific IDE,
and I want to be able to contribute modules to it even if those modules
are not distributed together.

Like many people, my working environment is IPython. It suits my needs,
and I get scientific results using it. However, I can see that it is not
the best solution to guide a beginner. Inspired by Matlab, IDL or
Mathematica, we have been dreaming of having an IDE for a long while.
Last year, Enthought paid me to start work on making IPython
GUI-friendly, to provide one of the missing bricks for assembling the tool
stack into an IDE. I have been unable to work on this for a year, as it is
not a priority for my research, but the effort lives on in the IPython
repository, and it would be great to see IDEs build upon it and improve
it.

An IDE for easy scientific development with Python would bring together
tools such as a shell, easy access to documentation, and an editor
(reinventing any one of these components might not be necessary). There
is EPDLab, which is being developed in the ETS repository. I love the
technology stack that it is built upon (ETS provides good tools for
building GUIs, and IPython provides a very handy and powerful command
line), and I am thus full of hope for EPDLab. I can see however that
people might be afraid of using it, let alone contributing to it, as it
bears strong Enthought branding. This is a pity, because in this case we
have the chance of having a company's interest lying in the same
direction as the community's.

For a web environment, the Sage notebook is amazing. Unfortunately, last
time I looked, it was GPL licensed, which makes it unsuitable for my use,
as the tools we use at the lab must be BSD licensed so that we can
(eventually) build medical imaging products from them.

But, from a more pragmatic point of view, the simplest thing to do to make
it easier for a beginner to get started would be to improve the
documentation on the web. I am not thinking of the documentation of
specific packages, but rather of describing how things fit together: giving
the workflow, and pointing to the main packages used for different
things. We already have a lot of material on the webpages, but this
material is not as 'sexy' as it could be, and not as to-the-point as
possible. Sure, this is a lot of work too.

Building standard distributions
=================================

I am a huge fan of distributions. Every large applied lab I know ends up
building a distribution mechanism. Without standard distributions, we
cannot reuse each other's distribution efforts, and we also have huge
friction in reusing each other's tools: installing on your computer may
be easy, but if you have to worry whether your non-technical users will
succeed in installing a tool, you start wondering whether you want to
rely on the tool, or whether you are going to reimplement it.

However, the other side of the problem is that distributions could end up
developing tools that make use of the tight integration that they provide
to solve numerical or usability problems quicker, while locking the users
in the distribution. If I want to integrate an algorithm developed by
another lab in a medical imaging platform, I cannot afford to drag in
Sage, just like I cannot afford R or Matlab, as they are too big as
dependencies. An IDE that works only on a distribution is not one that I
will rely on for teaching. This is why I believe that every single piece
of code in a distribution should be usable outside of this distribution
(and I applaud the SPD effort started by Ondrej and the Sage guys).

Concrete suggestions to ease the progress
==========================================

Of course providing a consistent environment is a hard problem, but
hey, this is a problem many of us face. I believe that we are making
progress with many encouraging projects such as Sage, EPD, Python(x,y),
or SPD. Establishing scientific environments in Python is an ambitious
project; there will not be a one-size-fits-all solution and having many
different approaches is healthy, as long as we keep it friendly and learn
from all the efforts. I strongly believe that we will be getting more and
more satisfactory solutions in the coming years.

Specifically, I would love to see an official umbrella project for
BSD-licensed tools for building scientific projects with Python. As the
"scipy" name is well branded (through the website, and the conference),
we could call this the 'scipy project'. I would personally like to limit
wheel reinvention and have preferred solutions for the various bricks (I
am thinking of the unfortunate Chaco versus Matplotlib situation, where I
have to depend on both libraries that complement each other). 

Back to the scipy foundation idea
==================================

The idea of the scipy foundation is an idea that has been floating around
for a while. If it is manned by a variety of people who express the wishes
and needs of users and developers of the scipy ecosystem, it could be a
great thing. But I see two roadblocks: first, as Robert points out,
telling somebody what to do will not achieve anything. I am already way
too busy scratching my own itches. Second, who will find the time to take
care of this?

And now, I have to catch up on sleep.

Gaël

