[SciPy-User] Pylab - standard packages

Fernando Perez fperez.net@gmail....
Fri Sep 21 15:38:50 CDT 2012


Warning: what follows is a highly opinionated, completely biased post.
 I'll be using a 'we' that refers to the IPython developers because
the credit for much of what I talk about goes to the whole team, but
ultimately the rant is my responsibility, so flame me if need be.

self.put_hat(kind='IPython').

I think it's important to address directly the question of the IPython
notebook.  I realize that not everybody uses it, and it has some extra
dependencies (though they are really easy ones to satisfy).  But I
also think it's an important discussion that goes to the question of
whether we are simply trying to play catch-up to what matlab/R-Rstudio
offer, or to be truly forward-looking and rethink how scientific
computing will be done for the coming decade.  Needless to say, I have
little interest in the former and am putting all my energy into the
latter: if it were otherwise, I'd have been contributing to Octave for the
last 10 years instead.

My argument, in short: we should consider *some* notebook-type tool as
a first-class citizen of this effort, for the simple reason that such
an approach is one whose time has come.  A notebook environment is the
only tool that truly tackles in an integrated manner the problem that
we've been referring to as the 'lifecycle of a scientific idea'
(https://speakerdeck.com/u/fperez/p/ipython-tools-for-the-lifecycle-of-research-computing?slide=3).

Context: all disciplines are becoming intensely computational, the
need for real-time collaboration on live computational analysis is
great, the pressures for moving towards truly reusable, reproducible
work are coming from multiple angles (major journals, funding
agencies, ...), we need a much smoother transition between analysis
codes and publications, and we need better ways to share our analysis
work over the internet, for education and for archival purposes.
Having a good IDE is a really important point, and my hat is off to
the stellar work the Spyder team has done (and coincidentally, another
Colombian physicist, Carlos Córdoba, is leading the charge on the
spyder/ipython integration work).  But to be blunt, a matlab-style
IDE does not tackle the important questions above in any meaningful
way.

In the last decade's worth of the pylab world (using our new moniker
in its intended fashion), we've certainly taken inspiration from the
major systems out there, but it has always been that: *inspiration*,
never simple copying:

- John Hunter's brilliance with matplotlib was not so much to copy the
high-level API and look/feel of plot windows to ease the transition
from matlab.  It was to rethink the question of what a plotting
library should be, abstracting over GUI toolkits and putting an elegant
OO architecture underneath the familiar scripting interface.

- Numpy's arrays are similar to matlab/fortran ones, obviously, but
when used with the full power of slicing, fancy indexing and
structured dtypes, they make matlab's look like the 1970s relic they
are.  Jim Hugunin, Perry and Travis led the way to build something
that has no match.

- The one-man army that is Wes McKinney had R's DataFrame squarely in
his sights when he built pandas, but he went far, far beyond the basic
ideas in R to provide one of the most powerful packages we've seen in
recent memory.

- etc... you get my point.
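
To make the NumPy point above concrete, here is a minimal sketch of the
three features named there: slicing, fancy indexing and structured
dtypes.  The array contents and the field names (label, weight, count)
are made up for illustration and do not come from the original post.

```python
import numpy as np

# A 4x5 grid holding the integers 0..19.
a = np.arange(20).reshape(4, 5)

# Slicing: every other row, columns 1 through 3.
# Basic slices are views onto the same memory, not copies.
block = a[::2, 1:4]

# Fancy (integer-array) indexing: pick elements (0, 4), (2, 0), (3, 3).
picked = a[[0, 2, 3], [4, 0, 3]]

# Boolean-mask indexing: every entry divisible by 7.
sevens = a[a % 7 == 0]

# Structured dtype: one array whose records carry named, typed fields.
# The field names and values are hypothetical.
samples = np.array(
    [("alpha", 1.5, 3), ("beta", 0.25, 7)],
    dtype=[("label", "U8"), ("weight", "f8"), ("count", "i4")],
)

# Fields are addressed by name and combine with mask indexing.
heavy = samples[samples["weight"] > 1.0]["label"]
```

Because a basic slice is a view, writing into block also changes a;
fancy and boolean indexing return copies instead.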


Now, as I said above, the scientific computing world is changing, and
more importantly, a lot of things in the broader scientific world are
also undergoing very drastic changes: the push for open access, data
sharing and reproducibility of results is likely to make a lot of
things look very different in 10 years than they do now.  We can argue
that the whole online education wave of Coursera/Udacity/EdX is a bit
of a bubble, but there's no denying the internet will play a role in
how scientists are trained both in and out of traditional academia.

I argue that, after having spent the last decade building up the pylab
foundations to be competitive with the 'big boys', we are uniquely
well positioned to stop following and actually lead on many of these
problems.  And for that, my contention is that it is absolutely
necessary to have:

- A tool that bridges the gaps between exploratory work,
collaboration, production, publication and education.

- An open format for sharing, publishing and archiving executable
computational work.

- A system that is accessible through the browser, so that computation
can be located where the data is, since we can't move the data to the
desktop anymore.  Remote collaboration is also most sensibly tackled
via a browser, as google docs has amply demonstrated.


Up until now I have *not* said that we should use the *IPython*
notebook.  Our efforts on this front are, I am sure, full of
limitations and imperfections.  But if we're not going to tackle the
problems above, I would like that to be an explicit decision, made
knowing whether it is because:

1. this community only wants to stick to a traditional
shell+editor/IDE approach.

2. the IPython solution is the wrong one, it has technical flaws, etc.

If it's #1, I think it would be a huge, huge mistake, one that shows a
lack of foresight, ambition and vision.  If that's the decision, I'm sure
that we in the IPython team will simply continue fighting for that
vision on our own, as we are pretty convinced it's the right thing to
do.  And evidence is mounting that others think the same too:

- Michigan State University is teaching *two* courses on advanced
genomics that are heavily notebook based:
http://ged.msu.edu/angus/beacon-2012/index.html,
https://github.com/ngs-docs/ngs-notebooks.

- At Berkeley we have (but this is not driven by me) both an intensive
bootcamp and a semester-long course on scientific python with the
same:
https://github.com/profjsb/python-bootcamp,
https://github.com/profjsb/python-seminar.

- We can now blog straight off the notebook
(http://blog.fperez.org/2012/09/blogging-with-ipython-notebook.html),
and Jose Unpingco is effectively writing a full book on signal
processing as a series of blog posts that are notebooks:
http://python-for-signal-processing.blogspot.com.

- there's more, just google it.

Now, if the reluctance is to go with the *IPython* notebook, then I'd
like to know what the alternative is.  We have effectively put 10
years of work into this problem, and the current implementation is the
third or fourth attempt
(http://blog.fperez.org/2012/01/ipython-notebook-historical.html).  We
know it's by no means perfect, but honestly I think it would be a lot
more sensible to fix whatever our limitations are than to start from
scratch yet again.  So by all means beat on the format, work with
us to improve it so it meets your needs, let us know what's wrong with
it or help us improve the tooling around it (ipython itself, the
nbconvert tools, the nbviewer.ipython.org site, etc...).  But to be
blunt, please don't think that ignoring 10 years of work on this
problem is the right approach.


In summary, I think that sticking to a shell+editor/IDE view of the
problem would be missing a huge opportunity to play a key role in
shaping the next decade's worth of scientific computing. And by the
way, it's not like the others are standing still here:

- Wolfram is busy at work promoting a closed, highly proprietary idea
(http://www.wolfram.com/cdf-player).

- Matlab is building a solution around Microsoft Word:
http://www.mathworks.com/help/matlab/matlab_prog/create-a-matlab-notebook-with-microsoft-word.html.
 They have a huge market share and resources, so they can and will
push pretty deep with this.

- The R community has rapidly rallied behind knitr (http://yihui.name/knitr).


If the pylab community decides to not tackle this problem (and
opportunity!) head-on, at least from IPython we will continue.  I
currently have 5 grants in the pipeline all of which would provide, if
funded, some measure of support for this kind of work.  We all know
funding is a crap shoot, but even if only some of them go through we
should have a decent amount of resources to tackle this, not only for
our own time (this includes Brian, who's also involved with several)
but also for students, postdocs and developers.  And I simply
view it as too important not to continue fighting in this direction.


Now, after all this rant, I want to make clear that I'm *not* saying
that we should stop talking about the simple shell or that everyone
should switch to *only* using notebooks.  One important property of
the IPython notebooks is that it is very easy to generate a pure .py
script out of any notebook, any time (and we know how to improve those
conversion facilities quite a bit).  So even if a project decides to
ship all of its examples as notebooks, it's trivial to ensure that
they are also accessible in pure script form to be run from the
command line or loaded into spyder/IDLE/etc as well as converted to
clean html in the sphinx-built documentation.
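
As a rough illustration of why the notebook-to-script conversion is
easy: a notebook file is plain JSON, so the code cells can be pulled out
with the standard library alone.  The sketch below is not the nbconvert
implementation.  It assumes the later nbformat 4 layout (a top-level
"cells" list, which postdates this post), and the function name
notebook_to_script and the demo notebook are hypothetical.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Return the cells of a notebook as one Python script.

    Assumes the nbformat 4 layout (top-level "cells" list).
    Code cells are copied verbatim; other cells become comments.
    """
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # "source" may be one string or a list of line strings.
        text = source if isinstance(source, str) else "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(text)
        else:
            chunks.append("\n".join("# " + line for line in text.splitlines()))
    return "\n\n".join(chunks) + "\n"


# A tiny hand-written notebook, for illustration only.
demo = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["Compute a sum"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None,
         "source": ["total = 1 + 2\n", "print(total)"]},
    ],
})

script = notebook_to_script(demo)
```

The resulting string can be written to a .py file and run from the
command line.  Real conversions should go through the nbconvert tools,
which also handle things this sketch ignores, such as IPython magics.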

Furthermore, the notebook is not the tool for building large-scale
library code, so there will always be a place for
emacs/vim/textmate/spyder, where the focus is more on the
'development' than the interactive exploration/analysis.

But having notebooks in the projects, once we also build tools for
cross-project help indexing, will let us provide users with powerful
help that can search for a term across all the installed
pylab-compliant tools and will give one-click access to live,
executable examples they can modify immediately.  Mathematica has had
this for over a decade and it is absolutely extraordinary.  The same
tools can also index the pure .py versions, of course, but after 5
years of not having a Mathematica license, I still miss this every
time I have to trawl multiple online galleries looking for something
in the pylab world.
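
A very crude stand-in for the cross-project term search described above
can be sketched by scanning docstrings of installed modules.  This is
only an assumption about how such a search might start, not an existing
pylab or IPython tool.  The function name search_docs is hypothetical,
and the standard-library json module stands in for a real scientific
package so the example runs anywhere.

```python
import importlib
import inspect


def search_docs(term, module_names):
    """Return sorted qualified names of public functions and classes
    whose docstring mentions `term` (case-insensitive substring match)."""
    term = term.lower()
    hits = set()
    for mod_name in module_names:
        try:
            module = importlib.import_module(mod_name)
        except ImportError:
            continue  # skip packages that are not installed
        for name, obj in inspect.getmembers(module):
            if name.startswith("_"):
                continue
            if not (inspect.isfunction(obj) or inspect.isclass(obj)):
                continue
            doc = inspect.getdoc(obj) or ""
            if term in doc.lower():
                hits.add(f"{mod_name}.{name}")
    return sorted(hits)


# The second module name is deliberately bogus, to show it is skipped.
hits = search_docs("serialize", ["json", "not_a_real_package_xyz"])
```

A real index would also need to cover notebooks and example galleries
and link each hit to a live, editable example, which is the part the
paragraph above argues is still missing.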


OK, I doubt anyone is reading by now, so I'll stop here...  Flame away.

f

