[SciPy-User] Pylab - standard packages
Fri Sep 21 15:57:11 CDT 2012
On Fri, Sep 21, 2012 at 4:38 PM, Fernando Perez <firstname.lastname@example.org> wrote:
> Warning: what follows is a highly opinionated, completely biased post.
> I'll be using a 'we' that refers to the IPython developers because
> the credit for much of what I talk about goes to the whole team, but
> ultimately the rant is my responsibility, so flame me if need be.
> I think it's important to address directly the question of the IPython
> notebook. I realize that not everybody uses it, and it has some extra
> dependencies (though they are really easy ones to satisfy). But I
> also think it's an important discussion that goes to the question of
> whether we simply are trying to play catch-up with what matlab/R-Rstudio
> offer, or to be truly forward-looking and rethink how scientific
> computing will be done for the coming decade. Needless to say, I have
> little interest in the former and am putting all my energy into the
> latter: if it were otherwise, I'd have been contributing to Octave for the
> last 10 years instead.
> My argument, in short: we should consider *some* notebook-type tool as
> a first-class citizen of this effort, for the simple reason that such
> an approach is one whose time has come. A notebook environment is the
> only tool that truly tackles in an integrated manner the problem that
> we've been referring to as the 'lifecycle of a scientific idea'.
> Context: all disciplines are becoming intensely computational, the
> need for real-time collaboration on live computational analysis is
> great, the pressures for moving towards truly reusable, reproducible
> work are coming from multiple angles (major journals, funding
> agencies, ...), we need a much smoother transition between analysis
> codes and publications, and we need better ways to share our analysis
> work over the internet, for education and for archival purposes.
> Having a good IDE is a really important point, and my hat is off to
> the stellar work the Spyder team has done (and coincidentally, another
> Colombian physicist, Carlos Córdoba, is leading the charge on the
> spyder/ipython integration work). But to be blunt, a matlab-style
> IDE does not tackle the important questions above in any meaningful way.
> In the last decade's worth of the pylab world (using our new moniker
> in its intended fashion), we've certainly taken inspiration from the
> major systems out there, but it has always been that: *inspiration*,
> never simple copying:
> - John Hunter's brilliance with matplotlib was not so much to copy the
> high-level API and look/feel of plot windows to ease the transition
> from matlab. It was to rethink the question of what a plotting
> library should be, abstracting over GUI toolkits and putting an elegant OO
> architecture underneath the familiar scripting interface.
> - Numpy's arrays are similar to matlab/fortran ones, obviously, but
> when used with the full power of slicing, fancy indexing and
> structured dtypes, they make matlab's look like the 1970's relic they
> are. Jim Hugunin, Perry and Travis led the way to build something
> that has no match.
> - The one-man army that is Wes McKinney had R's DataFrame squarely in
> his sights when he built pandas, but he went far, far beyond the basic
> ideas in R to provide one of the most powerful packages we've seen in
> recent memory.
> - etc... you get my point.
> Now, as I said above, the scientific computing world is changing, and
> more importantly, a lot of things in the broader scientific world are
> also undergoing very drastic changes: the push for open access, data
> sharing and reproducibility of results is likely to make a lot of
> things look very different in 10 years than they do now. We can argue
> that the whole online education wave of Coursera/Udacity/EdX is a bit
> of a bubble, but there's no denying the internet will play a role in
> how scientists are trained both in and out of traditional academia.
> I argue that, after having spent the last decade building up the pylab
> foundations to be competitive with the 'big boys', we are uniquely
> well positioned to stop following and actually lead on many of these
> problems. And for that, my contention is that it is absolutely
> necessary to have:
> - A tool that bridges the gaps between exploratory work,
> collaboration, production, publication and education.
> - An open format for sharing, publishing and archiving executable
> computational work.
> - A system that is accessible through the browser, so that computation
> can be located where the data is, since we can't move the data to the
> desktop anymore. Remote collaboration also is most sensibly tackled
> via a browser, as google docs has amply demonstrated.
> Up until now I have *not* said that we should use the *IPython*
> notebook. Our efforts on this front are, I am sure, full of
> limitations and imperfections. But if we're not going to tackle the
> problems above, I would like it to be with an explicit decision on
> whether it is because:
> 1. this community only wants to stick to a traditional
> shell+editor/IDE approach.
> 2. the IPython solution is the wrong one, it has technical flaws, etc.
> If it's #1, I think it would be a huge, huge mistake and one of lack
> of foresight, ambition and vision. If that's the decision, I'm sure
> that we in the IPython team will simply continue fighting for that
> vision on our own, as we are pretty convinced it's the right thing to
> do. And evidence is mounting that others think the same too:
> - Michigan State University is teaching *two* courses on advanced
> genomics that are heavily notebook based:
> - At Berkeley we have (but this is not driven by me) both an intensive
> bootcamp and a semester-long course on scientific python with the notebook.
> - We can now blog straight off the notebook
> and Jose Unpingco is effectively writing a full book on signal
> processing as a series of blog posts that are notebooks:
> - there's more, just google it.
> Now, if the reluctance is to go with the *IPython* notebook, then I'd
> like to know what the alternative is. We have effectively put 10
> years of work into this problem, and the current implementation is the
> third or fourth attempt
> (http://blog.fperez.org/2012/01/ipython-notebook-historical.html). We
> know it's by no means perfect, but honestly I think it would be a lot
> more sensible to fix whatever our limitations are than to start yet
> once more from scratch. So by all means beat on the format, work with
> us to improve it so it meets your needs, let us know what's wrong with
> it or help us improve the tooling around it (ipython itself, the
> nbconvert tools, the nbviewer.ipython.org site, etc...). But to be
> blunt, please don't think that ignoring 10 years of work on this
> problem is the right approach.
> In summary, I think that sticking to a shell+editor/IDE view of the
> problem would be missing a huge opportunity to play a key role in
> shaping the next decade's worth of scientific computing. And by the
> way, it's not like the others are standing still here:
> - Wolfram is busy at work promoting a closed, highly proprietary idea
> - Matlab is building a solution around Microsoft Word:
> They have a huge market share and resources, so they can and will
> push pretty deep with this.
> - The R community has rapidly banded behind knitr (http://yihui.name/knitr).
> If the pylab community decides to not tackle this problem (and
> opportunity!) head-on, at least from IPython we will continue. I
> currently have 5 grants in the pipeline all of which would provide, if
> funded, some measure of support for this kind of work. We all know
> funding is a crap shoot, but even if only some of them go through we
> should have a decent amount of resources not only for our (this
> includes Brian, who's also involved with several) own time but also
> for students, postdocs and developers, to tackle this. And I simply
> view it as too important not to continue fighting in this direction.
> Now, after all this rant, I want to make clear that I'm *not* saying
> that we should stop talking about the simple shell or that everyone
> should switch to *only* using notebooks. One important property of
> the IPython notebook is that it is very easy to generate a pure .py
> script out of any notebook, any time (and we know how to improve those
> conversion facilities quite a bit). So even if a project decides to
> ship all of its examples as notebooks, it's trivial to ensure that
> they are also accessible in pure script form to be run from the
> command line or loaded into spyder/IDLE/etc as well as converted to
> clean html in the sphinx-built documentation.
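The notebook-to-script conversion described above can be sketched in a few lines, since a notebook file is plain JSON. This is only an illustration, not how IPython or the nbconvert tools actually do it: it assumes the nbformat 4 layout (a top-level "cells" list with "cell_type" and "source" fields), which is newer than the format in use when this post was written, and both `notebook_to_script` and the example notebook are made up for the purpose.

```python
import json


def notebook_to_script(nb):
    """Return the cells of a notebook dict as one plain-Python string.

    Assumption: the nbformat 4 layout, i.e. a top-level "cells" list whose
    entries carry "cell_type" and "source" (a string or a list of strings).
    """
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(source.rstrip())
        elif cell.get("cell_type") == "markdown":
            # Keep the prose, but as comments so the script stays runnable.
            commented = [("# " + line).rstrip() for line in source.splitlines()]
            chunks.append("\n".join(commented))
    return "\n\n".join(chunks) + "\n"


# A tiny hypothetical notebook, written as the JSON text a .ipynb file holds.
EXAMPLE_IPYNB = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Demo\n", "Add two numbers."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1 + 2\n", "print(x)"]},
    ],
})

script = notebook_to_script(json.loads(EXAMPLE_IPYNB))
print(script)
```

Turning the markdown cells into comments rather than dropping them is a choice made for this sketch, so that the commentary survives when the result is run from the command line or opened in an editor.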
> Furthermore, the notebook is not the tool for building large-scale
> library code, so there will always be a place for
> emacs/vim/textmate/spyder, where the focus is more on the
> 'development' than the interactive exploration/analysis.
> But having notebooks in the projects, once we also build tools for
> cross-project help indexing, will let us provide users with powerful
> help that can search for a term across all the installed
> pylab-compliant tools and will give one-click access to live,
> executable examples they can modify immediately. Mathematica has had
> this for over a decade and it is absolutely extraordinary. The same
> tools can also index the pure .py versions, of course, but after 5
> years of not having a Mathematica license, I still miss this every
> time I have to trawl multiple online galleries looking for something
> in the pylab world.
> OK, I doubt anyone is reading by now, so I'll stop here... Flame away.
No argument from me. I just spent a day getting notebooks into the
statsmodels documentation and trying to improve our html repr.
Notebooks are a great way of getting rendered and commented code, and
all of the many previous attempts were half baked.
That's for the teaching side; I don't know much about the future of
collaborative, parallel, cloud, ... interpreters. (Anaconda seems to
have it built in.)
(even if my development environment is spyder and eclipse.)