[SciPy-user] Pros and Cons of Python versus other array environments

John Hunter jdhunter at ace.bsd.uchicago.edu
Thu Sep 28 21:48:22 CDT 2006


>>>>> "Rob" == Rob Hetland <hetland at tamu.edu> writes:

    Rob> All of the arguments made *for* PyLab are true -- you think
    Rob> so too, or you wouldn't be reading this.  I have been a huge
    Rob> proponent of PyLab, and have taught seminars on it here at
    Rob> Texas A&M and Woods Hole to people who primarily use MATLAB.
    Rob> I have heard a number of objections or excuses that it all
    Rob> looks good, but.....
    Rob>   - it's hard to install
    Rob>   - I already know how to use MATLAB, and it works fine for me
    Rob>   - when do I find a week (or month or semester) to learn a
    Rob>     new programming language
    Rob>   - I already have so many m-files that I would need to rewrite

The first thing this thread makes me think is: why does wikipedia work
but wikis for scientific python do not?  If we followed Travis' lead and
aggregated the collective wisdom on this thread into the wiki page, we
would have something enduring for the masses.  As it is, only geeks
like us who read mailing lists or archives will benefit from it.
Maybe this points to the problem: the primary users and developers of
scientific computing in python are sufficiently technologically
literate that they not only overcome the additional complexity, they
need it and crave it.

I was a huge matlab user for almost a decade; I tried to write a book
about matlab (see http://matplotlib.sf.net/matlab_cookbook.pdf,
unfortunately as incomplete as the mpl cookbook and other
documentation).  At some point I "hit the wall" and could no longer be
productive in matlab.  The extra overhead of managing complex data
structures, developing complex GUIs, and working with networked data
and databases was consuming most of my programming energy.  Yes,
matlab provides you a simple, comprehensive interface, and a fairly
complete set of numerical libs, but when you want to work with complex
data in a realistic networked environment, you hit the limits of the
language and environment pretty hard.  Then you rewrite what you like
about matlab in python and get on with it.

matlab is a great tool for beginners and intermediates.  For experts,
it has limitations which are hard to overcome. My advice to students:
if you aspire to be an expert, bite the bullet now and build a set of
tools that can scale with you on your ascent.  Also, realize that The
Mathworks is like the crack dealer on the street: the first hit is
free; once you are addicted it becomes quite expensive.  An academic
license or a student version sells for under $100.  If you are a
business and need the important toolkits, you are looking at $50K per
year.  If you are an entrepreneurial student and dream of starting
your own business once you graduate, ask yourself what you could do
with the extra cash saved from a single site license.  If your
fledgling business grows, ask yourself what you can do with the cash
saved from 50 site licenses (hint: that is 2.5 million dollars a
year).  If you are ready to spend the 2.5 million dollars, fine, but
first try the following exercises in matlab and python:

  * download and parse a CSV file from a web server, e.g.
    http://ichart.finance.yahoo.com/table.csv?s=INTC&d=8&e=29&f=2006&g=d&a=6&b=9&c=1986&ignore=.csv
    (for a python implementation, see the matplotlib.finance module)

  * fill out a web CGI form in matlab (hint: you can do it with the
    embedded JVM, a virtual machine running in a virtual machine)

  * query a mysql database on linux, win32, and OS X with the same
    script and populate an array with the results

Now how much would you pay?
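
As a rough illustration of the python side of the first and third
exercises, here is a minimal sketch using only the standard library.
It rests on several assumptions that are not part of the exercises
themselves: the function names (fetch_text, parse_quotes,
store_quotes, load_closes), the quotes table, and the column names are
hypothetical; sqlite3 stands in for mysql because it ships with
python; and the sample data is made up so the script runs without a
network connection.  It is not the matplotlib.finance implementation.

```python
import array
import csv
import io
import sqlite3
import urllib.request

# Column names assumed for a Yahoo-style CSV; adjust to the real header.
NUMERIC = {"Open", "High", "Low", "Close", "Volume"}


def fetch_text(url, timeout=10):
    """Download url and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_quotes(csv_text):
    """Turn CSV text with a header row into a list of dicts, one per row."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            name: float(value) if name in NUMERIC else value
            for name, value in record.items()
        })
    return rows


def store_quotes(conn, rows):
    """Create the (hypothetical) quotes table and insert the parsed rows."""
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE IF NOT EXISTS quotes (date TEXT, close REAL)")
    cursor.executemany(
        "INSERT INTO quotes (date, close) VALUES (?, ?)",
        [(row["Date"], row["Close"]) for row in rows],
    )
    conn.commit()


def load_closes(conn):
    """Query closing prices into an array of doubles.

    ORDER BY date sorts chronologically only if dates are ISO formatted
    (YYYY-MM-DD), which is an assumption of this sketch.
    """
    cursor = conn.cursor()
    cursor.execute("SELECT close FROM quotes ORDER BY date")
    return array.array("d", (close for (close,) in cursor))


if __name__ == "__main__":
    # Inline sample data instead of fetch_text(url), so no network is needed.
    sample = (
        "Date,Open,High,Low,Close,Volume\n"
        "2006-09-28,20.1,20.5,20.0,20.4,1000\n"
        "2006-09-27,19.8,20.2,19.7,20.1,1200\n"
    )
    conn = sqlite3.connect(":memory:")
    store_quotes(conn, parse_quotes(sample))
    print(load_closes(conn))
```

With a real mysql server you would swap sqlite3.connect for your
driver's connect call; the cursor/execute/fetch pattern should carry
over, though the placeholder syntax may differ between drivers.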

PS: it's been a while since I looked at that matlab cookbook I was
working on.  I find the following sections of the matlab PDF linked
above fun in a historical light::

Alternatives to matlab

  I am a devotee of open source software.  I (almost exclusively) use
  linux as an operating system, emacs as an integrated development
  environment, python for small and large scale programming, C++ for
  numerics, and so on.  Matlab is the only commercial piece of software I
  use regularly.
  
  I really don't want to use it, mainly because it is so expensive.  I
  work in an academic environment, where site licenses go for the
  incredibly cheap price of $75 per year, toolboxes included.  Check
  out the commercial price list to get an idea of just how expensive
  it is outside of academia.  I'll give you a hint.  About as much as
  a new Lexus sport utility vehicle.
  
  So aside from my support for GNU and linux and open source software, I
  don't want to wake up some day outside the folds of academia having to
  pay for matlab.  Every day I use matlab is another set of plotting and
  analysis functions that I come to rely on, which makes it increasingly
  hard to go cold turkey. Every once in a while I make an aborted attempt
  to give it up (I know it's not good for me) but I always find
  myself coming back.  The main reason is the graphics -- the ease with
  which I can make publication quality figures that I just haven't found
  in competing, open source, free as in Richard Stallman
  (http://www.gnu.org/philosophy/free-sw.html), solutions.

Free alternatives
  
    * python -- python is the one true language.  I have written
    extensively in perl, C++, FORTRAN, BASIC, and yes matlab, and in
    python I have found the one true language.  I say that with tongue
    in cheek -- there is no one true language, because the strengths
    of a language often imply its weaknesses: the classic trade-offs
    between user friendliness and power, expressiveness and
    readability, development time and execution time.  python solves
    all these problems for me because it is so clear syntactically,
    has so many great libraries built in, and so many great external
    libraries.  In the final category, relevant to this discussion, is
    numpy (http://www.pfdubois.com/numpy) and its recent
    successor scipy (http://www.scipy.org).
    
    These libraries provide efficient C/C++/FORTRAN libraries, all wrapped
    in python, that give you a huge array of highly tested, optimized,
    numerical libraries, for free.  And you can read and modify the source
    code at will, in large part obviating the classic problem of closed
    source (matlab) libraries: that in a few years, when another platform
    is dominant, your solution of today is no longer supported.  With open
    source, your solution is supported as long as users continue to use it
    and support it.  SGI was the proprietary platform of choice for high
    performance graphics software 5 years ago.  Today, support and
    maintenance have become increasingly difficult and expensive.
    
    And while numerous graphics packages for scipy exist, none compare to
    the breadth, ease of use, generality and quality of the matlab
    libraries.  Yet.  As a general rule, open source solutions follow
    excellent closed source solutions with a short time lag (witness the
    gimp, an excellent drop-in replacement for Photoshop).  So keep your eye
    on python for standardized, excellent graphics solutions in the near
    future.
    
    If you want to split the difference, python does support an
    interface to matlab called pymat
    (http://claymore.engineer.gvsu.edu/~steriana/Python/pymat.html),
    so you can do your number crunching in numpy, and pass the results
    off to matlab for plotting, thus minimizing your dependence on
    matlab until the final step of producing graphical output.
    
    
    * octave (http://www.octave.org) -- Octave is an open source clone of
    matlab.  Many m-files will run in octave without changes.  But
    when you start to make plots, you'll hit incompatibilities.
    octave uses gnuplot for plotting, and the support, particularly
    for handle graphics, is limited, as is the quality of the graphics
    produced.


JDH

