[SciPy-user] python (against java) advocacy for scientific projects
Ravi
lists_ravi@lavabit....
Mon Jan 19 13:58:19 CST 2009
Hi,
The advice from Mr. Molden is well-argued, but he does gloss over a few of
the difficulties. These serious problems are also present in Matlab & Java for
the most part.
1. The python packaging system is junk. Matlab & Java get around this problem
by not really having a packaging system (leading to even worse confusion). PyPI
& setuptools are painful (search enthought-dev & ipython-dev lists for the
last year, especially for posts from Fernando Perez & Gael Varoquaux, for more
information).
2. Installation/compilation of C/C++ extensions/wrappers: Matlab's cohesive
toolbox shines here; their method is clearly documented and works reasonably
well across all the platforms they support (at least on Solaris, HPUX, Linux &
Windows, the platforms I work with). Java extensions are, IMHO, reasonably
straightforward to maintain, but python distutils takes everything to a whole
new level of nightmare. For distutils difficulties, simply search the archives
for this mailing list (especially the posts from David Cournapeau).
3. The lack of a real JIT compiler is a serious issue if the use cases involve
more than linear algebra and differential equation solvers. In many such
cases, for-loops and/or while-loops are the only reasonable solutions, both of
which, very often, execute much faster under Matlab or Java. Some operations
are simply not vectorizable if you wish to have maintainable code, e.g., large
groups of interacting state machines.
4. Both Java & Matlab have very well-thought-out IDEs. As I don't use IDEs
myself, I cannot comment on their ease of use, but my colleagues who do work
with them find them extremely useful. Neither eclipse-pydev nor eric3 is
anywhere close to the Matlab IDE workspace. Java has several very nice IDEs
but none of them are as useful as the Matlab IDE. A related issue is the lack
of a decent debugger; pydb+ipython is the best one I have come across for
python, but it is nowhere near the Matlab/Java offerings.
In spite of the issues highlighted above, Python is still the best choice,
because of the large library and because of the well-designed language
specification. (CPython's shortcomings are well-known and will eventually be
addressed by PyPy and the like; in some computation-intensive cases, even
IronPython beats out CPython, go figure.)
Mr. Molden has provided a very good summary of the Python workflow but there
is one issue that keeps rearing its ugly head on the numpy/scipy lists over &
over again:
On Monday 19 January 2009 11:14:28 Sturla Molden wrote:
> 9. If the bottleneck cannot be solved by libraries or changing
> algorithm, re-write these parts in Fortran 95. Compile with f2py to get
> a Python callable extension module. Real scientists do not use C++ (if
> we need OOP, we have Python.)
I completely agree with the first part of the point above (use Fortran 95 or
many of the other languages which have very good numerical performance to
speed up bottlenecks). However, the last part is merely ugly prejudice. Like
python, Fortran, and other languages, C++ does have its place in scientific
computing. Here's one example which, in my experience, is completely
impossible to do in python, Matlab, Java or even C:
The bottleneck in one of our simulations is a fixed-point FFT computation
followed by a modified gradient search. Try implementing serious fixed-point
computation with, say, 13-bit numbers, some of which are optimally expressed
in log-normal form and the others in the standard form, on
python/Matlab/Java/C. You will end up with either unmaintainable code or
unusably slow code. C++ templates & a little bit of metaprogramming make
prototyping the algorithm easy (because you can use doubles to verify data
flow) while simultaneously making it easy to enhance the prototype quickly
into fixed-point code (simply by replacing types and running some automated
tests to find appropriate bit-widths). In our case, we needed to optimize the
radix of the underlying FFTs as well because of some high-throughput
considerations.
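
To make the "replace the types" idea concrete, here is a minimal sketch. It
is not the actual simulation code: the Fixed class, the Q13 alias (13 bits
total, 8 of them fractional) and the helpers quantize, dot and as_double are
all hypothetical names invented for illustration, and a plain dot product
stands in for the FFT. The point is only that the same templated kernel
runs unchanged on double (to verify data flow) and on a fixed-point type.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical signed fixed-point number: TotalBits wide, FracBits fractional.
// Construction rounds to nearest; every result saturates to the valid range.
template <int TotalBits, int FracBits>
class Fixed {
    static_assert(TotalBits > 1 && TotalBits <= 31, "raw value must fit int32_t");
    static_assert(FracBits >= 0 && FracBits < TotalBits, "need a sign bit");

public:
    Fixed() = default;

    Fixed(double v) {
        double s = v * scale();
        if (s > double(max_raw())) s = double(max_raw());  // saturate high
        if (s < double(min_raw())) s = double(min_raw());  // saturate low
        raw_ = static_cast<std::int32_t>(std::llround(s));
    }

    double to_double() const { return static_cast<double>(raw_) / scale(); }
    std::int32_t raw() const { return raw_; }

    friend Fixed operator+(Fixed a, Fixed b) {
        return from_raw(std::int64_t{a.raw_} + b.raw_);
    }
    friend Fixed operator-(Fixed a, Fixed b) {
        return from_raw(std::int64_t{a.raw_} - b.raw_);
    }
    friend Fixed operator*(Fixed a, Fixed b) {
        // The product carries 2*FracBits fractional bits; divide to get back
        // to FracBits (integer division truncates toward zero).
        const std::int64_t wide = std::int64_t{a.raw_} * b.raw_;
        return from_raw(wide / (std::int64_t{1} << FracBits));
    }

private:
    static constexpr std::int64_t max_raw() {
        return (std::int64_t{1} << (TotalBits - 1)) - 1;
    }
    static constexpr std::int64_t min_raw() {
        return -(std::int64_t{1} << (TotalBits - 1));
    }
    static constexpr double scale() {
        return static_cast<double>(std::int64_t{1} << FracBits);
    }
    static Fixed from_raw(std::int64_t r) {
        Fixed f;
        if (r > max_raw()) r = max_raw();
        if (r < min_raw()) r = min_raw();
        f.raw_ = static_cast<std::int32_t>(r);
        return f;
    }

    std::int32_t raw_ = 0;
};

// 13-bit format used in this sketch: 1 sign bit, 4 integer bits, 8 fractional.
using Q13 = Fixed<13, 8>;

// Convert reference data into the numeric type under test.
template <typename T>
std::vector<T> quantize(const std::vector<double>& xs) {
    std::vector<T> out;
    out.reserve(xs.size());
    for (double x : xs) out.push_back(T(x));
    return out;
}

// The "algorithm": written once, instantiated for double or for Fixed<...>.
template <typename T>
T dot(const std::vector<T>& a, const std::vector<T>& b) {
    T acc = T(0.0);
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        acc = acc + a[i] * b[i];
    }
    return acc;
}

// Uniform way to read a result back as double for comparison.
inline double as_double(double v) { return v; }
template <int B, int F>
double as_double(Fixed<B, F> v) { return v.to_double(); }
```

With this in place, dot(quantize<double>(a), quantize<double>(b)) gives the
reference result and dot(quantize<Q13>(a), quantize<Q13>(b)) gives the
13-bit one; an automated bit-width search would then amount to comparing
the two for different Fixed<TotalBits, FracBits> instantiations.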
Admittedly, the problem considered above is pretty difficult & pretty
specialized, but the beauty of C++ or even of PL/1 is that it makes certain
difficult problems tractable: problems which are practically impossible to
solve with python/Java/Matlab/C. Leave your programming language prejudices at
home when you consider afresh the optimal solutions to your problem.
Regards,
Ravi