[SciPy-user] python (against java) advocacy for scientific projects

Ravi lists_ravi@lavabit....
Mon Jan 19 13:58:19 CST 2009


Hi,
  The advice from Mr. Molden is well-argued, but he does gloss over a few of 
the difficulties. These serious problems are also present in Matlab & Java for 
the most part.

1. The python packaging system is junk. Matlab & Java get around this problem 
by not really having a packaging system (leading to even worse confusion). PyPI 
& setuptools are painful (search enthought-dev & ipython-dev lists for the 
last year, especially for posts from Fernando Perez & Gael Varoquaux, for more 
information).

2. Installation/compilation of C/C++ extensions/wrappers: Matlab's cohesive 
toolbox shines here; their method is clearly documented and works reasonably 
well across all the platforms they support (at least on Solaris, HPUX, Linux & 
Windows, the platforms I work with). Java extensions are, IMHO, reasonably 
straightforward to maintain, but python distutils takes everything to a whole 
new level of nightmare. For distutils difficulties, simply search the archives 
of this mailing list (especially the posts from David Cournapeau).

3. The lack of a real JIT compiler is a serious issue if the use cases involve 
more than linear algebra and differential equation solvers. In many such 
cases, for-loops and/or while-loops are the only reasonable solutions, both of 
which, very often, execute much faster under Matlab or Java. Some operations 
are simply not vectorizable if you wish to have maintainable code, e.g., large 
groups of interacting state machines.

4. Both Java & Matlab have very well-thought-out IDEs. As I don't use IDEs 
myself, I cannot comment on their ease of use, but my colleagues who do work 
with them find them extremely useful. Neither eclipse-pydev nor eric3 is 
anywhere close to the Matlab IDE workspace. Java has several very nice IDEs 
but none of them are as useful as the Matlab IDE. A related issue is the lack 
of a decent debugger; pydb+ipython is the best one I have come across for 
python, but it is nowhere near the Matlab/Java offerings.

In spite of the issues highlighted above, Python is still the best choice, 
because of the large library and because of the well-designed language 
specification. (CPython's shortcomings are well-known and will eventually be 
addressed by PyPy and the like; in some computation-intensive cases, even 
IronPython beats out CPython, go figure.)

Mr. Molden has provided a very good summary of the Python workflow but there 
is one issue that keeps rearing its ugly head on the numpy/scipy lists over & 
over again:

On Monday 19 January 2009 11:14:28 Sturla Molden wrote:
> 9. If the bottleneck cannot be solved by libraries or changing
> algorithm, re-write these parts in Fortran 95. Compile with f2py to get
> a Python callable extension module. Real scientists do not use C++ (if
> we need OOP, we have Python.)

I completely agree with the first part of the point above: use Fortran 95 or 
any of the other languages which have very good numerical performance to 
speed up bottlenecks. However, the last part is merely ugly prejudice. Like 
python, Fortran, and other languages, C++ does have its place in scientific 
computing. Here's one example which, in my experience, is completely 
impossible to do in python, Matlab, Java or even C:
  The bottleneck in one of our simulations is a fixed-point FFT computation 
followed by a modified gradient search. Try implementing serious fixed-point 
computation with, say, 13-bit numbers, some of which are optimally expressed 
in log-normal form and the others in the standard form, in 
python/Matlab/Java/C. You will end up with either unmaintainable code or 
unusably slow code. C++ templates & a little bit of metaprogramming make 
prototyping the algorithm easy (because you can use doubles to verify data 
flow) while simultaneously making it easy to enhance the prototype quickly 
into fixed point code (simply by replacing types and running some automated 
tests to find appropriate bit-widths). In our case, we needed to optimize the 
radix of the underlying FFTs as well because of some high throughput 
considerations.
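
To make the "replace the types" idea concrete, here is a minimal sketch. It 
is not our actual code: the Fixed class, its 13/10 bit split, the saturating 
arithmetic and the dot() kernel are all hypothetical stand-ins chosen for 
illustration (a dot product is far simpler than an FFT, and there is no 
log-normal format here). The only point is that the kernel is written once 
and instantiated with double or with a fixed-point type.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical signed fixed-point number: TotalBits wide in total, FracBits
// of them fractional. Results saturate instead of wrapping around.
template <int TotalBits, int FracBits>
class Fixed {
    static_assert(FracBits >= 0 && TotalBits > FracBits && TotalBits <= 31,
                  "unsupported bit-width");
    std::int32_t raw_ = 0;  // scaled integer: value * 2^FracBits

    // Clamp a wide intermediate result to the TotalBits signed range.
    static std::int32_t saturate(std::int64_t v) {
        const std::int64_t hi = (std::int64_t{1} << (TotalBits - 1)) - 1;
        const std::int64_t lo = -hi - 1;
        if (v > hi) return static_cast<std::int32_t>(hi);
        if (v < lo) return static_cast<std::int32_t>(lo);
        return static_cast<std::int32_t>(v);
    }

public:
    Fixed() = default;
    // Round to the nearest representable value, then saturate.
    explicit Fixed(double x) : raw_(saturate(std::llround(x * (1 << FracBits)))) {}

    double to_double() const {
        return static_cast<double>(raw_) / (1 << FracBits);
    }

    friend Fixed operator+(Fixed a, Fixed b) {
        Fixed r;
        r.raw_ = saturate(std::int64_t{a.raw_} + b.raw_);
        return r;
    }
    friend Fixed operator*(Fixed a, Fixed b) {
        Fixed r;
        // The product carries 2*FracBits fractional bits; rescale by
        // division (truncates toward zero) before saturating.
        r.raw_ = saturate((std::int64_t{a.raw_} * b.raw_) /
                          (std::int64_t{1} << FracBits));
        return r;
    }
};

// Generic kernel, written once. T = double checks the data flow;
// T = Fixed<...> shows the effect of a given bit-width.
template <typename T>
T dot(const std::vector<T>& a, const std::vector<T>& b) {
    assert(a.size() == b.size());
    T acc = T(0.0);
    for (std::size_t i = 0; i < a.size(); ++i) acc = acc + a[i] * b[i];
    return acc;
}

inline double to_double(double x) { return x; }
template <int W, int F>
double to_double(Fixed<W, F> x) { return x.to_double(); }

// Convert double inputs to T, run the kernel in T, report the result as double.
template <typename T>
double run_dot(const std::vector<double>& a, const std::vector<double>& b) {
    std::vector<T> ta, tb;
    for (double x : a) ta.push_back(T(x));
    for (double x : b) tb.push_back(T(x));
    return to_double(dot(ta, tb));
}
```

Calling run_dot<double>(a, b) and run_dot<Fixed<13, 10> >(a, b) on the same 
inputs and comparing the two results is the kind of automated test I mean; 
sweeping the template arguments is one way to search for adequate bit-widths.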

Admittedly, the problem considered above is pretty difficult & pretty 
specialized, but the beauty of C++ or even of PL/1 is that it makes certain 
difficult problems tractable: problems which are practically impossible to 
solve with python/Java/Matlab/C. Leave your programming language prejudices at 
home when you consider afresh the optimal solutions to your problem.

Regards,
Ravi



