# [SciPy-user] Numpy/SciPy and performance optimizations

Georg Holzmann grh@mur...
Wed Jan 7 06:52:15 CST 2009

```
Hello!

Over the last few days I went through some tutorials about Python and
performance optimization; the two main articles I read (along with the
references therein) are:
- http://www.scipy.org/PerformancePython
- http://wiki.cython.org/tutorials/numpy

So it seems that there are now many ways to speed up the critical parts
of Python code; however, I am still not satisfied with those
solutions.

My problem:

Some parts of my projects require iterating over loops (some recursive
algorithms). In the past I developed the base library in C++ (using
SWIG to generate Python modules), but now I want to switch fully to
Python and only optimize a few small parts, because I waste too much
time trying to extend the C++ library, which is already quite
complex...

Of course, weave in combination with blitz++ looked very attractive
to me.
After struggling through the documentation of weave and blitz++, I
understood the concept and tried to implement an example.
A typical example of such a loop is the following (all variables are
arrays; from numpy import *):

for n in range(steps):
    x = dot(A, x)
    x += dot(B, u[:, n])
    x = tanh(x)
    y[:, n] = dot(C, r_[x, u[:, n]])

So in blitz++ I need matrix-vector multiplications and similar
operations, which are unfortunately not very intuitive there.
One way is to use the blitz::sum function, which is IMHO unintuitive
and very slow, even slower than plain numpy (see also a benchmark of
C/C++ libraries I made last year:
http://grh.mur.at/misc/sparselib_benchmark/index.html).
Another way would be to use BLAS and write support code for every
needed BLAS (or maybe also LAPACK) function, as demonstrated for
instance in
http://www.math.washington.edu/~jkantor/Numerical_Sage/node14.html.
However, that was too much work for me...

What I want:

- easily embeddable C/C++ code, without having to deal with a
complicated Python API (as in weave)
- basic matrix operations (BLAS, maybe also LAPACK) available in C/C++
- convenient indexing, slicing, etc. in C/C++ as well (which blitz++
does nicely)
- handling of sparse matrices in C/C++ as well (at least basic BLAS
operations for sparse matrices)

OK, this is quite a big wishlist ;)
However, at the moment I can think of two possible solutions:

1. [...] the box possible to have at least BLAS functions available

2. Writing a new type converter for weave that supports a more
feature-rich (and faster) C++ library than blitz++

I don't know how hard option 2 would be.
I did try quite a few C++ libraries last year (see again the
benchmark at http://grh.mur.at/misc/sparselib_benchmark/index.html),
and there would be three nice candidates:
- MTL: http://www.osl.iu.edu/research/mtl/
- gmm++: http://home.gna.org/getfem/gmm_intro
- flens: http://flens.sourceforge.net/
(- maybe also Boost uBLAS:
http://www.boost.org/libs/numeric/)

These three libraries are very fast, header-only libs (like blitz++)
and also have BLAS, LAPACK and sparse support; the benchmark linked
above compares them to Intel BLAS, blitz, Fortran and C.

So it would be nice to get some feedback; maybe there are other
solutions I don't know of?
(Maybe it is easier to do all this in Fortran and use f2py?)
How do other people optimize more complicated code?

I would also be happy to get some remarks on whether it is useful to
implement type converters for a C++ library other than blitz++ (e.g.
MTL or gmm++), and maybe some suggestions for that...

Thanks for any hints!
Best regards,
Georg
```
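For a loop like the one in the post, one thing worth trying before moving to C++ is to take everything that does not depend on the previous step out of the Python loop. Only the recurrence x <- tanh(A x + B u_n) is truly sequential. The input products B u_n can be computed for all n at once, and the readout dot(C, r_[x, u_n]) splits into C[:, :N] x + C[:, N:] u_n, so it can be applied to all stored states at the end.

Below is a minimal NumPy sketch of that idea. It rests on some assumptions that are mine, not the post's:

- The shapes are A (N, N), B (N, K), C (L, N + K) and u (K, steps).
- The state starts at a given x0.
- The function names are hypothetical.

```python
import numpy as np

# Assumed shapes (not stated in the post):
#   A: (N, N), B: (N, K), C: (L, N + K), u: (K, steps), x0: (N,)


def run_loop_reference(A, B, C, u, x0):
    """Direct translation of the loop from the post (hypothetical name)."""
    x = x0.copy()
    steps = u.shape[1]
    y = np.empty((C.shape[0], steps))
    for n in range(steps):
        x = np.dot(A, x)
        x += np.dot(B, u[:, n])
        x = np.tanh(x)
        y[:, n] = np.dot(C, np.r_[x, u[:, n]])
    return y


def run_loop_hoisted(A, B, C, u, x0):
    """Same computation with loop-invariant work moved out (hypothetical name).

    Only the recurrence x <- tanh(A x + B u_n) depends on the previous
    step, so it is the only part that stays inside the Python loop.
    """
    N = A.shape[0]
    steps = u.shape[1]
    Bu = B @ u                 # all input terms at once, shape (N, steps)
    X = np.empty((N, steps))   # keep every state for the batched readout
    x = x0.copy()
    for n in range(steps):
        x = np.tanh(A @ x + Bu[:, n])
        X[:, n] = x
    # dot(C, r_[x, u_n]) == C[:, :N] @ x + C[:, N:] @ u_n, for all columns
    return C[:, :N] @ X + C[:, N:] @ u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, K, L, steps = 50, 3, 2, 200  # hypothetical sizes
    A = 0.1 * rng.standard_normal((N, N))
    B = rng.standard_normal((N, K))
    C = rng.standard_normal((L, N + K))
    u = rng.standard_normal((K, steps))
    x0 = np.zeros(N)
    # Both versions should agree up to floating-point rounding.
    print(np.allclose(run_loop_reference(A, B, C, u, x0),
                      run_loop_hoisted(A, B, C, u, x0)))
```

The loop over n remains, so for small N the per-step Python overhead may still dominate. That remaining core is the part that would be a candidate for weave, Cython or f2py.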
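The wishlist item about "at least basic BLAS operations for sparse matrices" comes down mainly to the sparse matrix times dense vector product. Compressed sparse row (CSR) storage is one common layout for it; I chose it for illustration, and it is not necessarily what MTL, gmm++ or flens use internally.

The sketch below shows the loop structure such a kernel has. The same loop is what one would write in C/C++. The function name is hypothetical.

```python
import numpy as np


def csr_matvec(data, indices, indptr, x):
    """y = M @ x for a matrix M stored in CSR form (hypothetical helper).

    data    : nonzero values, stored row by row
    indices : column index of each entry in `data`
    indptr  : the nonzeros of row i are data[indptr[i]:indptr[i + 1]]
    """
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows, dtype=np.result_type(data, x))
    for i in range(n_rows):
        start, end = indptr[i], indptr[i + 1]
        # Dot product of row i's stored nonzeros with the matching entries of x;
        # an empty row gives 0.
        y[i] = np.dot(data[start:end], x[indices[start:end]])
    return y


if __name__ == "__main__":
    # M = [[1, 0, 2],
    #      [0, 0, 3],
    #      [4, 5, 0]]
    data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    indices = np.array([0, 2, 2, 0, 1])
    indptr = np.array([0, 2, 3, 5])
    x = np.array([1.0, 2.0, 3.0])
    print(csr_matvec(data, indices, indptr, x))  # values 7, 9, 14
```

scipy.sparse.csr_matrix((data, indices, indptr), shape=(3, 3)) stores the same three arrays. If SciPy is available, the sketch can therefore be checked against it.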