[SciPy-dev] scipy.weave versus simple C++.

Prabhu Ramachandran prabhu at aero.iitm.ernet.in
Fri Jan 11 09:44:27 CST 2002


I've been checking out Python, Numeric, Weave recently for serious
numeric computations.  I've started off with a very easy sample
problem that translates well to Numeric expressions.  The results are
very interesting.

I'm comparing weave.blitz with simple C++ code (no classes or virtual
functions etc, just plain arrays and structs.)  The problem is to
solve the Laplace equation on a square grid with a given boundary
condition using a four point average.  Here is what I found.

The core loop in C++ (I use double for all the computations).

    double **u = g->u; // g is a simple structure.
    double tmp, err = 0.0;

    for (int i=1; i<nx-1; ++i) {
        for (int j=1; j<ny-1; ++j) {
            tmp = u[i][j];
            u[i][j] = ((u[i-1][j] + u[i+1][j])*dy2 +
                       (u[i][j-1] + u[i][j+1])*dx2)*dnr_inv;
            err += SQR(u[i][j] - tmp); // SQR is a simple inline function.
        }
    }
    return sqrt(err);

The same core code in Python is next.  I've labelled the different
sections for convenience.

        u = self.grid.u # a 500x500 NumPy array.
     # Step 1
        self.grid.old_u = u.copy() # yes, memory inefficient.

     # Step 2
        expr = "u[1:-1, 1:-1] = ((u[0:-2, 1:-1] + u[2:, 1:-1])*dy2 + "\
                "(u[1:-1,0:-2] + u[1:-1, 2:])*dx2)*dnr_inv"
        scipy.weave.blitz(expr, check_size=0)

     # Step 3
        v = (u - self.grid.old_u).flat
        return sqrt(dot(v,v))

I compiled the C++ code with the -O3 flag.  I then timed the programs
for 100 iterations on a 500x500 grid.

As it stands, here is the comparison.

Weave.Blitz/Python:  10.68 s
Numeric/Python:      31.80 s
C++:                  2.71 s
Ratio:               ~4.0

For other runs I get different numbers, but the ratio seems to settle at
about 4.  This is pretty darned good.  Please remember that the Python
code does the 100 iterations in pure Python, along with a few while/if
conditions to check for the error, the number of iterations and other
things.  Only the core loop has been optimized with weave/Numeric.
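
For reference, the pure-Python outer iteration described above might look
roughly like the sketch below.  The names `solve`, `time_step`, `n_iter`
and `eps` are assumptions made for illustration and are not taken from the
actual program; `time_step` stands for whichever core sweep (Numeric,
weave.blitz, ...) is being timed and is assumed to return the error of
that sweep.

```python
def solve(time_step, n_iter=100, eps=1.0e-16):
    """Call time_step() until its error drops to eps or n_iter sweeps are done.

    time_step is any callable that performs one sweep over the grid and
    returns the error for that sweep (hypothetical interface).
    Returns (number of sweeps performed, last error).
    """
    err = time_step()
    count = 1
    while err > eps and count < n_iter:
        err = time_step()
        count += 1
    return count, err
```

Only the call to `time_step()` would do real numerical work here; the
while/if bookkeeping around it runs once per sweep, so its cost should be
small next to a sweep over a 500x500 grid.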

If you look at the Python code you will notice that Step 1 and Step 3
are pretty bad steps.  Just for the sake of comparison I remove Step 1
alone.  I also rerun the C++ code, to allow for the effect of other
processes on my system etc.

Weave.Blitz/Python:   7.0 s
C++:                  2.47 s
Ratio:                2.83

You can see some definite improvement.  Now I go further and replace
Step 3 with 'return 1.0'.  Here are the results:

Weave.Blitz/Python:   2.17 s
C++:                  2.41 s
Ratio:                0.90
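
The arithmetic behind these ratios, and a rough estimate of what Step 1
and Step 3 cost, can be checked with a few lines of Python.  This is only
a sketch: the timings are copied from the three tables above, the
dictionary keys are made up for illustration, and since the three
measurements come from separate runs (the C++ time itself drifts from
2.71 s to 2.41 s) the differences are approximate.

```python
# Wall-clock times in seconds, copied from the three tables above.
weave = {"all steps": 10.68, "no step 1": 7.0, "no steps 1 and 3": 2.17}
cpp = {"all steps": 2.71, "no step 1": 2.47, "no steps 1 and 3": 2.41}

# Weave.Blitz/Python time divided by C++ time for each configuration.
ratios = {name: weave[name] / cpp[name] for name in weave}

# Rough per-step cost, taken as the difference between successive runs.
copy_cost = weave["all steps"] - weave["no step 1"]          # Step 1
error_cost = weave["no step 1"] - weave["no steps 1 and 3"]  # Step 3
overhead_fraction = (copy_cost + error_cost) / weave["all steps"]

for name in weave:
    print("%-18s weave/C++ = %.2f" % (name, ratios[name]))
print("Step 1 (copy):  %.2f s" % copy_cost)
print("Step 3 (error): %.2f s" % error_cost)
print("Steps 1 + 3 as a fraction of the full run: %.2f" % overhead_fraction)
```

By this estimate the copy and the error computation together account for
most of the time in the first weave.blitz measurement, not the core
expression itself.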

This is an amazing result and shows why blitz++, and consequently
weave.blitz, is so cool -- it's faster than simple C/C++.  I also
interchanged the order of the for loops in the C++ code and it only gets
slower (so this is the best possible way to do it, I guess).

This shows that weave holds tremendous promise.  However, I have a few
questions:

   (1) Is there a better way to speed up the copy (Step 1) and remove
   the old_u array?  Numeric is nice but not perfect.

   (2) I tried using blitz to speed up the error computation (Step 3)
   and the copy (Step 1) but no go.  I couldn't get it to work.

   (3) What's the best way to deal with stuff that belongs in the
   innermost loop (like the error computation that I do)?  Would
   weave.inline do the job?  I'd really like an inline example that
   does something like my C++ for loop above using Numeric or Python
   arrays.  It would be very illustrative.
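
Questions (1) and (3) both come down to accumulating the error inside the
sweep, as the C++ loop does, so that neither old_u nor Step 3 is needed.
A plain-Python reference for that loop is sketched below, mainly to pin
down the behaviour one would want from a compiled version; in pure Python
it is far too slow for a 500x500 grid.  The function name `sweep` and the
definition dnr_inv = 0.5/(dx2 + dy2) are assumptions (the post does not
show how dnr_inv is computed); the loop body mirrors the C++ code.

```python
from math import sqrt


def sweep(u, dx2, dy2):
    """One in-place four-point-average sweep over the interior of u.

    u is a list of rows (anything indexable as u[i][j]); boundary values
    are left untouched.  Returns sqrt(sum of squared changes), like the
    C++ loop, without keeping a copy of the old array.
    """
    nx, ny = len(u), len(u[0])
    dnr_inv = 0.5 / (dx2 + dy2)  # assumed definition, not shown in the post
    err = 0.0
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            tmp = u[i][j]  # old value at this point only
            u[i][j] = ((u[i-1][j] + u[i+1][j]) * dy2 +
                       (u[i][j-1] + u[i][j+1]) * dx2) * dnr_inv
            diff = u[i][j] - tmp
            err += diff * diff
    return sqrt(err)
```

For example, on a 3x3 grid with boundary values 1.0, an interior value of
0.0 and dx2 = dy2, the first call sets the centre to 1.0 and returns 1.0,
and a second call returns 0.0.  Each point is overwritten as soon as it is
computed, so later points in the same sweep see updated neighbours, just
as in the C++ loop.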

All in all weave looks to be *very* promising!!  I hope it grows and
gets even better. :)

Please let me know if you want to look at my code.  Also, let me know
if I'm doing something brain dead here.

