[Numpy-discussion] gfortran/g77+f2py vs gcc+Cython speed comparison

Ondrej Certik ondrej@certik...
Sun Dec 23 06:57:32 CST 2007

```
Hi,

I needed to write a 2D Ising model simulation for school, and I
decided to compare the two possible ways of doing it. So, of course,
I first wrote it in Python, then rewrote it in Fortran + f2py, and
also in Cython. Which is better? Read below. :) But for the impatient:
I am going to use Cython, for the reasons below.

CCing Cython, numpy (f2py is discussed there), and sage-devel
(there are people there who could be interested in these kinds of
comparisons).

The code is available at:

http://hg.sharesource.org/isingmodel/

To play with it, just do this (after installing Mercurial):

$ hg clone http://hg.sharesource.org/isingmodel/
[...]
$ cd isingmodel
$ hg up db7dd01cdc26    # just to be sure that we are talking about the same revision / code
$ make
[...]
$ time python simulate.py
[...]
real    0m2.026s
user    0m1.988s
sys     0m0.020s

This runs the Cython code. Then apply this patch to run the Fortran code instead:

$ hg di
diff -r db7dd01cdc26 simulate.py
--- a/simulate.py       Sun Dec 23 02:23:30 2007 +0100
+++ b/simulate.py       Sun Dec 23 02:24:33 2007 +0100
@@ -31,8 +31,8 @@ def MC(mu = 1, temp = 2, dim = 20, steps
     J=1 #coupling constant
     k=1 #Boltzman constant
 
-    #from mcising import mc
-    from pyising import mc
+    from mcising import mc
+    #from pyising import mc
     B = D1(A)
     mc(B, dim, steps, temp, H, mu, J, k)
     return D2(B)

And then run it again (also apply the fix to the Fortran code
mentioned below, otherwise it might not work for you):

$ time python simulate.py
[...]
real    0m3.600s
user    0m3.528s
sys     0m0.052s

So the Fortran version is a lot slower (3.6 s vs. 2.0 s).

We are comparing many things here: the wrappers, my Fortran coding
skills vs. Cython's C code generation, and gcc vs. gfortran. So I wrote
to the numpy mailing list. First, Travis (the author of numpy) suggested:

"
My experience with similar kinds of comparisons is that GNU Fortran
compilers are not very good, especially on 2-D problems. Try using a
different Fortran compiler to see if speeds improve.
"

Then Pearu (the author of f2py) suggested:

"
Though the problem is 2D, your implementations are essentially
1D. If you treated the array A as a 2D array (and avoided calling
subroutine p), you would gain about a 7% speedup in Fortran.

When using -DF2PY_REPORT_ATEXIT for f2py, a summary of timings
is printed at exit showing how much time was spent in the Fortran
code and how much in the interface. In the given case
I get (nsteps=50000):

Overall time spent in ...
(a) wrapped (Fortran/C) functions           :     1962 msec
(b) f2py interface,               60 calls  :        0 msec
(c) call-back (Python) functions            :        0 msec
(d) f2py call-back interface,      0 calls  :        0 msec
(e) wrapped (Fortran/C) functions (acctual) :     1962 msec

that is, most of the time is spent in the Fortran function and none
in the wrapper. The conclusion is that the difference in timings is
caused not by the f2py- or Cython-generated interfaces but by the
Fortran and C code and/or the compilers.

Some idiom used in the Fortran code is just slower than in C.
For example, in the C code you are doing calculations in single
(float) precision, but in Fortran you are forcing double precision.

HTH,
Pearu

PS: Here is the setup.py file that I used to build the
extension modules instead of the Makefile:

#file: setup.py
def configuration(parent_package='',top_path=None):
    from numpy.distutils.misc_util import Configuration
    config = Configuration('',parent_package,top_path)
    # module name from "from mcising import mc"; the source file name is assumed
    config.add_extension('mcising', sources=['mcising.f'],
                         define_macros = [('F2PY_REPORT_ATEXIT',1)]
                         )
    return config
from numpy.distutils.core import setup
setup(configuration = configuration)
"

"
When using the g77 compiler instead of gfortran, I get a 4.8x
speedup.

By the way, a line in an if statement of the Fortran code
should read `A(p(i,j,N)) = - A(p(i,j,N))`.
"

(By the way, I have no idea how it could work for me without the
`A(p(i,j,N)) = - A(p(i,j,N))` fix; quite embarrassing.)

So then we discussed it on #sympy IRC:

* Now talking on #sympy
<ondrej> hi pearu, thanks a lot for testing it!
<ondrej> 4.8x speedup, jesus christ. so gfortran sucks a lot
<pearu> hey ondrej
* ondrej is trying g77
<pearu> gfortran has the advantage of being a Fortran 90 compiler
<ondrej> g77 is deprecated in Debian and dead upstream (imho)
<pearu> yes, but I guess the guys who are maintaining g77/gfortran do not crunch numbers much ;)
<pearu> g77 has a long development history and is well optimized
<ondrej> I fear fortran is a bad investment for new projects
<ondrej> I know g77 is well optimized, but it's dead
<pearu> gfortran is a rewrite from scratch and is too young
<ondrej> do you think many people still write fortran? I use it just because g77 was faster than gcc
<pearu> g77 is certainly not dead, scientists use it a lot because of its speed
<ondrej> btw, `A(p(i,j,N)) = - A(p(i,j,N))`
<ondrej> means my fortran code was broken
<ondrej> doesn't it?
<pearu> some think that fortran is a dead language, some use it a lot because lots of code has been written in fortran over several decades
<ondrej> yes
<pearu> yes, it was, I got segfaults because of that
<ondrej> I don't know what to think myself
<pearu> it depends on application
<pearu> if it is a research app then use fortran because of speed
<ondrej> but as you can see, you need to use g77
<ondrej> and not gfortran
<ondrej> (but all Debian is moving to use gfortran)
<pearu> I use gfortran only for f90 code
<ondrej> hm
<pearu> I think Debian is wrong in the short term; maybe in the future gfortran will be faster, but not at the moment
<ondrej> but is someone developing g77?
<pearu> no, I don't think so
<ondrej> debian moved to gfortran because there are problems with g77 being unmaintained
<pearu> but it is a complete Fortran 77 compiler and a very good one
<ondrej> hm. I think when one wants speed, one should use some commercial compilers
<pearu> probably because new architectures are developed and g77 is not updated to use their features
<ondrej> and when one wants robustness, one should use a free compiler that is maintained (gfortran)
<pearu> commercial compilers do not mean fast compilers, in general
<ondrej> I didn't try intel, so I don't know how it compares to g77
<pearu> using intel restricts one to the Intel platform
<ondrej> g77 doesn't support -fdefault-real-8, for example
<pearu> AMD, which most scientists use, is faster in floating point
<pearu> why is this flag important? one can always write real*8
<ondrej> indeed, g77 is 1.8s and gcc 2.0s on my comp
<ondrej> as to real*8, I thought a good strategy is to write real and let the compiler choose the precision
<pearu> try to use float in fortran as you do in C
<ondrej> ok
<pearu> this will mess up f2py-ing as f2py assumes that real is real*4
<ondrej> yes -
<ondrej> real*4 is float, isn't it?
<pearu> yes
<ondrej> so I'll just use real*4 everywhere
<pearu> yep
<ondrej> ok
<ondrej> btw, I was writing a python script for automatically converting real*4 to real*8 and back
<ondrej> but it's freaking hard to parse fortran
<ondrej> (I needed to also work with "real", etc)
<ondrej> I only managed to make it work in simple cases
<pearu> just do re.sub
<ondrej> it's not that easy
<ondrej> there are constructs like real(something)
<ondrej> that shouldn't be converted
<ondrej> but real T should be converted
<ondrej> etc.
<ondrej> so using real*4 with g77 slowed the code down from 1.8s to 1.9s
<pearu> re.sub should be able to handle these, use a callback as the replacement argument, for instance
<ondrej> ok
<pearu> you are using random numbers, could this make also a difference
<ondrej> I think it does
<ondrej> (I had to use glibc fast random to speed C up)
<pearu> ah, ok, that also explains things
<ondrej> shit, the gfortran is really slow
<ondrej> 3.5s on my comp
<ondrej> no matter which precision
<pearu> try to disable random to see if loops are faster in C or fortran
<ondrej> ok
<ondrej> so gfortran is now 1.3s
<ondrej> and g77 1.04s
* ondrej is trying C
<ondrej> C is 1.05s
<pearu> so gfortran random is slow
<ondrej> yes
<ondrej> g77 random is fast
<ondrej> C random is slower than g77
<ondrej> it depends on the quality of the random generator as well
<ondrej> there are huge differences in that
<pearu> and so gfortran itself is not too bad
<pearu> :)
<ondrej> but as you can see, there is no point for me to use g77 if I can achieve the same speed with C
<ondrej> and gfortran is slower than both
<pearu> yep
<ondrej> ok, going to write this to the list
<pearu> can you try ifc
<ondrej> I'll also send it to the Cython guys, they'll like the result. :)
<ondrej> ifc = intel fortran compiler?
<pearu> yes
<ondrej> I like opensource and Debian
<ondrej> I think this clearly shows that fortran is dead for me. I think gcc is pretty good too.
<pearu> note that these results are not thanks to pyrex/cython but thanks to compilers :)
<pearu> i like to be fair ;)
<ondrej> (yes, it's just a compiler comparison, not f2py vs Cython/pyrex)
<pearu> yep
<ondrej> but there are two ways a user can choose: either cython+gcc, or f2py+fortran
<pearu> next time we should probably have this discussion in the scipy irc
<ondrej> and if one wants to use the most recent default free compilers, I at least will choose cython+gcc
<pearu> there are more ways, weave, etc etc
<ondrej> I am going to paste this to my email about that
<pearu> ok
<ondrej> (if you are ok with that)
<pearu> ok with me

So, what do you think of that?

Ondrej
```
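For readers who have not looked at the isingmodel repository, here is a minimal pure-Python sketch of one Metropolis sweep of the kind the `mc(B, dim, steps, temp, H, mu, J, k)` call performs. It is not the code from the repository: the function name `metropolis_sweep`, the list-of-lists lattice and the energy convention E = -J Σ s_i s_j - mu H Σ s_i are assumptions. It indexes the lattice as a true 2D array with periodic wrap-around, which is what Pearu suggests instead of a 1D index helper like `p(i,j,N)`, and it flips an accepted spin by negating it in place, which is the role of the corrected Fortran line `A(p(i,j,N)) = - A(p(i,j,N))`.

```python
import math
import random


def metropolis_sweep(spins, temp, H=0.0, mu=1.0, J=1.0, k=1.0, rng=random):
    """Perform one Metropolis sweep (N*N attempted flips) on an N x N lattice.

    spins is a list of lists holding +1/-1 values and is modified in place.
    Periodic boundary conditions are handled with the modulo operator, so the
    lattice is indexed directly as 2D rather than through a 1D index helper.
    Returns the number of accepted flips.
    """
    n = len(spins)
    accepted = 0
    for _ in range(n * n):
        i = rng.randrange(n)
        j = rng.randrange(n)
        s = spins[i][j]
        # Sum of the four nearest neighbours, wrapping around the edges.
        neighbours = (spins[(i + 1) % n][j] + spins[(i - 1) % n][j]
                      + spins[i][(j + 1) % n] + spins[i][(j - 1) % n])
        # Energy change from flipping s, for E = -J*sum(s_i*s_j) - mu*H*sum(s_i).
        dE = 2.0 * s * (J * neighbours + mu * H)
        # Metropolis rule: accept downhill moves, uphill ones with prob exp(-dE/kT).
        if dE <= 0.0 or rng.random() < math.exp(-dE / (k * temp)):
            spins[i][j] = -s  # flip in place, like A(p(i,j,N)) = - A(p(i,j,N))
            accepted += 1
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    lattice = [[1] * 20 for _ in range(20)]  # dim = 20, as in MC()'s default
    for _ in range(100):
        metropolis_sweep(lattice, temp=2.0, rng=rng)
    magnetization = sum(map(sum, lattice)) / 400
    print(f"magnetization per spin after 100 sweeps: {magnetization:.3f}")
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible. A compiled version (Cython, Fortran or C) would follow the same structure; as the IRC discussion shows, how the random numbers are produced can matter as much as the loop itself.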
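Pearu's remark that the C code computes in float while the Fortran code forces double means the two versions do not even perform the same arithmetic. Fortran `real*4` corresponds to C `float` and NumPy `float32`, and `real*8` to C `double` and NumPy `float64`. The hypothetical helper `accumulate` below runs one loop in both precisions to show that the results differ; it says nothing about speed, which depends on the compiler and the hardware.

```python
import numpy as np


def accumulate(n, dtype):
    """Add 0.1 to a running total n times, doing every operation in dtype."""
    total = dtype(0.0)
    step = dtype(0.1)
    for _ in range(n):
        total = total + step  # NumPy scalar arithmetic keeps the dtype
    return total


if __name__ == "__main__":
    n = 100_000
    exact = n / 10
    for dtype in (np.float32, np.float64):
        total = accumulate(n, dtype)
        print(f"{dtype.__name__}: {float(total)!r}, error {abs(float(total) - exact):.3g}")
```

When timing a C version against a Fortran version, it is worth making both use the same precision first, as Ondrej later does by switching to `real*4` everywhere.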
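Pearu's suggestion of using `re.sub` with a callback to switch between `real*4` and `real*8` could look roughly like this. It is a heuristic sketch, not a Fortran parser, and the name `set_real_kind` and the pattern are assumptions. Only statements that begin with the `REAL` keyword followed by a blank, a comma or `::` are rewritten, so a declaration such as `real T` is converted while an intrinsic call such as `real(i)` inside an expression is left alone.

```python
import re

# A declaration line: optional indentation, the REAL keyword, an optional
# explicit kind such as *4 or *8, then a blank, a comma or "::".  The lookahead
# is what keeps intrinsic calls such as real(i) from being rewritten.
_REAL_DECL = re.compile(
    r"^([ \t]*)(real)(\*\d+)?(?=[ \t]|,|::)",
    re.IGNORECASE | re.MULTILINE,
)


def set_real_kind(source, kind):
    """Return Fortran source with every REAL declaration set to REAL*<kind>."""

    def repl(match):
        # Keep the original indentation and the keyword's spelling (real/REAL);
        # any existing kind suffix is dropped and replaced by the new one.
        indent, keyword = match.group(1), match.group(2)
        return f"{indent}{keyword}*{kind}"

    return _REAL_DECL.sub(repl, source)


if __name__ == "__main__":
    src = (
        "      real T\n"
        "      real*4 x, y\n"
        "      z = real(i)\n"
    )
    print(set_real_kind(src, 8))
```

Known gaps of this sketch: F90 declarations written as `real(kind=8)` (keyword followed by a parenthesis), `implicit real*8 (...)` statements, and an assignment to a variable literally named `real` are not handled correctly.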
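Pearu's trick of disabling the random number generator, to see whether the loops or the generator dominate the run time, can be written as a small benchmark harness. In this Python sketch (the name `loop_cost` is hypothetical) the same accept/reject loop runs once with a real generator and once with a constant stand-in, so the difference in elapsed time approximates what the generator costs. In compiled Fortran or C the same idea applies to the calls to the compiler's or the C library's random routine.

```python
import itertools
import random
import time


def loop_cost(n, draw):
    """Run n accept/reject steps using draw() as the source of uniform numbers.

    Returns (elapsed_seconds, accepted_count).  Swapping draw between a real
    generator and a constant isolates the cost of generating random numbers.
    """
    accepted = 0
    start = time.perf_counter()
    for _ in range(n):
        if draw() < 0.5:
            accepted += 1
    elapsed = time.perf_counter() - start
    return elapsed, accepted


if __name__ == "__main__":
    n = 500_000
    rng = random.Random(12345)
    t_random, _ = loop_cost(n, rng.random)
    # A C-level callable returning a constant, so the call overhead is comparable
    # to rng.random (a Python lambda would be slower and skew the comparison).
    t_const, _ = loop_cost(n, itertools.repeat(0.25).__next__)
    print(f"with generator: {t_random:.3f} s")
    print(f"random disabled: {t_const:.3f} s")
    print(f"approximate generator cost: {t_random - t_const:.3f} s")
```

Timings from a harness like this vary between runs and machines, so it is best used to compare variants on the same computer, as in the IRC session above.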