[Numpy-discussion] Surprising performance tweak in Cython
Gael Varoquaux
gael.varoquaux@normalesup....
Sun Jun 22 19:37:30 CDT 2008
I tried to tweak my Cython code for performance by manually inlining a small
function, and ended up with slower code. I must confess I
don't really understand what is going on here. If somebody has an
explanation, I'd be delighted. The code follows.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
from numpy import zeros
# Make sure numpy is initialized.
include "c_numpy.pxd"
##############################################################################
cdef int inner_loop(float c_x, float c_y):
    cdef float x, y, x_buffer
    x = 0; y = 0
    cdef int i
    for i in range(50):
        x_buffer = x*x - y*y + c_x
        y = 2*x*y + c_y
        x = x_buffer
        if (x*x + x*y > 100):
            return 50 - i

def do_Mandelbrot_cython():
    cdef ndarray threshold_time
    threshold_time = zeros((500, 500))
    cdef double *tp
    cdef float c_x, c_y
    cdef int i, j
    c_x = -1.5
    tp = <double*>threshold_time.data
    for i in range(500):
        c_y = -1
        for j in range(500):
            tp += 1
            c_y += 0.004
            tp[0] = inner_loop(c_x, c_y)
        c_x += 0.004
    return threshold_time

def do_Mandelbrot_cython2():
    cdef ndarray threshold_time
    threshold_time = zeros((500, 500))
    cdef double *tp
    tp = <double*>threshold_time.data
    cdef float x, y, xbuffer, c_x, c_y
    cdef int i, j, n
    c_x = -1.5
    for i in range(500):
        c_y = -1
        for j in range(500):
            tp += 1
            c_y += 0.004
            x = 0; y = 0
            for n in range(50):
                x_buffer = x*x - y*y + c_x
                y = 2*x*y + c_y
                x = x_buffer
                if (x*x + y*y > 100):
                    tp[0] = 50 - n
                    break
        c_x += 0.004
    return threshold_time
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
And the timings I get are:
In [2]: %timeit C.do_Mandelbrot_cython2()
10 loops, best of 3: 342 ms per loop
In [3]: %timeit C.do_Mandelbrot_cython()
10 loops, best of 3: 126 ms per loop
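
Before comparing timings, it may be worth checking that the two listings compute the same thing. Read side by side, they differ in two places: inner_loop tests `x*x + x*y > 100` while do_Mandelbrot_cython2 tests `x*x + y*y > 100`, and do_Mandelbrot_cython2 declares `xbuffer` but assigns to `x_buffer`. Cython treats an undeclared local name as a Python object, so the second difference could plausibly account for part of the slowdown, though that has not been measured here. The pure-Python sketch below is not the code that was timed. The name `escape_time` and the `use_xy_test` flag are made up for illustration, and Python floats are doubles where the Cython code uses C floats, so results near the threshold may differ.

```python
def escape_time(c_x, c_y, max_iter=50, use_xy_test=False):
    """Pure-Python rendering of the inner loop from the listings above.

    use_xy_test=True  -> escape test x*x + x*y > 100 (as in inner_loop)
    use_xy_test=False -> escape test x*x + y*y > 100 (as in do_Mandelbrot_cython2)
    """
    x = 0.0
    y = 0.0
    for i in range(max_iter):
        x_buffer = x * x - y * y + c_x
        y = 2 * x * y + c_y
        x = x_buffer
        if use_xy_test:
            norm = x * x + x * y
        else:
            norm = x * x + y * y
        if norm > 100:
            return max_iter - i
    # The point never escaped: the array from zeros() keeps its 0 here.
    return 0


def count_mismatches(n=50, step=0.04):
    """Count grid points where the two escape tests give different results.

    Uses a coarser grid than the 500x500 one in the post (an assumption
    made only to keep the pure-Python check fast).
    """
    mismatches = 0
    for i in range(n):
        c_x = -1.5 + i * step
        for j in range(n):
            c_y = -1.0 + (j + 1) * step
            a = escape_time(c_x, c_y, use_xy_test=True)
            b = escape_time(c_x, c_y, use_xy_test=False)
            if a != b:
                mismatches += 1
    return mismatches


if __name__ == "__main__":
    # One point where the two tests disagree: prints 49 48
    print(escape_time(2.0, 2.0), escape_time(2.0, 2.0, use_xy_test=True))
    print("grid points that differ:", count_mismatches())
```

If the mismatch count is nonzero, the two Cython functions are not doing the same amount of work per point, which is worth ruling out before attributing the timing gap to inlining.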
Cheers,
Gaël
