[Numpy-discussion] Surprising performance tweak in Cython

Gael Varoquaux gael.varoquaux@normalesup....
Sun Jun 22 19:37:30 CDT 2008


I tried to tweak my Cython code for performance by manually inlining a small
function (inner_loop), and ended up with slower code: the inlined version,
do_Mandelbrot_cython2, runs about 2.7 times slower than the original. I must
confess I don't really understand what is going on here. If somebody has an
explanation, I'd be delighted. The code follows.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
from numpy import zeros

# Make sure numpy is initialized.
include "c_numpy.pxd"

##############################################################################
cdef int inner_loop(float c_x, float c_y):
    cdef float x, y, x_buffer
    x = 0; y = 0
    cdef int i
    for i in range(50):
        x_buffer = x*x - y*y + c_x
        y = 2*x*y + c_y
        x = x_buffer
        if (x*x + x*y > 100):
            return 50 - i

def do_Mandelbrot_cython():
    cdef ndarray threshold_time 
    threshold_time = zeros((500, 500))
    cdef double *tp
    cdef float c_x, c_y
    cdef int i, j
    c_x = -1.5
    tp = <double*>threshold_time.data
    for i in range(500):
        c_y = -1
        for j in range(500):
            c_y += 0.004
            tp[0] = inner_loop(c_x, c_y)
            # store the result, then advance to the next cell
            tp += 1
        c_x += 0.004
    return threshold_time


def do_Mandelbrot_cython2():
    cdef ndarray threshold_time
    threshold_time = zeros((500, 500))
    cdef double *tp
    tp = <double*>threshold_time.data
    cdef float x, y, xbuffer, c_x, c_y
    cdef int i, j, n 
    c_x = -1.5
    for i in range(500):
        c_y = -1
        for j in range(500):
            c_y += 0.004
            x = 0; y = 0
            for n in range(50):
                x_buffer = x*x - y*y + c_x
                y = 2*x*y + c_y
                x = x_buffer
                if (x*x + y*y > 100):
                    tp[0] = 50 - n
                    break
            # advance to the next cell only after this one is done
            tp += 1
        c_x += 0.004
    return threshold_time
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
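To check the two Cython versions against each other, a plain Python/NumPy sketch of the same escape-time computation may be useful. This is not part of the original code. The names escape_time and mandelbrot_reference are made up here. The sketch assumes the x*x + y*y escape test used in do_Mandelbrot_cython2; note that inner_loop tests x*x + x*y instead. It works in double precision and computes c_x and c_y directly rather than by repeatedly adding 0.004 to single-precision floats, so points near the boundary of the set may come out differently from the Cython output.

```python
import numpy as np


def escape_time(c_x, c_y, max_iter=50, bailout=100.0):
    """Iterate z -> z**2 + c from z = 0 and return max_iter - n for the
    first iteration n at which |z|**2 exceeds bailout, or 0 if it never does
    (assumed to match the Cython code, where unwritten cells stay 0)."""
    x = y = 0.0
    for n in range(max_iter):
        # Tuple assignment uses the old x and y on the right-hand side,
        # like the x_buffer temporary in the Cython loop.
        x, y = x * x - y * y + c_x, 2 * x * y + c_y
        if x * x + y * y > bailout:
            return max_iter - n
    return 0


def mandelbrot_reference(size=500, step=0.004, x0=-1.5, y0=-1.0, max_iter=50):
    """Fill out[i, j] with the escape time for
    c_x = x0 + step*i, c_y = y0 + step*(j + 1)
    (the Cython code increments c_y before using it)."""
    out = np.zeros((size, size))
    for i in range(size):
        c_x = x0 + step * i
        for j in range(size):
            c_y = y0 + step * (j + 1)
            out[i, j] = escape_time(c_x, c_y, max_iter)
    return out
```

In pure Python the full 500 x 500 grid takes a while, so a smaller size is handy for spot checks.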

And the timings I get are:

In [2]: %timeit C.do_Mandelbrot_cython2()
10 loops, best of 3: 342 ms per loop

In [3]: %timeit C.do_Mandelbrot_cython()
10 loops, best of 3: 126 ms per loop
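To reproduce these numbers outside IPython, here is a sketch using the standard-library timeit module. "10 loops, best of 3" means three trials of ten calls each, with the fastest trial divided by ten reported as the per-call time. The helper name best_of is made up here. It assumes the compiled extension is importable, as C was in the session above.

```python
import timeit


def best_of(func, number=10, repeat=3):
    """Run `repeat` trials of `number` calls to func and return the
    best per-call time in seconds, the quantity %timeit reports."""
    timings = timeit.repeat(func, number=number, repeat=repeat)
    return min(timings) / number
```

For example, best_of(C.do_Mandelbrot_cython2) returns the per-call time in seconds for the inlined version. Multiply by 1000 to compare with the millisecond figures above.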

Cheers,

Gaël

