[Numpy-discussion] my cython is slow

John Hunter jdh2358@gmail....
Thu Jan 8 13:56:02 CST 2009

On Thu, Jan 8, 2009 at 12:34 PM, Eric Firing <efiring@hawaii.edu> wrote:
> John Hunter wrote:
>> On Wed, Jan 7, 2009 at 5:37 PM, Eric Firing <efiring@hawaii.edu> wrote:
>>> A couple small changes speed it up quite a bit:
>>> efiring@manini:~/temp/nnbf$   python test_nnbf.py
>>> loading data... this could take a while
>>> testing nnbf...
>>>    10 trials: mean=0.0150, min=0.0100
>>> testing numpy...
>>>    10 trials: mean=0.0660, min=0.0600
>>> It is all a matter of keeping Python objects and function calls out of inner
>>> loops.  I suspect there is quite a bit more that could be done in that
>>> regard, but I haven't looked.
>> Much faster, but no longer correct, as you'll see if you uncomment
>> the nose test test_neighbors that compares actual vs desired.
>> Is the pointer arithmetic correct:
>>   dataptr + i
>> I would have thought perhaps:
>>   dataptr + i*n
>> but this is segfaulting.  Do we need to use a stride?
> Sorry, I was too hasty.  Yes, it seems like i*n should be correct, but
> it isn't; we are missing something simple and fundamental here.  I don't
> see it immediately, and won't be able to look at it for a while.
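For a C-contiguous (row-major) double array, row i starts at offset i*n, so `dataptr + i*n` is the right pointer arithmetic once the buffer really is contiguous; the earlier segfault points at a non-contiguous or stale buffer rather than the arithmetic. A quick pure-NumPy sketch of the same flat-index math:

```python
import numpy as np

# row-major layout: element (i, j) of an (m, n) array lives at flat
# index i*n + j, which is exactly what dataptr + i*n walks row by row
m, n = 5, 3
a = np.arange(m * n, dtype=float).reshape(m, n)
flat = a.ravel()            # contiguous view of the same buffer

i, j = 3, 2
print(flat[i * n + j] == a[i, j])   # True for any C-contiguous array
```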

OK, the code at

  > svn co https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/examples/pyrex/nnbf

now passes the correctness tests and is approximately ten times faster
than the numpy version.

I borrowed the "raw_data" idiom from Anne's ckdtree.pyx, though I
don't really understand why it is different from what we were doing
with the data.data ptr.  I am also not sure whether I need to do any
memory management when I resize the buffer and reset raw_data....
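For what it's worth, as long as self.data holds a reference to the ndarray that owns the buffer, no manual freeing should be needed: rebinding self.data lets numpy reclaim the old array, and raw_data just has to be re-pointed at the new buffer. A pure-Python sketch of the same grow-by-doubling idiom (class and names are mine, not from the nnbf sources):

```python
import numpy as np

class GrowableBuffer:
    """grow-by-doubling 2-D buffer, mirroring the NNBF resize logic"""
    def __init__(self, n, numrows=4):
        self.n = n
        self.numrows = numrows
        self.numpoints = 0
        self.data = np.empty((numrows, n), dtype=np.float64)

    def add(self, point):
        self.data[self.numpoints] = point
        self.numpoints += 1
        if self.numpoints == self.numrows:
            # no explicit free: dropping the old reference is enough,
            # numpy owns the buffer and releases it when unreferenced
            self.numrows *= 2
            newdata = np.empty((self.numrows, self.n), dtype=np.float64)
            newdata[:self.numpoints] = self.data
            self.data = newdata

buf = GrowableBuffer(2)
for k in range(10):
    buf.add((k, k))
print(buf.numrows, buf.numpoints)   # -> 16 10 (doubled at 4 and at 8)
```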

A brute-force nearest-neighbor routine with incremental add.  The
internal array data structure grows as you add points:
import numpy as np
cimport numpy as np

cdef extern from "math.h":
    double sqrt(double)

cdef inline int is_neighbor(int n, double *row, double *pp, double d2max):
    """return 1 if the sum of squares of the n-length array
    row[j]-pp[j] is <= d2max"""
    cdef int j
    cdef double d, d2

    d2 = 0.

    for j in range(n):
        d = row[j] - pp[j]
        d2 += d*d
        if d2>d2max:
            return 0
    return 1

cdef class NNBF:
    cdef readonly object data
    cdef double* raw_data
    cdef readonly int n, numrows, numpoints

    def __init__(self, n):
        """create a buffer to hold n-dimensional points"""
        cdef np.ndarray[double, ndim=2] inner_data

        self.n = n
        self.numrows = 10000
        # np.empty already returns a C-contiguous array, so the
        # ascontiguousarray call below is a no-copy safety net
        data = np.empty((self.numrows, self.n), dtype=np.float64)
        self.data = np.ascontiguousarray(data, dtype=np.float64)
        inner_data = self.data
        self.raw_data = <double*>inner_data.data
        self.numpoints = 0

    def add(NNBF self, object point):
        """add a point to the buffer, growing it if necessary"""
        cdef np.ndarray[double, ndim=2] inner_data
        cdef np.ndarray[double, ndim=1] pp
        pp = np.asarray(point, dtype=np.float64)

        self.data[self.numpoints] = pp
        self.numpoints += 1
        if self.numpoints==self.numrows:
            ## no explicit free is needed here: raw_data points into
            ## the ndarray's buffer, which numpy releases once the old
            ## self.data reference is dropped
            self.numrows *= 2
            newdata = np.empty((self.numrows, self.n), dtype=np.float64)
            newdata[:self.numpoints] = self.data
            self.data = np.ascontiguousarray(newdata, dtype=np.float64)
            inner_data = self.data
            self.raw_data = <double*>inner_data.data

    def get_data(NNBF self):
        """return the points added so far as a numpoints x n array
        (a view, not a copy)"""
        return self.data[:self.numpoints]

    def find_neighbors(NNBF self, object point, double radius):
        """return a list of indices into the data for points within
        radius of point"""
        cdef int i, neighbor, n
        cdef double d2max
        cdef np.ndarray[double, ndim=1] pp

        if len(point)!=self.n:
            raise ValueError('Expected a length %d vector'%self.n)

        # convert once up front so the inner loop avoids python
        # array indexing
        pp = np.asarray(point, dtype=np.float64)

        d2max = radius*radius
        neighbors = []

        # don't do a python lookup inside the loop
        n = self.n

        for i in range(self.numpoints):
            neighbor = is_neighbor(
                n, self.raw_data + i*n, <double*>pp.data, d2max)

            # if the number of points in the cluster is small, the
            # python list performance should not kill us
            if neighbor:
                neighbors.append(i)
        return neighbors

    def find_neighbors_numpy(self, point, radius):
        """do a plain ol' numpy lookup to compare performance and output

          *data* is a numpoints x numdims array
          *point* is a numdims length vector
          *radius* is the max distance

        return an array of indices into the data which are within
        radius of point"""
        data = self.get_data()
        distance = data - point
        r = np.sqrt((distance*distance).sum(axis=1))
        return np.nonzero(r<=radius)[0]
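A pure-Python rendition of the brute-force search with the same early-exit distance test, checked against the vectorized numpy version (this is a sketch of the algorithm, not the compiled nnbf module):

```python
import numpy as np

def find_neighbors_loop(data, point, radius):
    """indices of rows of data within radius of point, with early exit
    on the running sum of squares, as in the cython is_neighbor"""
    d2max = radius * radius
    neighbors = []
    for i, row in enumerate(data):
        d2 = 0.0
        for rj, pj in zip(row, point):
            d2 += (rj - pj) ** 2
            if d2 > d2max:
                break           # prune as soon as the bound is exceeded
        else:
            neighbors.append(i)
    return neighbors

def find_neighbors_numpy(data, point, radius):
    """vectorized equivalent, as in NNBF.find_neighbors_numpy"""
    distance = data - point
    r = np.sqrt((distance * distance).sum(axis=1))
    return np.nonzero(r <= radius)[0]

rng = np.random.RandomState(0)
data = rng.rand(100, 3)
point = np.array([0.5, 0.5, 0.5])
a = find_neighbors_loop(data, point, 0.3)
b = find_neighbors_numpy(data, point, 0.3)
print(list(a) == list(b))   # both methods agree
```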

More information about the Numpy-discussion mailing list