[Numpy-discussion] my cython is slow

John Hunter jdh2358@gmail....
Wed Jan 7 16:52:50 CST 2009

Partly as an excuse to learn cython, and partly because I need to eke
out some extra performance from a neighborhood search, I tried to code
up a brute force neighborhood search in cython around an N-dimensional
point p.  I need to incrementally add a point, do a search, add
another point, do another search, so some of the algorithms like those
in scipy.spatial which assume a static data structure with lots
of searches over it probably won't help me.

I wrote some cython code to grow an Npoints x Ndimensions numpy array
(doubling it in size every time I exceed Npoints) and then do a brute
force request for all points within a radius r using euclidean
distance.  The code is working fine, but it is still slower than a
simple numpy implementation (damn you, numpy performance!)

Here is the numpy code::

    def find_neighbors_numpy(data, point, radius):
        """do a plain ol' numpy lookup to compare performance and output

          *data* is a numpoints x numdims array
          *point* is a numdims length vector
          *radius* is the max distance

        return an array of indices into data which are within radius
        """
        numpoints, n = data.shape
        distance = data - point
        r = np.sqrt((distance*distance).sum(axis=1))
        return np.nonzero(r<=radius)[0]

This requires 6 passes through the data, so I should be able to beat
it with a properly crafted cython algorithm.
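Incidentally, one of those passes can be shaved off in pure numpy by
comparing squared distances instead of taking the square root.  Just a
sketch (the name find_neighbors_numpy_sq is mine, not part of nnbf)::

```python
import numpy as np

def find_neighbors_numpy_sq(data, point, radius):
    # same result as the sqrt-based version above, but skips the
    # sqrt pass by comparing squared distances against radius**2
    distance = data - point
    r2 = (distance*distance).sum(axis=1)
    return np.nonzero(r2 <= radius*radius)[0]
```

Since sqrt is monotonic, r <= radius exactly when r*r <= radius*radius,
so the returned indices are identical.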

I have a class NNBF (Nearest Neighbor Brute Force) with an interface like

    NUMDIM = 6
    nn = nnbf.NNBF(NUMDIM)

    print 'loading data... this could take a while'
    # this could take a while
    for i in range(200000):
        x = np.random.rand(NUMDIM)
        nn.add(x)

    x = np.random.rand(NUMDIM)
    radius = 0.2
    ind = nn.find_neighbors(x, radius)

(in my real use case I would be doing a search after every add)

when I run this vs numpy, numpy is a little faster

    testing nnbf...
        10 trials: mean=0.0420, min=0.0400
    testing numpy...
        10 trials: mean=0.0420, min=0.0300

You can grab the code, the python prototype, the test case and setup
file at http://matplotlib.sf.net/nnbf.zip.  You will need a fairly
recent cython to build it:

  # build the extension and run the test code
  wget http://matplotlib.sf.net/nnbf.zip
  unzip nnbf.zip
  cd nnbf
  python setup.py build_ext --inplace
  python test_nnbf.py

I'm pasting the cython code nnbf.pyx below.  If anyone can advise me
as to how to remove the bottlenecks (I've decorated the code with some
XXX questions) that would be great.  I've read the tutorial at
http://wiki.cython.org/tutorials/numpy but I think I am still missing
something important.  I tried to follow Anne's code in ckdtree.pyx
which makes use of a raw_data structure but I am not sure how/if to
incorporate this into my code -- perhaps there is some juice there....

"""
A brute force nearest neighbor routine with incremental add.  The
internal array data structure grows as you add points.
"""

import numpy as np
cimport numpy as np

cdef extern from "math.h":
     double sqrt(double)

cdef inline int is_neighbor(int n, double *row, double *pp, double d2max):
    """return 1 if the sum of squares of the n-length array row[j]-pp[j] is <= d2max"""
    cdef int j
    cdef double d, d2

    d2 = 0.

    for j in range(n):
        d = row[j] - pp[j]
        d2 += d*d
        if d2>d2max:
            return 0
    return 1

cdef class NNBF:
    cdef readonly object data
    #cdef double* raw_data
    cdef readonly int n, numrows, numpoints

    def __init__(self, n):
        """create a buffer to hold n dimensional points"""
        #cdef np.ndarray[double, ndim=2] inner_data

        self.n = n
        self.numrows = 100
        #  XXX how to create empty as contiguous w/o copy?
        self.data = np.empty((self.numrows, self.n), dtype=np.float)
        #inner_data = self.data
        #self.raw_data = <double*>inner_data.data
        self.numpoints = 0

    def add(NNBF self, object point):
        """add a point to the buffer, grow if necessary"""
        #cdef np.ndarray[double, ndim=2] inner_data
        cdef np.ndarray[double, ndim=1] pp
        pp = np.asarray(point).astype(np.float)

        self.data[self.numpoints] = pp
        self.numpoints += 1
        if self.numpoints==self.numrows:
            ## XXX do I need to do memory management here, eg free
            ## raw_data if I were using it?
            self.numrows *= 2
            newdata = np.empty((self.numrows, self.n), np.float)
            newdata[:self.numpoints] = self.data
            self.data = newdata
            #self.raw_data = <double*>inner_data.data

    def get_data(NNBF self):
        """return the data added so far as a numpoints x n array
        (a view into the internal buffer, not a copy)
        """
        return self.data[:self.numpoints]

    def find_neighbors(NNBF self, object point, double radius):
        """return a list of indices into data which are within radius
        from point
        """
        cdef int i, neighbor
        cdef double d2max
        cdef np.ndarray[double, ndim=1] pp
        cdef np.ndarray[double, ndim=1] row

        if len(point)!=self.n:
            raise ValueError('Expected a length %d vector'%self.n)

        pp = np.asarray(point).astype(np.float)

        d2max = radius*radius
        neighbors = []

        for i in range(self.numpoints):
            # XXX : is there a more efficient way to access the row
            # data?  Can/should we be using raw_data here?
            row = self.data[i]
            neighbor = is_neighbor(
                self.n, <double*>row.data, <double*>pp.data, d2max)

            # if the number of points in the cluster is small, the
            # python list performance should not kill us
            if neighbor:
                neighbors.append(i)

        return neighbors
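For anyone who wants to play with the growing-buffer idea without
building the extension, here is a pure-numpy prototype of the same
interface (the class name NNBFNumpy is mine; this is a sketch, not the
nnbf module itself)::

```python
import numpy as np

class NNBFNumpy:
    """pure-numpy prototype: brute-force search with incremental add"""
    def __init__(self, n):
        self.n = n
        self.numrows = 100
        self.numpoints = 0
        self.data = np.empty((self.numrows, self.n), dtype=float)

    def add(self, point):
        if self.numpoints == self.numrows:
            # amortized O(1) append: double the buffer when it fills
            self.numrows *= 2
            newdata = np.empty((self.numrows, self.n), dtype=float)
            newdata[:self.numpoints] = self.data[:self.numpoints]
            self.data = newdata
        self.data[self.numpoints] = point
        self.numpoints += 1

    def find_neighbors(self, point, radius):
        # vectorized search over only the filled part of the buffer,
        # comparing squared distances to skip the sqrt
        d = self.data[:self.numpoints] - point
        r2 = (d*d).sum(axis=1)
        return np.nonzero(r2 <= radius*radius)[0]
```

The doubling keeps the cost of N adds at O(N) total, and the search
only ever touches rows that have actually been filled.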
