[Numpy-discussion] Improving NumPy's indexing / subsetting / fancy indexing implementation

Wes McKinney wesmckinn@gmail....
Thu Apr 5 22:04:35 CDT 2012

dear all,

I've routinely found that:

1) ndarray.take is up to 1 order of magnitude faster than fancy indexing
2) Hand-coded Cython boolean indexing is many times faster than ndarray indexing
3) putmask is significantly faster than ndarray indexing

For example, I  stumbled on this tonight:

straightforward cython code:

def row_bool_subset(ndarray[float64_t, ndim=2] values,
                   ndarray[uint8_t, cast=True] mask):
       Py_ssize_t i, j, n, k, pos = 0
       ndarray[float64_t, ndim=2] out

   n, k = (<object> values).shape
   assert(n == len(mask))

   out = np.empty((mask.sum(), k), dtype=np.float64)

   for i in range(n):
       if mask[i]:
           for j in range(k):
               out[pos, j] = values[i, j]
           pos += 1

   return out

In [1]: values = randn(1000000, 4)

In [2]: mask = np.ones(1000000, dtype=bool)

In [3]: import pandas._sandbox as sbx

In [4]: result = sbx.row_bool_subset(values, mask)

In [5]: timeit result = sbx.row_bool_subset(values, mask)
100 loops, best of 3: 9.58 ms per loop

pure NumPy:

In [6]: timeit values[mask]
10 loops, best of 3: 81.6 ms per loop

Here's the kind of take performance problems that I routinely experience:

In [12]: values = randn(1000000, 4)
In [13]: values.shape
Out[13]: (1000000, 4)

In [14]: indexer = np.random.permutation(1000000)[:500000]

In [15]: timeit values[indexer]
10 loops, best of 3: 70.7 ms per loop

In [16]: timeit values.take(indexer, axis=0)
100 loops, best of 3: 13.3 ms per loop

When I can spare the time in the future I will personally work on
these indexing routines in the C codebase, but I suspect that I'm not
the only one adversely affected by these kinds of performance
problems, and it might be worth thinking about a community effort to
split up the work of retooling these methods to be more performant. We
could use a tool like my vbench project (github.com/wesm/vbench) to
make a list of the performance benchmarks of interest (like the ones

Unfortunately I am too time constrained at least for the next 6 months
to devote to a complete rewrite of the code in question. Possibly
sometime in 2013 if no one has gotten to it yet, but this seems like
someplace that we should be concerned about as the language
performance wars continue to rage.

- Wes

More information about the NumPy-Discussion mailing list