[Numpy-discussion] Improving NumPy's indexing / subsetting / fancy indexing implementation
Wes McKinney
wesmckinn@gmail....
Thu Apr 5 22:04:35 CDT 2012
dear all,
I've routinely found that:
1) ndarray.take is up to 1 order of magnitude faster than fancy indexing
2) Hand-coded Cython boolean indexing is many times faster than ndarray indexing
3) putmask is significantly faster than ndarray indexing
For example, I stumbled on this tonight:
straightforward cython code:
def row_bool_subset(ndarray[float64_t, ndim=2] values,
ndarray[uint8_t, cast=True] mask):
cdef:
Py_ssize_t i, j, n, k, pos = 0
ndarray[float64_t, ndim=2] out
n, k = (<object> values).shape
assert(n == len(mask))
out = np.empty((mask.sum(), k), dtype=np.float64)
for i in range(n):
if mask[i]:
for j in range(k):
out[pos, j] = values[i, j]
pos += 1
return out
In [1]: values = randn(1000000, 4)
In [2]: mask = np.ones(1000000, dtype=bool)
In [3]: import pandas._sandbox as sbx
In [4]: result = sbx.row_bool_subset(values, mask)
In [5]: timeit result = sbx.row_bool_subset(values, mask)
100 loops, best of 3: 9.58 ms per loop
pure NumPy:
In [6]: timeit values[mask]
10 loops, best of 3: 81.6 ms per loop
Here's the kind of take performance problems that I routinely experience:
In [12]: values = randn(1000000, 4)
v
In [13]: values.shape
Out[13]: (1000000, 4)
In [14]: indexer = np.random.permutation(1000000)[:500000]
In [15]: timeit values[indexer]
10 loops, best of 3: 70.7 ms per loop
In [16]: timeit values.take(indexer, axis=0)
100 loops, best of 3: 13.3 ms per loop
When I can spare the time in the future I will personally work on
these indexing routines in the C codebase, but I suspect that I'm not
the only one adversely affected by these kinds of performance
problems, and it might be worth thinking about a community effort to
split up the work of retooling these methods to be more performant. We
could use a tool like my vbench project (github.com/wesm/vbench) to
make a list of the performance benchmarks of interest (like the ones
above).
Unfortunately I am too time constrained at least for the next 6 months
to devote to a complete rewrite of the code in question. Possibly
sometime in 2013 if no one has gotten to it yet, but this seems like
someplace that we should be concerned about as the language
performance wars continue to rage.
- Wes
More information about the NumPy-Discussion
mailing list