[Numpy-discussion] Improving NumPy's indexing / subsetting / fancy indexing implementation
Charles R Harris
charlesr.harris@gmail....
Sat Apr 7 08:58:11 CDT 2012
On Fri, Apr 6, 2012 at 4:00 AM, Nathaniel Smith <njs@pobox.com> wrote:
> Hi Wes,
>
> I believe that Mark rewrote a bunch of the fancy-indexing-related code
> from scratch in the masked-NA branch. I don't know if it affects
> anything you're talking about here, but just as a heads up, you might
> want to benchmark master, since it may have a different performance
> profile.
>
> -- Nathaniel
>
> On Fri, Apr 6, 2012 at 4:04 AM, Wes McKinney <wesmckinn@gmail.com> wrote:
> > dear all,
> >
> > I've routinely found that:
> >
> > 1) ndarray.take is up to 1 order of magnitude faster than fancy indexing
> > 2) Hand-coded Cython boolean indexing is many times faster than ndarray
> indexing
> > 3) putmask is significantly faster than ndarray indexing
> >
> > For example, I stumbled on this tonight:
> >
> > straightforward cython code:
> >
> > def row_bool_subset(ndarray[float64_t, ndim=2] values,
> > ndarray[uint8_t, cast=True] mask):
> > cdef:
> > Py_ssize_t i, j, n, k, pos = 0
> > ndarray[float64_t, ndim=2] out
> >
> > n, k = (<object> values).shape
> > assert(n == len(mask))
> >
> > out = np.empty((mask.sum(), k), dtype=np.float64)
> >
> > for i in range(n):
> > if mask[i]:
> > for j in range(k):
> > out[pos, j] = values[i, j]
> > pos += 1
> >
> > return out
> >
> > In [1]: values = randn(1000000, 4)
> >
> > In [2]: mask = np.ones(1000000, dtype=bool)
> >
> > In [3]: import pandas._sandbox as sbx
> >
> > In [4]: result = sbx.row_bool_subset(values, mask)
> >
> > In [5]: timeit result = sbx.row_bool_subset(values, mask)
> > 100 loops, best of 3: 9.58 ms per loop
> >
> > pure NumPy:
> >
> > In [6]: timeit values[mask]
> > 10 loops, best of 3: 81.6 ms per loop
> >
> > Here's the kind of take performance problems that I routinely experience:
> >
> > In [12]: values = randn(1000000, 4)
> > v
> > In [13]: values.shape
> > Out[13]: (1000000, 4)
> >
> > In [14]: indexer = np.random.permutation(1000000)[:500000]
> >
> > In [15]: timeit values[indexer]
> > 10 loops, best of 3: 70.7 ms per loop
> >
> > In [16]: timeit values.take(indexer, axis=0)
> > 100 loops, best of 3: 13.3 ms per loop
> >
> > When I can spare the time in the future I will personally work on
> > these indexing routines in the C codebase, but I suspect that I'm not
> > the only one adversely affected by these kinds of performance
> > problems, and it might be worth thinking about a community effort to
> > split up the work of retooling these methods to be more performant. We
> > could use a tool like my vbench project (github.com/wesm/vbench) to
> > make a list of the performance benchmarks of interest (like the ones
> > above).
> >
> > Unfortunately I am too time constrained at least for the next 6 months
> > to devote to a complete rewrite of the code in question. Possibly
> > sometime in 2013 if no one has gotten to it yet, but this seems like
> > someplace that we should be concerned about as the language
> > performance wars continue to rage.
> >
>
New workers in the C vineyards are always welcome. I believe Mark is
unhappy with fancy indexing, both in design and implementation.
Unfortunately it is pretty entrenched in existing code, so making
fundamental changes could be difficult. But it looks like a good topic for
discussion in depth.
Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20120407/9b89ff9f/attachment.html
More information about the NumPy-Discussion
mailing list