[Numpy-discussion] Improving NumPy's indexing / subsetting / fancy indexing implementation

Wes McKinney wesmckinn@gmail....
Sat Apr 7 14:16:45 CDT 2012


On Sat, Apr 7, 2012 at 9:58 AM, Charles R Harris
<charlesr.harris@gmail.com> wrote:
>
>
> On Fri, Apr 6, 2012 at 4:00 AM, Nathaniel Smith <njs@pobox.com> wrote:
>>
>> Hi Wes,
>>
>> I believe that Mark rewrote a bunch of the fancy-indexing-related code
>> from scratch in the masked-NA branch. I don't know if it affects
>> anything you're talking about here, but just as a heads up, you might
>> want to benchmark master, since it may have a different performance
>> profile.
>>
>> -- Nathaniel
>>
>> On Fri, Apr 6, 2012 at 4:04 AM, Wes McKinney <wesmckinn@gmail.com> wrote:
>> > dear all,
>> >
>> > I've routinely found that:
>> >
>> > 1) ndarray.take is up to 1 order of magnitude faster than fancy indexing
>> > 2) Hand-coded Cython boolean indexing is many times faster than ndarray
>> > indexing
>> > 3) putmask is significantly faster than ndarray indexing
>> >
>> > For example, I  stumbled on this tonight:
>> >
>> > straightforward cython code:
>> >
>> > def row_bool_subset(ndarray[float64_t, ndim=2] values,
>> >                   ndarray[uint8_t, cast=True] mask):
>> >   cdef:
>> >       Py_ssize_t i, j, n, k, pos = 0
>> >       ndarray[float64_t, ndim=2] out
>> >
>> >   n, k = (<object> values).shape
>> >   assert(n == len(mask))
>> >
>> >   out = np.empty((mask.sum(), k), dtype=np.float64)
>> >
>> >   for i in range(n):
>> >       if mask[i]:
>> >           for j in range(k):
>> >               out[pos, j] = values[i, j]
>> >           pos += 1
>> >
>> >   return out
>> >
>> > In [1]: values = randn(1000000, 4)
>> >
>> > In [2]: mask = np.ones(1000000, dtype=bool)
>> >
>> > In [3]: import pandas._sandbox as sbx
>> >
>> > In [4]: result = sbx.row_bool_subset(values, mask)
>> >
>> > In [5]: timeit result = sbx.row_bool_subset(values, mask)
>> > 100 loops, best of 3: 9.58 ms per loop
>> >
>> > pure NumPy:
>> >
>> > In [6]: timeit values[mask]
>> > 10 loops, best of 3: 81.6 ms per loop
>> >
>> > Here's the kind of take performance problems that I routinely
>> > experience:
>> >
>> > In [12]: values = randn(1000000, 4)
>> > v
>> > In [13]: values.shape
>> > Out[13]: (1000000, 4)
>> >
>> > In [14]: indexer = np.random.permutation(1000000)[:500000]
>> >
>> > In [15]: timeit values[indexer]
>> > 10 loops, best of 3: 70.7 ms per loop
>> >
>> > In [16]: timeit values.take(indexer, axis=0)
>> > 100 loops, best of 3: 13.3 ms per loop
>> >
>> > When I can spare the time in the future I will personally work on
>> > these indexing routines in the C codebase, but I suspect that I'm not
>> > the only one adversely affected by these kinds of performance
>> > problems, and it might be worth thinking about a community effort to
>> > split up the work of retooling these methods to be more performant. We
>> > could use a tool like my vbench project (github.com/wesm/vbench) to
>> > make a list of the performance benchmarks of interest (like the ones
>> > above).
>> >
>> > Unfortunately I am too time constrained at least for the next 6 months
>> > to devote to a complete rewrite of the code in question. Possibly
>> > sometime in 2013 if no one has gotten to it yet, but this seems like
>> > someplace that we should be concerned about as the language
>> > performance wars continue to rage.
>> >
>
>
> New workers in the C vineyards are always welcome. I believe Mark is unhappy
> with fancy indexing, both in design and implementation. Unfortunately it is
> pretty entrenched in existing code, so making fundamental changes could be
> difficult. But it looks like a good topic for discussion in depth.
>
> Chuck
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

I plan to get involved in the C codebase as soon as I can spare the
time from pandas development. Perhaps it would be worth setting up
some vbenchmarks like pandas has to give some transparency ([1], [2])
into the performance and how it's been improved over time. I may be
able to set up my build machine to publish benchmarks like these for
NumPy sometime in the near future.

- Wes

[1] http://wesmckinney.com/blog/?p=373
[2] http://pandas.pydata.org/pandas-docs/vbench/vb_indexing.html


More information about the NumPy-Discussion mailing list