[Numpy-discussion] 3-10x speedup in bincount()
Tue Mar 13 23:52:07 CDT 2007
Well, there were no responses to my earlier email proposing changes to
numpy.bincount() to make it faster and more flexible. Does this mean
noone is using bincount? :-)
Anyway I've now made the proposed modifications and got substantial
speedups of 3-10. Details are in this extract from my C code. For the
time being I will test this as a standalone extension module. When
stable and tests are written, I'll submit as a patch to the numpy's
* Faster versions of bincount and casting strings to integers
* Author: Stephen Simmons, email@example.com
* Date: 11 March 2007
* This module contains C code for functions I am using to accelerate
* SQL-like aggregate functions for a column-oriented database based on
* subtotal's bincount is typically 3-10 times faster than numpy's
* and more flexible
* - takes bins as they are, without promoting their type to int32
* - takes weights as they are, without promoting their type to double
* - ignores negative bins, rather than reporting an error
* - allows optional int32/double output array to be passed in
* - specify maximum bin number to use. Larger bins numbers are ignored
* - only scans for max bin number if neither max_bin nor out array
* atoi is 30-60 times faster than casting a string array to an integer
* and may optionally use a translation table. The translation table is
* a convenient way to map sparse integer strings to adjacent bins before
* using bincount.
* Typical usage
# Set up arrays 5,000,000 elts long. s1 has strings filled with '1' and '2'.
# w are weights
>>> s1 = numpy.ndarray(shape=5000000, dtype='S1'); s1[:]='2'; s1[1::2]='1'
>>> w = numpy.arange(5000000,dtype='f4')
# Using standard numpy functions, string conversion is slow
>>> i = s1.astype('i1') -> 5.95 sec (!)
>>> numpy.bincount(i) -> 113 ms
>>> numpy.bincount(i, w) -> 272 ms
>>> numpy.bincount(s1.astype('i1'), w) -> 6.12 sec
# Using the faster functions here:
>>> i = subtotal.atoi(s1) -> 90 ms (60x faster)
>>> subtotal.bincount(i, max_bin=2) -> 31 ms (3.6x faster)
>>> subtotal.bincount(i, w, max_bin=2) -> 51 ms (5.3x faster)
# In both bases, output is
array([2, 1, 2, ..., 1, 2, 1], dtype=int8)
array([ 0, 2500000, 2500000])
array([ 0.00000000e+00, 6.25000000e+12, 6.24999750e+12])
# As an example of using a translate table, run bincount from atoi applying
# a translate table for counting odd vs even strings. This does the
# in 158ms, a speedup of 38x from 6.12 s above.
>>> t = numpy.arange(256, dtype='i1') % 2
>>> subtotal.bincount(subtotal.atoi(s1, t), w, max_bin=t.max()) ->
array([ 6.24999750e+12, 6.25000000e+12])
* These timings are based on numpy-1.0.1 Windows binary distribution
* for Python 2.5, compared with building this code using standard distutils
* without any particular compiler optimisations:
C:\mnt\dev\subtotal> python setup.py build -cmingw32
* Processor is a 1.83GHz Pentium-M
<< rest of C code omitted >>
Stephen Simmons wrote:
> I'd like to propose some minor modifications to the function
> bincount(arr, weights=None), so would like some feedback from other
> uses of bincount() before I write this up as a proper patch, .
> bincount() has two forms:
> - bincount(x) returns an integer array ians of length max(x)+1 where
> ians[n] is the number of times n appears in x.
> - bincount(x, weights) returns a double array dans of length max(x)+1
> where dans[n] is the sum of elements in the weight vector weights[i]
> at the positions where x[i]==n
> In both cases, all elements of x must be non-negative.
> Proposed changes:
> (1) Remove the restriction that elements of x must be non-negative.
> Currently bincount() starts by finding max(x) and min(x). If the min
> value is negative, an exception is raised. This change proposes
> dropping the initial search for min(x), and instead testing for
> non-negativity while summing values in the return arrays ians or dans.
> Any indexes where where x is negative will be silently ignored. This
> will allow selective bincounts where values to ignore are flagged with
> a negative bin number.
> (2) Allow an optional argument for maximum bin number.
> Currently bincount(x) returns an array whose length is dependent on
> max(x). It is sometimes preferable to specify the exact size of the
> returned array, so this change would add a new optional argument,
> max_bin, which is one less than the size of the returned array. Under
> this change, bincount() starts by finding max(x) only if max_bin is
> not specified. Then the returned array ians or dans is created with
> length max_bin+1, and any indexes that would overflow the output array
> are silently ignored.
> (3) Allow an optional output array, y.
> Currently bincount() creates a new output array each time. Sometimes
> it is preferable to add results to an existing output array, for
> example, when the input array is only available in smaller chunks, or
> for a progressive update strategy to avoid fp precision problems when
> adding lots of small weights to larger subtotals. Thus we can add an
> extra optional argument y that bypasses the creation of an output array.
> With these three change, the function signature of bincount() would
> bincount(x, weights=None, y=None, max_bin=None)
> Anyway, that's the general idea. I'd be grateful for any feedback
> before I code this up as a patch to _compiled_base.c.
More information about the Numpy-discussion