[Numpy-discussion] Fast histogram
Zachary Pincus
zachary.pincus@yale....
Thu Apr 17 13:11:21 CDT 2008
Hello,
>> But even if indices = array, one still needs to do something like:
>> for index in indices: histogram[index] += 1
> numpy.bincount?
That is indeed what I was looking for! I knew I'd seen such a function.
However, the speed is a bit disappointing. I guess the sorting that
numpy.histogram does isn't too much of a penalty:
import numpy

def histogram(array, bins, range):
    min, max = range
    # Scale each value to a bin index, clip out-of-range values to the end bins.
    indices = numpy.clip(((array.astype(float) - min) * bins /
                          (max - min)).astype(int), 0, bins-1).flat
    return numpy.bincount(indices)
import numexpr

def histogram_numexpr(array, bins, range):
    min, max = range
    min = float(min)
    max = float(max)
    indices = numexpr.evaluate('(array - min) * bins / (max - min)')
    indices = numpy.clip(indices.astype(int), 0, bins-1).flat
    return numpy.bincount(indices)
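A quick sanity check on the bincount approach, as a minimal sketch rather than a definitive implementation: the counts should match both the explicit loop from the quoted message and, for in-range data, numpy.histogram. The helper name histogram_bincount, the parameter name value_range, and the test data are made up for illustration. One thing to watch is that numpy.bincount only returns as many entries as the largest index plus one, so the sketch passes minlength (an argument available in more recent NumPy releases) to always get one entry per bin.

```python
import numpy

def histogram_bincount(array, bins, value_range):
    # Hypothetical variant of histogram() above; names are illustrative.
    lo, hi = value_range
    scaled = (numpy.asarray(array, dtype=float) - lo) * bins / (hi - lo)
    indices = numpy.clip(scaled.astype(int), 0, bins - 1).ravel()
    # minlength pads the result so empty top bins are not dropped.
    return numpy.bincount(indices, minlength=bins)

# Synthetic stand-in data: values below 3000, so the top bins of
# [0, 5000] stay empty.
rng = numpy.random.RandomState(0)
arr = rng.randint(0, 3000, size=(130, 103))

counts = histogram_bincount(arr, 12, (0, 5000))

# The explicit loop from the quoted message, for comparison.
loop_counts = numpy.zeros(12, dtype=int)
for index in numpy.clip((arr.ravel() * 12.0 / 5000).astype(int), 0, 11):
    loop_counts[index] += 1

# numpy.histogram on the same data and range.
reference, edges = numpy.histogram(arr, 12, (0, 5000))

print(numpy.array_equal(counts, loop_counts))
print(numpy.array_equal(counts, reference))
```

Without minlength, the result here would have fewer than 12 entries, which matters if the histogram is later indexed by bin number.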
>>> arr.shape
(1300, 1030)
>>> timeit numpy.histogram(arr, 12, [0, 5000])
10 loops, best of 3: 99.9 ms per loop
>>> timeit histogram(arr, 12, [0, 5000])
10 loops, best of 3: 127 ms per loop
>>> timeit histogram_numexpr(arr, 12, [0, 5000])
10 loops, best of 3: 109 ms per loop
>>> timeit numpy.histogram(arr, 5000, [0, 5000])
10 loops, best of 3: 111 ms per loop
>>> timeit histogram(arr, 5000, [0, 5000])
10 loops, best of 3: 127 ms per loop
>>> timeit histogram_numexpr(arr, 5000, [0, 5000])
10 loops, best of 3: 108 ms per loop
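The timeit lines above use IPython's timeit magic. Outside IPython, roughly the same measurement can be made with the standard-library timeit module. The sketch below assumes a synthetic array of the same shape, since the original data is not shown, and the helper name best_ms is made up for illustration. Absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy

# Synthetic stand-in for the (1300, 1030) array; the real data is not shown.
arr = numpy.random.RandomState(0).randint(0, 5000, size=(1300, 1030))

def best_ms(func, repeat=3, number=10):
    # Best-of-`repeat` time per call in milliseconds,
    # comparable to "10 loops, best of 3".
    times = timeit.repeat(func, repeat=repeat, number=number)
    return min(times) / number * 1000.0

for bins in (12, 5000):
    ms = best_ms(lambda: numpy.histogram(arr, bins, [0, 5000]))
    print("numpy.histogram, %d bins: %.1f ms per loop" % (bins, ms))
```

The same helper can be pointed at histogram or histogram_numexpr by swapping the function called inside the lambda.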
So, they're all quite close. numpy.histogram is the clear winner at
12 bins (99.9 ms), and at 5000 bins it is within a few ms of the
numexpr version (111 ms vs. 108 ms). Huh. I guess I will have to go
to C or maybe weave to get up to video rate, unless folks can suggest
some further optimizations...
Zach