[Numpy-discussion] Fast histogram

Zachary Pincus zachary.pincus@yale....
Thu Apr 17 13:11:21 CDT 2008


Hello,

>> But even if indices = array, one still needs to do something like:
>> for index in indices: histogram[index] += 1
> numpy.bincount?

That is indeed what I was looking for! I knew I'd seen such a function.

However, the speed is a bit disappointing. I guess the sorting that
numpy.histogram does isn't too much of a penalty:

import numpy
def histogram(array, bins, range):
   min, max = range
   # Scale each value to a bin index, clip to the valid bins, then count.
   scaled = (array.astype(float) - min) * bins / (max - min)
   indices = numpy.clip(scaled.astype(int), 0, bins-1).flat
   return numpy.bincount(indices)

import numexpr
def histogram_numexpr(array, bins, range):
   min, max = range
   min = float(min)
   max = float(max)
   indices = numexpr.evaluate('(array - min) * bins / (max - min)')
   indices = numpy.clip(indices.astype(int), 0, bins-1).flat
   return numpy.bincount(indices)
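
A minimal sketch of the same bincount approach on a tiny array, to make the
binning arithmetic easy to check by hand. The name histogram_bincount and the
sample data are made up for illustration, and minlength is an addition that
is not in the functions above: without it, bincount returns a shorter array
whenever the top bins are empty. For in-range data the counts should agree
with numpy.histogram; out-of-range values are clamped into the end bins here,
whereas numpy.histogram drops them.

```python
import numpy as np

def histogram_bincount(values, bins, value_range):
    """Count values into `bins` equal-width bins over value_range = (lo, hi)."""
    lo, hi = float(value_range[0]), float(value_range[1])
    # Scale each value to a fractional bin position, then truncate to an index.
    scaled = (np.asarray(values, dtype=float) - lo) * bins / (hi - lo)
    # Values equal to hi (or outside the range) are clamped into the end bins.
    indices = np.clip(scaled.astype(int), 0, bins - 1).ravel()
    # minlength pads the result so empty trailing bins are still reported.
    return np.bincount(indices, minlength=bins)

# Hypothetical sample data: 5 bins of width 2 over [0, 10].
data = np.array([[0, 1, 2, 3], [4.5, 9, 10, 1]])
counts = histogram_bincount(data, 5, (0, 10))
reference, _ = np.histogram(data, 5, (0, 10))
print(counts, reference)
```

Using ravel() instead of .flat hands bincount a plain 1-D array; that is a
stylistic choice in this sketch, not a claimed speedup.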

 >>> arr.shape
(1300, 1030)

 >>> timeit numpy.histogram(arr, 12, [0, 5000])
10 loops, best of 3: 99.9 ms per loop

 >>> timeit histogram(arr, 12, [0, 5000])
10 loops, best of 3: 127 ms per loop

 >>> timeit histogram_numexpr(arr, 12, [0, 5000])
10 loops, best of 3: 109 ms per loop

 >>> timeit numpy.histogram(arr, 5000, [0, 5000])
10 loops, best of 3: 111 ms per loop

 >>> timeit histogram(arr, 5000, [0, 5000])
10 loops, best of 3: 127 ms per loop

 >>> timeit histogram_numexpr(arr, 5000, [0, 5000])
10 loops, best of 3: 108 ms per loop
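
The timings above use IPython's timeit. A rough sketch of the same
"10 loops, best of 3" measurement with the standard-library timeit module
follows, for anyone reproducing this outside IPython. The contents of arr are
not given in this message, so the random integer data below is an assumption
(only the shape matches), and best_ms is a hypothetical helper name. Absolute
numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

# Assumed stand-in for the original array: same shape, random values in [0, 5000).
rng = np.random.default_rng(0)
arr = rng.integers(0, 5000, size=(1300, 1030))

def best_ms(func, repeat=3, number=10):
    """Return the best per-call time in milliseconds over `repeat` runs."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    return min(times) / number * 1000.0

for bins in (12, 5000):
    ms = best_ms(lambda: np.histogram(arr, bins, (0, 5000)))
    print("numpy.histogram, %d bins: %.1f ms per loop" % (bins, ms))
```

The other two functions can be timed the same way by swapping the call inside
the lambda.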

So, they're all quite close: numpy.histogram is the fastest at 12 bins
(99.9 ms), and at 5000 bins it is within a few ms of the numexpr
version (111 ms vs. 108 ms). Huh. I guess I will have to go to C or
maybe weave to get up to video rate, unless folks can suggest some
further optimizations...

Zach

