[SciPy-User] numpy.histogram is slow

Zachary Pincus zachary.pincus@yale....
Tue Oct 16 15:12:02 CDT 2012


On Oct 16, 2012, at 4:04 PM, Chris Weisiger wrote:

> My use case is displaying camera image data to the user as it is
> streamed to us; this includes a histogram showing the distribution of
> intensities in the image. Thus I have a 512x512 array of pixel data
> (unsigned 16-bit ints) that I need to generate a histogram for.
> Unfortunately, numpy.histogram takes a significant amount of time --
> about 15ms per call. That's over 60% of the cost of showing an image
> to the user, which means that I can't quite display data as quickly as
> it comes in. So I'm looking for some faster option.
> 
> My searches turned up numpy.bincount, which is nice and zippy, but
> unfortunately omits bins where the total count is 0. This makes sense
> considering that otherwise it would always generate a length-N array
> where N is the maximum value in the input, but it doesn't work for my
> purposes. Are there any better options?

Uh, no? numpy.bincount doesn't omit bins below the maximum value in the input, even if the count is zero:
In [205]: numpy.bincount([5,5,10])
Out[205]: array([0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1])

Perhaps you mean that bincount omits bins above the maximum value in the input, but below the maximum *possible* value of the input? That's what the minlength parameter, added in numpy 1.6, is for. If you have an older version, either upgrade or manually zero-pad the bincount output:

import numpy
bins = numpy.bincount([5,5,10])
# reuse the bincount dtype: uint8 would wrap around for counts above 255
padded = numpy.zeros(32, dtype=bins.dtype)
padded[:len(bins)] = bins

That should be pretty quick.
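
For the 512x512 uint16 frames described above, the minlength route might look roughly like the sketch below. The random image is only a stand-in for real camera data, and the 256-bin variant at the end is my assumption about what a display histogram might want, not something from the original question.

```python
import numpy

# Stand-in for one 512x512 uint16 camera frame (random data, illustration only).
rng = numpy.random.RandomState(0)
image = rng.randint(0, 4096, size=(512, 512)).astype(numpy.uint16)

# One bin per possible uint16 value. minlength (numpy >= 1.6) pads the
# zero-count bins above the largest value actually present in the frame.
counts = numpy.bincount(image.ravel(), minlength=65536)

# Hypothetical coarser view: drop the low 8 bits so each of the 256 bins
# covers 256 consecutive intensity values.
coarse = numpy.bincount(image.ravel() >> 8, minlength=256)
```

Both results always have a fixed length (65536 and 256), whatever the brightest pixel in a given frame happens to be.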

Zach

