Combining Sebastian and Jae-Joon's suggestions, I have something that  
might work:

>>> timeit numpy.bincount(array.flat)
10 loops, best of 3: 28.2 ms per loop

At 28.2 ms per call that works out to roughly 35 frames per second,
which is close enough to video rate. And I can then combine bins as
needed to get a particular bin count/range after the fact.
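
For the record, combining bins after the fact could look something like
the sketch below. It assumes 8-bit integer image data, so bincount yields
256 unit-width bins. The names rebin_counts and frame are made up for
illustration, and the random frame just stands in for real video data.

```python
import numpy as np

def rebin_counts(counts, factor):
    """Sum each run of `factor` adjacent bins into one coarser bin."""
    counts = np.asarray(counts)
    # Pad with empty bins so the length divides evenly by `factor`.
    pad = (-len(counts)) % factor
    if pad:
        counts = np.concatenate([counts, np.zeros(pad, dtype=counts.dtype)])
    return counts.reshape(-1, factor).sum(axis=1)

# Hypothetical 8-bit frame standing in for one video image.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(480, 640), dtype=np.uint8)

# One count per integer value; minlength keeps 256 bins even if the
# brightest values never occur in this frame.
fine = np.bincount(frame.ravel(), minlength=256)

# 32 bins, each covering 8 consecutive intensity values.
coarse = rebin_counts(fine, 8)

# Restricting the range is just a slice of the fine counts:
# here values 64..191, regrouped into 16 bins of width 8.
sub = rebin_counts(fine[64:192], 8)
```

Since the fine counts are kept around, any bin width or range can be
derived later without touching the image data again.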

Thanks, everyone,


