[Numpy-discussion] Histograms via indirect index arrays

Travis Oliphant oliphant.travis at ieee.org
Fri Mar 17 01:16:03 CST 2006


Norbert Nemec wrote:
> I have a very much related problem: Not only that the idea described by
> Mads Ipsen does not work, but I could generally find no efficient way to
> do a "counting" of elements in an array, as it is needed for a histogram.
>   
This may be something we are lacking.  


It depends on what you mean by efficient, I suppose.  The sorting
algorithms are very fast, and the histogram routines that are scattered
all over the place (including the histogram function in numpy) have been
in use for a long time, and you are the first person to complain about
their efficiency.  That isn't to say your complaint is not valid; it's
just that for most people the speed has been sufficient.
> What would instead be needed is a function that simply gives the count
> of occurrences of given values in a given array:
>   
I presume you are talking about "integer" arrays, since
equality comparison of floating-point values is usually not very helpful,
so most histograms on floating-point values are given in terms of bins.
Thus, the searchsorted function uses bins for its "counting" operation.
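
A minimal sketch of bin-based counting for floating-point data, assuming
a current NumPy where np.searchsorted and np.bincount are available.  The
sample data and bin edges are made up for illustration, and this is not
necessarily how any particular histogram routine is implemented:

```python
import numpy as np

# Hypothetical sample data and bin edges (not from the thread).
data = np.array([0.1, 0.35, 0.5, 0.72, 0.9, 0.33])
edges = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

# For each value x, find the bin i with edges[i] <= x < edges[i + 1].
# side="right" puts a value equal to an edge into the bin that starts there.
bin_index = np.searchsorted(edges, data, side="right") - 1

# Count how many values landed in each of the len(edges) - 1 bins.
# This assumes every value lies inside [edges[0], edges[-1]).
counts = np.bincount(bin_index, minlength=len(edges) - 1)

print(counts)
```

Values outside the edge range would need to be filtered or clipped first,
since np.bincount rejects negative indices.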
>   
>>>> [4,5,2,3,2,1,4].count([0,1,2,3,4,5])
>>>>         
> [0,1,2,1,1,2]
>
>   

A count function for integer arrays could certainly be written using a 
C-loop.  But, I would first just use histogram([4,5,2,3,2,1,4], 
[0,1,2,3,4,5]) and make sure that's really too slow, before worrying 
about it too much.

Also, according to the above function, the right answer is:

[0, 1, 2, 1, 2, 1]
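
For small non-negative integers, the requested count can be sketched
without a hand-written C loop.  This assumes a current NumPy and uses
np.bincount, which is not mentioned in this thread; the second variant
sorts first and then uses searchsorted, so it also works when the values
to count are not a dense 0..n range:

```python
import numpy as np

values = np.array([4, 5, 2, 3, 2, 1, 4])
targets = np.array([0, 1, 2, 3, 4, 5])

# Variant 1: direct counting of non-negative integers.
# minlength guarantees one slot per target even if the largest is absent.
counts_direct = np.bincount(values, minlength=targets.max() + 1)[targets]

# Variant 2: sort once, then locate each target's run of equal values.
sorted_values = np.sort(values)
first = np.searchsorted(sorted_values, targets, side="left")
last = np.searchsorted(sorted_values, targets, side="right")
counts_sorted = last - first

print(counts_direct)
print(counts_sorted)
```

Both variants agree with the corrected answer above.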


Best,


-Travis





