[Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

Hans Meine meine@informatik.uni-hamburg...
Tue Apr 8 03:00:19 CDT 2008


On Monday, 07 April 2008 14:34:08, Hans Meine wrote:
> On Saturday, 05 April 2008 21:54:27, Anne Archibald wrote:
> > There's also a fourth option - raise an exception if any points are
> > outside the range.
>
> +1
>
> I think this should be the default.  Otherwise, I tend towards "exclude",
> in order to have comparable bin sizes (when plotting, I always find peaks
> at the ends annoying); this could also be called "clip" BTW.
>
> But really, an exception would follow the Zen: "In the face of ambiguity,
> refuse the temptation to guess."  And with a kwarg: "Explicit is better
> than implicit."

When I posted this, I had indeed not thought it through fully; as David (and 
Tommy) pointed out, this API does not fit well with the existing `bins` 
option, especially when a sequence of bin bounds is given.  (I guess I was 
mostly thinking about the special case of discrete values with one bin per 
value, as is typical for uint8 data.)

Thus, I would like to withdraw my opinion above and instead state that I 
find the current API as clear as it gets.  If you want to exclude values, 
simply pass an additional right bound; to include outliers, passing -inf 
as an additional left bound seems to do the trick.  This could possibly be 
added to the documentation, though.

The only critical aspect I see is the `normed` argument.  As it is now, the 
rightmost bin always has infinite size, but it is not treated as such:

In [1]: from numpy import *

In [2]: histogram(arange(10), [2,3,4], normed = True)
Out[2]: (array([ 0.1,  0.1,  0.6]), array([2, 3, 4]))
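
The numbers above are consistent with a normalization that divides the counts by the total number of samples (here 10, including the two values below the first edge) times the width of the first bin, applied uniformly to every bin, including the open-ended last one. The sketch below assumes that formula (inferred from the output, not quoted from the numpy source) and contrasts it with dividing each bin by its own width, taking the last width as infinite. All names are illustrative.

```python
import numpy as np

counts = np.array([1, 1, 6])        # counts for left edges [2, 3, 4] on arange(10)
edges = np.array([2.0, 3.0, 4.0])
n_total = 10                        # all samples, including those below 2

# Assumed normalization: the first bin's width is used for every bin.
uniform = counts / (n_total * (edges[1] - edges[0]))

# Per-bin widths, treating the rightmost bin as extending to +inf.
widths = np.diff(np.append(edges, np.inf))
per_bin = counts / (n_total * widths)

print(uniform)   # [0.1 0.1 0.6]
print(per_bin)   # [0.1 0.1 0. ]
```

Under the per-bin reading, the open-ended last bin gets density zero, which is the "infinite size" the current output ignores.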

Even worse, if you try to add an infinite bin on the left, this pulls all 
values to zero (technically I understand why, but it looks really 
undesirable to me):

In [3]: histogram(arange(10), [-inf, 2,3,4], normed = True)
Out[3]: (array([ 0.,  0.,  0.,  0.]), array([-Inf,   2.,   3.,   4.]))
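
Assuming the same normalization as above (total sample count times the first bin's width), the all-zero result follows directly: with `-inf` as the first edge, that width is `2 - (-inf) = inf`, so the common factor is zero and every bin is scaled to zero. A small sketch under that assumption, reusing the left-edge counting described in this thread:

```python
import numpy as np

a = np.arange(10)
edges = np.array([-np.inf, 2.0, 3.0, 4.0])

# Left-edge counting: index of the last edge <= x (every value is >= -inf).
idx = np.searchsorted(edges, a, side="right") - 1
counts = np.bincount(idx, minlength=len(edges))

# Assumed normalization: total sample count times the first bin's width.
first_width = edges[1] - edges[0]            # inf
normed = counts / (a.size * first_width)     # every entry becomes 0.0
```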

-- 
Ciao, /  /
     /--/
    /  / ANS

