[Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

Bruce Southey bsouthey@gmail....
Tue Apr 8 12:27:08 CDT 2008


Hi,
I agree that the current histogram should be changed. However, I am not 
sure 1.0.5 is the correct release for that.

David, your code does not give the expected result for this case:
r= np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
dbin=[2,3,4]
rc, rb=histogram(r, bins=dbin, discard=None)

Returns:
rc=[3 3] # Really should be [3, 3, 9]
rb=[-9223372036854775808                    3 -9223372036854775808]

But I have not had time to find the error.
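The expected `[3, 3, 9]` matches reading `[2, 3, 4]` as left bin edges with nothing discarded. Values below 3 go in the first bin, the 3s in the second, and everything from 4 up in the last. Below is a minimal sketch of that reading. It assumes this is what `discard=None` is meant to do. The helper name `histogram_keep_outliers` is hypothetical and is not part of NumPy or of David's patch.

```python
import numpy as np

def histogram_keep_outliers(a, edges):
    """Hypothetical helper: treat `edges` as left bin edges with the
    rightmost bin unbounded, and fold values below edges[0] into the
    first bin instead of dropping them."""
    a = np.asarray(a).ravel()
    edges = np.asarray(edges)
    # Bin index of each value, using only the interior edges; side="right"
    # puts a value equal to an edge into the bin that starts at that edge.
    idx = np.searchsorted(edges[1:], a, side="right")
    counts = np.bincount(idx, minlength=len(edges))
    return counts, edges

r = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5])
counts, _ = histogram_keep_outliers(r, [2, 3, 4])
print(counts)  # [3 3 9]
```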

Regards
Bruce


David Huard wrote:
> Hans,
>
> Note that the current histogram is buggy, in the sense that it assumes 
> that all bins have the same width and computes db = bins[1]-bins[0]. 
> This is why you get zeros everywhere.
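The shortcut `db = bins[1]-bins[0]` is only valid when all bins have the same width. The sketch below uses made-up data to compare it against per-bin widths (`np.diff(edges)`). In current NumPy releases, the width-aware form is what `np.histogram(..., density=True)` returns.

```python
import numpy as np

data = np.array([0.5, 1.5, 2.0, 2.5, 2.9])  # illustrative data only
edges = np.array([0.0, 1.0, 3.0])            # bins of width 1 and 2

counts, _ = np.histogram(data, bins=edges)

# Uniform-width assumption: every bin is treated as wide as the first one.
db = edges[1] - edges[0]
density_uniform = counts / (counts.sum() * db)

# Width-aware normalization: divide each count by its own bin width.
widths = np.diff(edges)
density = counts / (counts.sum() * widths)

print(counts)           # [1 4]
print(density_uniform)  # [0.2 0.8]
print(density)          # [0.2 0.4]
```

Only the width-aware version integrates to 1 over the bins (0.2*1 + 0.4*2). The uniform-width version gives 0.2*1 + 0.8*2 = 1.8.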
>
> The current behavior has been heavily criticized and I think we should 
> change it. My proposal is to have for histogram the same behavior as 
> for histogramdd and histogram2d: bins are the bin edges, including the 
> rightmost bin, and values outside of the bins are not tallied. The 
> problem with this is that it breaks code, and I'm not sure it's such a 
> good idea to do this in a point release.
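Current NumPy releases implement the edge semantics described here. `np.histogram` treats `bins` as edges, closes the rightmost bin on the right, ignores values outside the outer edges, and agrees with `np.histogramdd`. The sketch below applies it to the array from the start of this message. It shows present-day NumPy, not the 2008 code under discussion.

```python
import numpy as np

r = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5])

# Edges [2, 3, 4] define two bins: [2, 3) and [3, 4].
# The last bin includes its right edge; 1 and 5 fall outside and are ignored.
counts, edges = np.histogram(r, bins=[2, 3, 4])
print(counts)  # [2 7]

# The same edges passed to histogramdd (one dimension) give the same tallies.
counts_dd, _ = np.histogramdd(r[:, None], bins=[[2, 3, 4]])
```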
>
> My short-term proposal would be to fix the normalization bug and 
> document the current behavior of histogram for the 1.0.5 release. Once 
> that is done, we can modify histogram and maybe print a warning the first 
> time it's used to notify users of the change.
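A warn-once notice like this can be sketched with the standard `warnings` module. The wrapper name `histogram_with_notice` and the message text are hypothetical. The wrapper just forwards to `np.histogram`.

```python
import warnings
import numpy as np

_notice_shown = False  # module-level flag: has the notice been emitted yet?

def histogram_with_notice(a, bins=10, **kwargs):
    """Hypothetical wrapper: forward to np.histogram, emitting a
    FutureWarning about the changed semantics on the first call only."""
    global _notice_shown
    if not _notice_shown:
        warnings.warn(
            "histogram semantics have changed: bins are now bin edges",
            FutureWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        _notice_shown = True
    return np.histogram(a, bins=bins, **kwargs)
```

Python's default warning filters already suppress repeats from the same call site. The module-level flag goes further and emits the notice at most once per process, whatever filters are active.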
>
> I'd like to hear from experienced devs on this. This issue has 
> been raised a number of times since I started following this ML. It's not the 
> first time I've proposed patches, and I've already documented the 
> weird behavior only to see the comments disappear after a while. I 
> hope this time some kind of agreement will be reached.
>
> Regards,
>
> David
>
>
>
>
> 2008/4/8, Hans Meine <meine@informatik.uni-hamburg.de>:
>
>     On Monday, 07 April 2008 14:34:08, Hans Meine wrote:
>
>     > On Saturday, 05 April 2008 21:54:27, Anne Archibald wrote:
>     > > There's also a fourth option - raise an exception if any points are
>     > > outside the range.
>     >
>     > +1
>     >
>     > I think this should be the default.  Otherwise, I tend towards "exclude",
>     > in order to have comparable bin sizes (when plotting, I always find peaks
>     > at the ends annoying); this could also be called "clip" BTW.
>     >
>     > But really, an exception would follow the Zen: "In the face of ambiguity,
>     > refuse the temptation to guess."  And with a kwarg: "Explicit is better
>     > than implicit."
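The "raise" and "exclude" policies can be expressed as a keyword on a thin wrapper. The sketch below assumes a hypothetical `outliers` keyword and a hypothetical function name. `"raise"` is the default, as suggested here. `"exclude"` delegates to `np.histogram`, which in current releases ignores values outside the outer edges.

```python
import numpy as np

def histogram_outliers(a, edges, outliers="raise"):
    """Hypothetical wrapper illustrating two outlier policies:

    outliers="raise":   refuse any value outside [edges[0], edges[-1]]
    outliers="exclude": silently drop such values
    """
    a = np.asarray(a, dtype=float).ravel()
    edges = np.asarray(edges, dtype=float)
    outside = (a < edges[0]) | (a > edges[-1])
    if outliers == "raise":
        if outside.any():
            raise ValueError(f"{outside.sum()} value(s) outside the bin range")
    elif outliers != "exclude":
        raise ValueError(f"unknown outliers policy: {outliers!r}")
    # np.histogram already drops out-of-range values, which is "exclude".
    return np.histogram(a, bins=edges)
```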
>
>
>     When posting this, I had indeed not thought it through fully; as David
>     (and Tommy) pointed out, this API does not fit well with the existing
>     `bins` option, especially when a sequence of bin bounds is given.  (I
>     guess I was mostly thinking about the special case of discrete values and
>     1:1 bins, as is typical for uint8 data.)
>
>     Thus, I would like to withdraw my opinion above and instead state that I
>     find the current API as clear as it gets.  If you want to exclude values,
>     simply pass an additional right bound, and for including outliers, passing
>     -inf as an additional left bound seems to do the trick.  This could
>     possibly be added to the documentation, though.
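Both workarounds can be checked against a small emulation of the semantics described in this message. In this emulation, `bins` are left edges, the rightmost bin is unbounded, and values below `bins[0]` are dropped, as the `Out[2]` result in this message implies. The function name is hypothetical, and this is not NumPy's source.

```python
import numpy as np

def left_edge_histogram(a, bins):
    """Hypothetical emulation: bins[i] is the left edge of bin i, the
    rightmost bin extends to +inf, and values below bins[0] are dropped."""
    a = np.sort(np.asarray(a).ravel())
    # Number of values strictly below each left edge.
    below = np.searchsorted(a, bins, side="left")
    # Append len(a) so the last bin runs to +inf, then take differences.
    return np.diff(np.append(below, len(a)))

x = np.arange(10)
print(left_edge_histogram(x, [2, 3, 4]))             # [1 1 6]
# Extra right bound: the last entry now holds the values >= 5; drop it.
print(left_edge_histogram(x, [2, 3, 4, 5])[:-1])     # [1 1 1]
# Extra -inf left bound: values below 2 are counted instead of dropped.
print(left_edge_histogram(x, [-np.inf, 2, 3, 4]))    # [2 1 1 6]
```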
>
>     The only critical aspect I see is the `normed` arg.  As it is now, the
>     rightmost bin always has infinite size, but it is not treated like that:
>
>     In [1]: from numpy import *
>
>     In [2]: histogram(arange(10), [2,3,4], normed = True)
>     Out[2]: (array([ 0.1,  0.1,  0.6]), array([2, 3, 4]))
>
>     Even worse, if you try to add an infinite bin to the left, this pulls all
>     values to zero (technically, I understand that, but it looks really
>     undesirable to me):
>
>     In [3]: histogram(arange(10), [-inf, 2,3,4], normed = True)
>     Out[3]: (array([ 0.,  0.,  0.,  0.]), array([-Inf,   2.,   3.,   4.]))
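Both outputs are consistent with dividing every count by `len(a)` times a single width `db = bins[1]-bins[0]`. With bins `[2, 3, 4]`, the counts `[1, 1, 6]` become `[1, 1, 6] / (10 * 1)`, where 10 includes the dropped values 0 and 1. With a `-inf` left edge, `db` is infinite, so every entry is 0. The sketch below reproduces both results under that assumption. It is a hypothetical emulation, not NumPy's source.

```python
import numpy as np

def left_edge_histogram_normed(a, bins):
    """Hypothetical emulation of the normalization implied by Out[2] and
    Out[3]: counts / (len(a) * db), with db = bins[1] - bins[0] for all bins."""
    a = np.sort(np.asarray(a, dtype=float).ravel())
    bins = np.asarray(bins, dtype=float)
    below = np.searchsorted(a, bins, side="left")
    counts = np.diff(np.append(below, len(a)))
    db = bins[1] - bins[0]  # one width assumed for every bin
    return counts / (len(a) * db)

x = np.arange(10)
print(left_edge_histogram_normed(x, [2, 3, 4]))           # [0.1 0.1 0.6]
print(left_edge_histogram_normed(x, [-np.inf, 2, 3, 4]))  # [0. 0. 0. 0.]
```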
>
>
>     --
>     Ciao, /  /
>          /--/
>         /  / ANS
>     _______________________________________________
>     Numpy-discussion mailing list
>     Numpy-discussion@scipy.org
>     http://projects.scipy.org/mailman/listinfo/numpy-discussion
>


