[Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram
Tue Apr 8 12:27:08 CDT 2008
I agree that the current histogram should be changed. However, I am not
sure 1.0.5 is the correct release for that.
David, this doesn't work for your code:
rc, rb=histogram(r, bins=dbin, discard=None)
rc=[3 3] # Really should be [3, 3, 9]
rb=[-9223372036854775808 3 -9223372036854775808]
But I have not had time to find the error.
David Huard wrote:
> Note that the current histogram is buggy, in the sense that it assumes
> that all bins have the same width and computes db = bins[1]-bins[0].
> This is why you get zeros everywhere.
> The current behavior has been heavily criticized and I think we should
> change it. My proposal is to have for histogram the same behavior as
> for histogramdd and histogram2d: bins are the bin edges, including the
> rightmost bin, and values outside of the bins are not tallied. The
> problem with this is that it breaks code, and I'm not sure it's such a
> good idea to do this in a point release.
> My short term proposal would be to fix the normalization bug and
> document the current behavior of histogram for the 1.0.5 release. Once
> it's done, we can modify histogram and maybe print a warning the first
> time it's used to notify users of the change.
> I'd like to hear the voice of experienced devs on this. This issue has
> been raised a number of times since I started following this ML. It's not the
> first time I've proposed patches, and I've already documented the
> weird behavior only to see the comments disappear after a while. I
> hope this time some kind of agreement will be reached.
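The edge-based semantics David proposes can be sketched in plain Python. This is only an illustration, not NumPy code: the helper name histogram_edges is hypothetical, and treating the rightmost edge as inclusive is an assumption about what "including the rightmost bin" means.

```python
from bisect import bisect_right

def histogram_edges(values, edges):
    """Count values into len(edges) - 1 bins defined by sorted edges.

    Hypothetical helper for illustration. Each bin is half-open
    [left, right), except the last one, which is assumed to include
    its right edge. Values outside [edges[0], edges[-1]] are not tallied.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outlier: silently dropped
        if v == edges[-1]:
            counts[-1] += 1  # rightmost edge belongs to the last bin
        else:
            counts[bisect_right(edges, v) - 1] += 1
    return counts

print(histogram_edges(range(10), [2, 3, 4]))  # [1, 2]
```

With these semantics three edges give two bins, and the values below 2 or above 4 simply do not appear in the result.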
> 2008/4/8, Hans Meine <email@example.com>:
> On Monday, 07 April 2008 14:34:08, Hans Meine wrote:
> > On Saturday, 05 April 2008 21:54:27, Anne Archibald wrote:
> > > There's also a fourth option - raise an exception if any points are
> > > outside the range.
> > +1
> > I think this should be the default. Otherwise, I tend towards
> > in order to have comparable bin sizes (when plotting, I always find peaks
> > at the ends annoying); this could also be called "clip" BTW.
> > But really, an exception would follow the Zen: "In the face of ambiguity,
> > refuse the temptation to guess." And with a kwarg: "Explicit is better
> > than implicit."
> When posting this, I did indeed not think this through fully; as David (and
> Tommy) pointed out, this API does not fit well with the existing
> option, especially when a sequence of bin bounds is given. (I guess I was
> mostly thinking about the special case of discrete values and 1:1 bins, as
> typical for uint8 data.)
> Thus, I would like to withdraw my above opinion and instead state that I
> find the current API as clear as it gets. If you want to exclude outliers,
> simply pass an additional right bound, and for including outliers,
> passing -inf as additional left bound seems to do the trick. This could
> possibly be added to the documentation though.
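The two tricks Hans mentions can be illustrated with a small plain-Python model of the behavior discussed in this thread. It is a sketch under assumptions, not the NumPy source: left_edge_counts is a hypothetical name, bins are taken to be left edges, and the last bin is taken to extend to +infinity.

```python
import math
from bisect import bisect_left

def left_edge_counts(values, left_edges):
    """Counts per bin when each entry of left_edges starts a bin.

    Hypothetical model: the last bin is open-ended, so it collects
    everything at or above the final edge. Values below the first
    edge are not counted.
    """
    data = sorted(values)
    # Position of the first value >= each edge, then the total length.
    starts = [bisect_left(data, e) for e in left_edges] + [len(data)]
    return [hi - lo for lo, hi in zip(starts, starts[1:])]

data = range(10)
print(left_edge_counts(data, [2, 3, 4]))             # [1, 1, 6]
print(left_edge_counts(data, [2, 3, 4, 5]))          # [1, 1, 1, 5]
print(left_edge_counts(data, [-math.inf, 2, 3, 4]))  # [2, 1, 1, 6]
```

In the second call the extra right bound moves the high outliers into a trailing bin that the caller can drop; in the third call the -inf bound adds a leading bin that picks up the low outliers.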
> The only critical aspect I see is the `normed` arg. As it is now, the
> rightmost bin always has infinite size, but it is not treated like that:
> In : from numpy import *
> In : histogram(arange(10), [2,3,4], normed = True)
> Out: (array([ 0.1, 0.1, 0.6]), array([2, 3, 4]))
> Even worse, if you try to add an infinite bin to the left, this pulls all
> values to zero (technically, I understand that, but it looks really
> undesirable to me):
> In : histogram(arange(10), [-inf, 2,3,4], normed = True)
> Out: (array([ 0., 0., 0., 0.]), array([-Inf, 2., 3., 4.]))
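Both outputs above are consistent with a normalization that divides every count by the number of samples times one shared width taken from the first two edges, as David describes. The sketch below assumes exactly that; the function names are hypothetical and the counts are entered by hand rather than computed by NumPy. A per-bin-width variant is included for contrast.

```python
import math

def normed_single_width(counts, left_edges, n_values):
    """Divide all counts by n_values * db, with db from the first two edges."""
    db = left_edges[1] - left_edges[0]  # assumes equal-width bins
    return [c / (n_values * db) for c in counts]

def normed_per_bin(counts, edges):
    """Density using each bin's own width; edges has len(counts) + 1 entries."""
    total = sum(counts)
    return [c / (total * (right - left))
            for c, left, right in zip(counts, edges, edges[1:])]

# Counts for arange(10) with left edges [2, 3, 4]: 1, 1 and 6 values.
print(normed_single_width([1, 1, 6], [2, 3, 4], 10))  # [0.1, 0.1, 0.6]

# With -inf as first edge, db is infinite and every density becomes 0.
print(normed_single_width([2, 1, 1, 6], [-math.inf, 2, 3, 4], 10))  # [0.0, 0.0, 0.0, 0.0]

# Per-bin widths on closed edges [2, 3, 4] with counts [1, 2].
density = normed_per_bin([1, 2], [2, 3, 4])
print(density)
```

The per-bin version only makes sense when every bin has a finite width, which is one reason the edge-based proposal earlier in the thread fits better with normalization.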
> Ciao,
> Hans