[Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram
Bruce Southey
bsouthey@gmail....
Tue Apr 8 12:27:08 CDT 2008
Hi,
I agree that the current histogram should be changed. However, I am not
sure 1.0.5 is the correct release for that.
David, this doesn't work for your code:
r = np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
dbin = [2,3,4]
rc, rb = histogram(r, bins=dbin, discard=None)
Returns:
rc=[3 3] # Really should be [3, 3, 9]
rb=[-9223372036854775808 3 -9223372036854775808]
But I have not had time to find the error.
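
A minimal sketch of one reading that yields [3, 3, 9] for this input. This is not David's code: the helper name `count_keep_outliers` is hypothetical, and it assumes that the bins are given as left edges, that the last bin is open-ended, and that values below the first edge are folded into the first bin.

```python
import numpy as np

def count_keep_outliers(a, left_edges):
    """Hypothetical helper: per-bin counts with outliers folded into the edge bins."""
    a = np.asarray(a)
    left_edges = np.asarray(left_edges)
    # Index of the last left edge that is <= each value (-1 if below all edges).
    idx = np.searchsorted(left_edges, a, side="right") - 1
    # Fold values below the first edge into bin 0; the last bin is open-ended.
    idx = np.clip(idx, 0, len(left_edges) - 1)
    return np.bincount(idx, minlength=len(left_edges))

r = np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
print(count_keep_outliers(r, [2, 3, 4]))  # [3 3 9]
```
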
Regards
Bruce
David Huard wrote:
> Hans,
>
> Note that the current histogram is buggy, in the sense that it assumes
> that all bins have the same width and computes db = bins[1]-bins[0].
> This is why you get zeros everywhere.
>
> The current behavior has been heavily criticized and I think we should
> change it. My proposal is to have for histogram the same behavior as
> for histogramdd and histogram2d: bins are the bin edges, including the
> rightmost bin, and values outside of the bins are not tallied. The
> problem with this is that it breaks code, and I'm not sure it's such a
> good idea to do this in a point release.
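
A short sketch of the edge-based semantics proposed here, assuming a later NumPy release in which `np.histogram` treats `bins` as a sequence of edges and closes the last bin on the right. This was not the behavior of `histogram` at the time of this thread.

```python
import numpy as np

r = np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
edges = [2, 3, 4]  # two bins: [2, 3) and [3, 4]

counts, returned_edges = np.histogram(r, bins=edges)
# The single 1 and the five 5s fall outside [2, 4] and are not tallied.
print(counts)  # [2 7]
```

With three edges there are only two bins, so code written against the old "left edges plus open-ended last bin" convention gets one count fewer, which is the breakage mentioned above.
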
>
> My short term proposal would be to fix the normalization bug and
> document the current behavior of histogram for the 1.0.5 release. Once
> it's done, we can modify histogram and maybe print a warning the first
> time it's used to notify users of the change.
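
One way such a one-time notice could look, as a rough sketch only. The wrapper name `histogram_with_notice` and the module-level flag `_notice_shown` are hypothetical, and the choice of `FutureWarning` is an assumption, not something decided in this thread.

```python
import warnings
import numpy as np

_notice_shown = False  # hypothetical module-level flag

def histogram_with_notice(a, bins):
    """Hypothetical wrapper: warn once about changed semantics, then delegate."""
    global _notice_shown
    if not _notice_shown:
        warnings.warn(
            "histogram semantics have changed: `bins` now gives bin edges "
            "and values outside the edges are not tallied",
            FutureWarning,
            stacklevel=2,
        )
        _notice_shown = True
    return np.histogram(a, bins=bins)
```

The explicit flag guarantees a single notice per process regardless of how the user has configured the warning filters.
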
>
> I'd like to hear the voice of experienced devs on this. This issue has
> been raised a number of times since I started following this ML. It's not the
> first time I've proposed patches, and I've already documented the
> weird behavior only to see the comments disappear after a while. I
> hope this time some kind of agreement will be reached.
>
> Regards,
>
> David
>
>
>
>
> 2008/4/8, Hans Meine <meine@informatik.uni-hamburg.de>:
>
> On Monday, 07 April 2008 14:34:08, Hans Meine wrote:
>
> > On Saturday, 05 April 2008 21:54:27, Anne Archibald wrote:
> > > There's also a fourth option - raise an exception if any points are
> > > outside the range.
> >
> > +1
> >
> > I think this should be the default. Otherwise, I tend towards "exclude",
> > in order to have comparable bin sizes (when plotting, I always find peaks
> > at the ends annoying); this could also be called "clip" BTW.
> >
> > But really, an exception would follow the Zen: "In the face of ambiguity,
> > refuse the temptation to guess." And with a kwarg: "Explicit is better
> > than implicit."
>
>
> When posting this, I did indeed not think this through fully; as David (and
> Tommy) pointed out, this API does not fit well with the existing `bins`
> option, especially when a sequence of bin bounds is given. (I guess I was
> mostly thinking about the special case of discrete values and 1:1 bins, as
> typical for uint8 data.)
>
> Thus, I would like to withdraw my above opinion and instead state that I
> find the current API as clear as it gets. If you want to exclude values,
> simply pass an additional right bound, and for including outliers,
> passing -inf as additional left bound seems to do the trick. This could
> possibly be added to the documentation though.
>
> The only critical aspect I see is the `normed` arg. As it is now, the
> rightmost bin always has infinite size, but it is not treated like that:
>
> In [1]: from numpy import *
>
> In [2]: histogram(arange(10), [2,3,4], normed = True)
> Out[2]: (array([ 0.1, 0.1, 0.6]), array([2, 3, 4]))
>
> Even worse, if you try to add an infinite bin to the left, this pulls all
> values to zero (technically, I understand that, but it looks really
> undesirable to me):
>
> In [3]: histogram(arange(10), [-inf, 2,3,4], normed = True)
> Out[3]: (array([ 0., 0., 0., 0.]), array([-Inf, 2., 3., 4.]))
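
A sketch of normalization that uses each bin's own width instead of a single `bins[1]-bins[0]`. The helper name `density_from_counts` is hypothetical, and the example assumes a later NumPy release where `np.histogram` takes bin edges. Only finite edges are used, since a bin of infinite width has no meaningful finite density.

```python
import numpy as np

def density_from_counts(counts, edges):
    """Hypothetical helper: normalize counts so that sum(density * widths) == 1."""
    counts = np.asarray(counts, dtype=float)
    widths = np.diff(edges)  # one width per bin; widths may differ between bins
    return counts / (counts.sum() * widths)

data = np.arange(10)
edges = [0, 1, 3, 10]  # finite bins of width 1, 2 and 7
counts, _ = np.histogram(data, bins=edges)
density = density_from_counts(counts, edges)
print(counts)   # [1 2 7]
print(density)  # each entry is 0.1 (up to rounding)
```

Evenly spread data gives the same density in every bin even though the bins have different widths, which is the property a single shared width cannot provide.
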
>
>
> --
> Ciao, / /
> /--/
> / / ANS
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion