[Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram
David Huard
david.huard@gmail....
Tue Apr 8 14:25:00 CDT 2008
2008/4/8, Bruce Southey <bsouthey@gmail.com>:
>
> Hi,
> I agree that the current histogram should be changed. However, I am not
> sure 1.0.5 is the correct release for that.
We both agree.
> David, this doesn't work for your code:
> r= np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
> dbin=[2,3,4]
> rc, rb=histogram(r, bins=dbin, discard=None)
> Returns:
> rc=[3 3] # Really should be [3, 3, 9]
> rb=[-9223372036854775808 3 -9223372036854775808]
I used the convention that bins are the bin edges, including the right most
edge, this is why len(rc) =2 and len(rb)=3.
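
To make the convention concrete, here is a minimal pure-Python sketch. It is not the patch under discussion: the name histogram_edges is hypothetical, and the edge rule (each bin half-open on the right, except the last one, which also includes its right edge) is an assumption made for illustration. Values outside the outer edges are simply not tallied.

```python
from bisect import bisect_right

def histogram_edges(values, edges):
    """Count values per bin, treating `edges` as bin edges (hypothetical helper).

    Assumed rule: bins are [e[i], e[i+1]) except the last, which is
    [e[-2], e[-1]]. Values outside [e[0], e[-1]] are not tallied.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outlier: skipped rather than pushed into an end bin
        # bisect_right returns the number of edges <= v
        i = bisect_right(edges, v) - 1
        if i == len(counts):
            i -= 1  # v sits exactly on the rightmost edge
        counts[i] += 1
    return counts, list(edges)

r = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]
rc, rb = histogram_edges(r, [2, 3, 4])
print(rc, rb)  # [2, 7] [2, 3, 4] under the assumed edge rule
```

With three edges there are two bins, so len(rc) == 2 and len(rb) == 3. The counts themselves depend on the assumed edge rule and are not meant to reproduce the output quoted above.
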
Now there clearly is a bug, and I traced it to the use of np.r_. Check this
out:
In [26]: dbin = [1,2,3]
In [27]: np.r_[-np.inf, dbin, np.inf]
Out[27]: array([-Inf, 1., 2., 3., Inf])
In [28]: np.r_[-np.inf, asarray(dbin), np.inf]
Out[28]:
array([-9223372036854775808, 1,
2, 3, -9223372036854775808])
In [29]: np.r_[-np.inf, asarray(dbin).astype(float), np.inf]
Out[29]: array([-Inf, 1., 2., 3., Inf])
Is this a misuse of r_ or a bug ?
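
Whatever the answer, the cast can be sidestepped by converting the edges to float before concatenating, as In [29] suggests. A minimal sketch, assuming np.concatenate is an acceptable stand-in for r_ here; pad_with_inf is a hypothetical helper name, not an existing numpy function:

```python
import numpy as np

def pad_with_inf(bins):
    """Return the bin edges with -inf and +inf added at the ends (hypothetical helper)."""
    # Cast to float first: an integer array cannot represent infinity
    edges = np.asarray(bins, dtype=float)
    return np.concatenate(([-np.inf], edges, [np.inf]))

padded = pad_with_inf([1, 2, 3])
print(padded.dtype, padded)
```

Because the explicit float cast happens before the infinities are attached, the result does not depend on how r_ chooses its output type.
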
David
> But I have not had time to find the error.
>
> Regards
> Bruce
>
>
>
> David Huard wrote:
> > Hans,
> >
> > Note that the current histogram is buggy, in the sense that it assumes
> > that all bins have the same width and computes db = bins[1]-bins[0].
> > This is why you get zeros everywhere.
> >
> > The current behavior has been heavily criticized and I think we should
> > change it. My proposal is to have for histogram the same behavior as
> > for histogramdd and histogram2d: bins are the bin edges, including the
> > rightmost bin, and values outside of the bins are not tallied. The
> > problem with this is that it breaks code, and I'm not sure it's such a
> > good idea to do this in a point release.
> >
> > My short term proposal would be to fix the normalization bug and
> > document the current behavior of histogram for the 1.0.5 release. Once
> > it's done, we can modify histogram and maybe print a warning the first
> > time it's used to notify users of the change.
> >
> > I'd like to hear the voice of experienced devs on this. This issue has
> > been raised a number of times since I started following this ML. It's not the
> > first time I've proposed patches, and I've already documented the
> > weird behavior only to see the comments disappear after a while. I
> > hope this time some kind of agreement will be reached.
> >
> > Regards,
> >
> > David
> >
> >
> >
> >
> > 2008/4/8, Hans Meine <meine@informatik.uni-hamburg.de>:
> >
> > On Monday, 07 April 2008 14:34:08, Hans Meine wrote:
> >
> > > On Saturday, 05 April 2008 21:54:27, Anne Archibald wrote:
> > > > There's also a fourth option - raise an exception if any points are
> > > > outside the range.
> > >
> > > +1
> > >
> > > I think this should be the default. Otherwise, I tend towards "exclude",
> > > in order to have comparable bin sizes (when plotting, I always find peaks
> > > at the ends annoying); this could also be called "clip" BTW.
> > >
> > > But really, an exception would follow the Zen: "In the face of ambiguity,
> > > refuse the temptation to guess." And with a kwarg: "Explicit is better
> > > than implicit."
> >
> >
> > When posting this, I did indeed not think this through fully; as David (and
> > Tommy) pointed out, this API does not fit well with the existing `bins`
> > option, especially when a sequence of bin bounds is given. (I guess I was
> > mostly thinking about the special case of discrete values and 1:1 bins, as
> > typical for uint8 data.)
> >
> > Thus, I would like to withdraw my above opinion and instead state that I
> > find the current API as clear as it gets. If you want to exclude values,
> > simply pass an additional right bound, and for including outliers,
> > passing -inf as additional left bound seems to do the trick. This could
> > possibly be added to the documentation though.
> >
> > The only critical aspect I see is the `normed` arg. As it is now, the
> > rightmost bin always has infinite size, but it is not treated like that:
> >
> > In [1]: from numpy import *
> >
> > In [2]: histogram(arange(10), [2,3,4], normed = True)
> > Out[2]: (array([ 0.1, 0.1, 0.6]), array([2, 3, 4]))
> >
> > Even worse, if you try to add an infinite bin to the left, this pulls all
> > values to zero (technically, I understand that, but it looks really
> > undesirable to me):
> >
> > In [3]: histogram(arange(10), [-inf, 2,3,4], normed = True)
> > Out[3]: (array([ 0., 0., 0., 0.]), array([-Inf, 2., 3., 4.]))
> >
> >
> > --
> > Ciao, / /
> > /--/
> > / / ANS
> > _______________________________________________
> > Numpy-discussion mailing list
> > Numpy-discussion@scipy.org
> > http://projects.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
>