[Numpy-discussion] numpy histogram normed=True (bug / confusing behavior)

David Huard david.huard@gmail....
Fri Aug 27 15:32:48 CDT 2010


Nils and Joseph,

Thanks for the bug report, this is now fixed in SVN (r8672).

Ralph, is this something that you want to see backported to 1.5?

Regards,

David


On Fri, Aug 6, 2010 at 7:49 PM, <josef.pktd@gmail.com> wrote:

> On Fri, Aug 6, 2010 at 4:53 PM, Nils Becker <n.becker@amolf.nl> wrote:
> > Hi again,
> >
> > first a correction: I posted
> >
> >> I believe np.histogram(data, bins, normed=True) effectively does :
> >>>> np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]).
> >>>>
> >>>> However, it _should_ do
> >>>> np.histogram(data, bins, normed=False) / bins_widths
> >
> > but there is a normalization missing; it should read
> >
> > I believe np.histogram(data, bins, normed=True) effectively does
> > np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]) / data.sum()
> >
> > However, it _should_ do
> > np.histogram(data, bins, normed=False) / bins_widths / data.sum()
> >
> > Bruce Southey replied:
> >> As I recall, there were issues with this aspect.
> >> Please search the discussion regarding histogram especially David
> >> Huard's reply in this thread:
> >> http://thread.gmane.org/gmane.comp.python.numeric.general/22445
> > I think this discussion pertains to a switch in calling conventions
> > which happened at the time. The last reply of D. Huard (to me) seems to
> > say that they did not fix anything in the _old_ semantics, but that the
> > new semantics is expected to work properly.
> >
> > I tried with an infinite bin:
> > counts, dmy = np.histogram([1,2,3,4], [0.5,1.5,np.inf])
> > counts
> > array([1,3])
> > ncounts, dmy = np.histogram([1,2,3,4], [0.5,1.5,np.inf], normed=1)
> > ncounts
> > array([0.,0.])
> >
> > this also does not make a lot of sense to me. A better result would be
> > array([0.25, 0.]), since 25% of the points fall in the first bin; 75%
> > fall in the second but are spread out over an infinite interval, giving
> > 0. This is what my second proposal would give. I cannot find anything
> > wrong with it so far...
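A minimal sketch of the normalization proposed above, using only plain NumPy calls. The helper name `piecewise_density` is hypothetical, and the total is assumed to be the number of samples (the sum of the raw counts), since that is what yields the 0.25 quoted for the first bin.

```python
import numpy as np

def piecewise_density(data, bins):
    """Hypothetical helper: raw count / bin width / total count."""
    counts, edges = np.histogram(data, bins)
    widths = np.diff(edges)
    # A finite count divided by an infinite width evaluates to 0.0,
    # so an open-ended last bin contributes zero density.
    return counts / widths / counts.sum(), edges

dens, edges = piecewise_density([1, 2, 3, 4], [0.5, 1.5, np.inf])
print(dens)
```

With the data and bins from the example, the first bin holds one of four points over a width of 1, and the second bin's three points are spread over an infinite width.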
>
> I didn't find any different information about the meaning of
> normed=True on the mailing list nor in the trac history
>
>     if normed:
>         db = array(np.diff(bins), float)
>         return n/(n*db).sum(), bins
>
> this does not look like the correct piecewise density with unequal
> binsizes.
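To illustrate the point, here is a small sketch comparing the formula quoted above with a per-bin normalization. The counts and bin edges are made-up illustration values (counts for the data [1, 2, 3, 4] with finite, unequal bins); nothing here is taken from the NumPy source beyond the quoted expression.

```python
import numpy as np

n = np.array([1, 3])                # raw counts for data [1, 2, 3, 4]
bins = np.array([0.5, 1.5, 4.5])    # unequal bin widths: 1 and 3
db = np.diff(bins).astype(float)

old = n / (n * db).sum()            # expression quoted above
density = n / db / n.sum()          # count / width / total count

print(old)      # [0.1 0.3]
print(density)  # [0.25 0.25]
```

The data are evenly spaced, so a piecewise density should be flat across both bins; the quoted expression instead reports the wider bin as three times denser. With an infinite last bin, `(n * db).sum()` becomes infinite, which matches the array([0., 0.]) result reported earlier in the thread.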
>
> Thanks Nils for pointing this out, I tried only equal binsizes for a
> histogram distribution.
>
> Josef
> >
> > Cheers, Nils
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >