[Numpy-discussion] numpy histogram normed=True (bug / confusing behavior)

Ralf Gommers ralf.gommers@googlemail....
Sun Aug 29 09:26:23 CDT 2010


On Sat, Aug 28, 2010 at 4:32 AM, David Huard <david.huard@gmail.com> wrote:

> Nils and Joseph,
>
> Thanks for the bug report, this is now fixed in SVN (r8672).
>
> Ralf, is this something that you want to see backported to 1.5?
>

From the other replies to your mail I gather your bug fix is still going to
change. If no other issues are reported I'm planning to do the final release
in two days, so it's a bit late for backporting.

Thanks for asking,
Ralf



> Regards,
>
> David
>
>
> On Fri, Aug 6, 2010 at 7:49 PM, <josef.pktd@gmail.com> wrote:
>
>> On Fri, Aug 6, 2010 at 4:53 PM, Nils Becker <n.becker@amolf.nl> wrote:
>> > Hi again,
>> >
>> > first a correction: I posted
>> >
>> >> I believe np.histogram(data, bins, normed=True) effectively does:
>> >> np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]).
>> >>
>> >> However, it _should_ do
>> >> np.histogram(data, bins, normed=False) / bins_widths
>> >
>> > but there is a normalization missing; it should read
>> >
>> > I believe np.histogram(data, bins, normed=True) effectively does
>> > np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]) / data.sum()
>> >
>> > However, it _should_ do
>> > np.histogram(data, bins, normed=False) / bins_widths / data.sum()
>> >
>> > Bruce Southey replied:
>> >> As I recall, there are issues with this aspect.
>> >> Please search the discussion regarding histogram especially David
>> >> Huard's reply in this thread:
>> >> http://thread.gmane.org/gmane.comp.python.numeric.general/22445
>> > I think this discussion pertains to a switch in calling conventions
>> > which happened at the time. The last reply of D. Huard (to me) seems to
>> > say that they did not fix anything in the _old_ semantics, but that the
>> > new semantics is expected to work properly.
>> >
>> > I tried with an infinite bin:
>> > counts, dmy = np.histogram([1,2,3,4], [0.5,1.5,np.inf])
>> > counts
>> > array([1,3])
>> > ncounts, dmy = np.histogram([1,2,3,4], [0.5,1.5,np.inf], normed=1)
>> > ncounts
>> > array([0.,0.])
>> >
>> > this also does not make a lot of sense to me. A better result would be
>> > array([0.25, 0.]), since 25% of the points fall in the first bin; 75%
>> > fall in the second but are spread out over an infinite interval, giving
>> > 0. This is what my second proposal would give. I cannot find anything
>> > wrong with it so far...
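A minimal sketch of the second proposal applied to the infinite-bin case above. It assumes the normalizing constant is the total number of samples (`counts.sum()`), which is what the 25% figure implies, rather than the literal `data.sum()`. The variable names are illustrative only.

```python
import numpy as np

data = np.array([1, 2, 3, 4])
edges = np.array([0.5, 1.5, np.inf])

# Raw counts per bin, without any normalization.
counts, _ = np.histogram(data, bins=edges)

# Per-bin widths; the last bin is infinitely wide.
widths = np.diff(edges)

# Proposed normalization: fraction of samples per bin, divided by bin width.
# Assumption: the divisor is the sample count, not the sum of the data values.
density = counts / widths / counts.sum()
print(density)
```

Under this reading the first bin gets height 0.25 (one sample out of four in a unit-width bin) and the infinite bin gets height 0.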
>>
>> I didn't find any other information about the meaning of
>> normed=True on the mailing list or in the trac history:
>>
>>         if normed:
>>             db = array(np.diff(bins), float)
>>             return n/(n*db).sum(), bins
>>
>> this does not look like the correct piecewise density with unequal
>> binsizes.
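To make the difference concrete, here is a small sketch comparing the quoted normalization with a piecewise density on unequal bins. The function names `quoted_normalization` and `piecewise_density` are hypothetical and not part of NumPy. The second function is one reading of the proposal in this thread, not necessarily the fix committed in r8672.

```python
import numpy as np

def quoted_normalization(counts, edges):
    """Normalization as in the quoted snippet: n / (n * db).sum()."""
    db = np.diff(edges).astype(float)
    return counts / (counts * db).sum()

def piecewise_density(counts, edges):
    """Hypothetical alternative: fraction of samples per bin over bin width."""
    db = np.diff(edges).astype(float)
    return counts / db / counts.sum()

data = np.array([1, 2, 3, 4])
edges = np.array([0.5, 1.5, 4.5])   # unequal widths: 1 and 3
counts, _ = np.histogram(data, bins=edges)
widths = np.diff(edges)

old = quoted_normalization(counts, edges)
new = piecewise_density(counts, edges)

# Probability mass per bin is height * width; for a density it should
# equal the fraction of samples that fell into each bin.
print("fraction of samples:", counts / counts.sum())
print("quoted mass per bin:", old * widths)
print("piecewise mass per bin:", new * widths)
```

Both versions sum to 1 over the bins, and they agree when all bins have the same width, so the discrepancy only shows up with unequal bin sizes.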
>>
>> Thanks Nils for pointing this out, I tried only equal binsizes for a
>> histogram distribution.
>>
>> Josef
>>
>>
>>
>>
>>
>> >
>> > Cheers, Nils
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>>
>
>