[Numpy-discussion] numpy histogram normed=True (bug / confusing behavior)
josef.pktd@gmai...
Sun Aug 29 09:54:33 CDT 2010
On Sun, Aug 29, 2010 at 10:26 AM, Ralf Gommers
<ralf.gommers@googlemail.com> wrote:
>
>
> On Sat, Aug 28, 2010 at 4:32 AM, David Huard <david.huard@gmail.com> wrote:
>>
>> Nils and Josef,
>> Thanks for the bug report, this is now fixed in SVN (r8672).
>> Ralf, is this something that you want to see backported to 1.5?
>
> From the other replies to your mail I gather your bug fix is still going to
> change. If no other issues are reported I'm planning to do the final release
> in two days, so it's a bit late for backporting.
There seems to be broad agreement on adding a new keyword such as
density=True to make the correct behavior available.
Two days before a release is really not a good time for that change, but
adding a docstring example that shows how to calculate the correct
density histogram would be useful.
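
A minimal sketch of what such a docstring example could look like,
assuming the intended density is the count in each bin divided by the
total count and by that bin's width. The helper name density_histogram
is hypothetical and not part of numpy; it relies only on np.histogram
and np.diff:

```python
import numpy as np

def density_histogram(data, bins):
    """Hypothetical helper (not a numpy function): piecewise-constant density."""
    # Raw counts per bin, without any normalization.
    counts, edges = np.histogram(data, bins)
    # Width of each bin; these may differ from bin to bin.
    widths = np.diff(edges)
    # Fraction of points in each bin, spread over that bin's width.
    # Note: this divides by zero if no data point falls inside the bins.
    density = counts / (counts.sum() * widths)
    return density, edges

# Unequal bins: the density integrates to 1 over the binned range.
density, edges = density_histogram([1, 2, 3, 4], [0.5, 1.5, 4.5])
print(density)                           # [0.25 0.25]
print((density * np.diff(edges)).sum())  # 1.0
```

With an infinite last bin, bins=[0.5, 1.5, np.inf], the same sketch
should give array([0.25, 0.]), because the infinite width drives the
second value to zero.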
Josef
>
> Thanks for asking,
> Ralf
>
>
>>
>> Regards,
>> David
>>
>> On Fri, Aug 6, 2010 at 7:49 PM, <josef.pktd@gmail.com> wrote:
>>>
>>> On Fri, Aug 6, 2010 at 4:53 PM, Nils Becker <n.becker@amolf.nl> wrote:
>>> > Hi again,
>>> >
>>> > first a correction: I posted
>>> >
>>> >> I believe np.histogram(data, bins, normed=True) effectively does :
>>> >>>> np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]).
>>> >>>>
>>> >>>> However, it _should_ do
>>> >>>> np.histogram(data, bins, normed=False) / bins_widths
>>> >
>>> > but there is a normalization missing; it should read
>>> >
>>> > I believe np.histogram(data, bins, normed=True) effectively does
>>> > np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]) /
>>> > len(data)
>>> >
>>> > However, it _should_ do
>>> > np.histogram(data, bins, normed=False) / bins_widths / len(data)
>>> >
>>> > Bruce Southey replied:
>>> >> As I recall, there are issues with this aspect.
>>> >> Please search the discussion regarding histogram especially David
>>> >> Huard's reply in this thread:
>>> >> http://thread.gmane.org/gmane.comp.python.numeric.general/22445
>>> > I think this discussion pertains to a switch in calling conventions
>>> > which happened at the time. The last reply of D. Huard (to me) seems to
>>> > say that they did not fix anything in the _old_ semantics, but that the
>>> > new semantics is expected to work properly.
>>> >
>>> > I tried with an infinite bin:
>>> > counts, dmy = np.histogram([1,2,3,4], [0.5,1.5,np.inf])
>>> > counts
>>> > array([1,3])
>>> > ncounts, dmy = np.histogram([1,2,3,4], [0.5,1.5,np.inf], normed=1)
>>> > ncounts
>>> > array([0.,0.])
>>> >
>>> > this also does not make a lot of sense to me. A better result would be
>>> > array([0.25, 0.]), since 25% of the points fall in the first bin; 75%
>>> > fall in the second but are spread out over an infinite interval, giving
>>> > 0. This is what my second proposal would give. I cannot find anything
>>> > wrong with it so far...
>>>
>>> I didn't find any other information about the meaning of
>>> normed=True, either on the mailing list or in the Trac history:
>>>
>>>
>>>     if normed:
>>>         db = array(np.diff(bins), float)
>>>         return n/(n*db).sum(), bins
>>>
>>> This does not look like the correct piecewise density when the bin
>>> sizes are unequal.
>>>
>>> Thanks, Nils, for pointing this out; I had only tried equal bin sizes
>>> for a histogram distribution.
>>>
>>> Josef
>>>
>>> >
>>> > Cheers, Nils
>>> > _______________________________________________
>>> > NumPy-Discussion mailing list
>>> > NumPy-Discussion@scipy.org
>>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>> >
>>