[Numpy-discussion] numpy histogram normed=True (bug / confusing behavior)
Bruce Southey
bsouthey@gmail....
Fri Aug 6 13:22:40 CDT 2010
On 08/06/2010 11:37 AM, josef.pktd@gmail.com wrote:
> On Fri, Aug 6, 2010 at 11:46 AM, Nils Becker <n.becker@amolf.nl> wrote:
>> Hi,
>>
>> I found what looks like a bug in histogram, when the option normed=True
>> is used together with non-uniform bins.
>>
>> Consider this example:
>>
>> import numpy as np
>> data = np.array([1, 2, 3, 4])
>> bins = np.array([.5, 1.5, 4.5])
>> bin_widths = np.diff(bins)
>> (counts, dummy) = np.histogram(data, bins)
>> (densities, dummy) = np.histogram(data, bins, normed=True)
>>
>> What this gives is:
>>
>> bin_widths
>> array([ 1., 3.])
>>
>> counts
>> array([1, 3])
>>
>> densities
>> array([ 0.1, 0.3])
>>
>> The documentation claims that histogram with normed=True gives a
>> density, which integrates to 1. In this example, it is true that
>> (densities * bin_widths).sum() is 1. However, clearly the data are
>> equally spaced, so their density should be uniform and equal to 0.25.
>> Note that (0.25 * bin_widths).sum() is also 1.
>>
>> I believe np.histogram(data, bins, normed=True) effectively does :
>> np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]).
>>
>> However, it _should_ do
>> np.histogram(data, bins, normed=False) / bin_widths
>>
>> to get a true density over the data coordinate as a result. It's easy to
>> fix by hand, but I think the documentation is at least misleading?!
>>
>> Sorry if this has been discussed before; I could not find it (numpy
>> 1.3).
> Either I also don't understand histogram or this is a bug.
>
> >>> data = np.arange(1,10)
> >>> bins = np.array([.5, 1.5, 4.5, 7.5, 8.5, 9.5])
> >>> np.histogram(data, bins, normed=True)
> (array([ 0.04761905, 0.14285714, 0.14285714, 0.04761905,
>        0.04761905]), array([ 0.5, 1.5, 4.5, 7.5, 8.5, 9.5]))
> >>> np.histogram(data, bins)
> (array([1, 3, 3, 1, 1]), array([ 0.5, 1.5, 4.5, 7.5, 8.5, 9.5]))
> >>> np.diff(bins)
> array([ 1., 3., 3., 1., 1.])
>
> I don't see what the normed=True numbers are in this case.
>
> >>> np.array([ 1., 3., 3., 1., 1.])/7
> array([ 0.14285714, 0.42857143, 0.42857143, 0.14285714, 0.14285714])
>
> Josef
>
>
As I recall, there are issues with this aspect of histogram.
Please search the earlier discussions regarding histogram, especially
David Huard's reply in this thread:
http://thread.gmane.org/gmane.comp.python.numeric.general/22445
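
In the meantime, here is a minimal sketch of the by-hand workaround,
assuming that "density" means count / (total count * bin width), so
that it integrates to 1 over the bins. The helper name density_by_hand
is made up for illustration and is not part of numpy. The last lines
only check that the numbers reported above are consistent with dividing
the counts by (counts * bin_widths).sum(); that is an inference from the
printed output, not a statement about the numpy source.

```python
import numpy as np

def density_by_hand(data, bins):
    """Hypothetical helper: counts / (total count * bin width)."""
    counts, edges = np.histogram(data, bins)   # plain counts, no normalization
    widths = np.diff(edges)                    # handles non-uniform bins
    return counts / (counts.sum() * widths)

# Nils' example
data = np.array([1, 2, 3, 4])
bins = np.array([0.5, 1.5, 4.5])
dens = density_by_hand(data, bins)
print(dens)                                # uniform density over both bins
print((dens * np.diff(bins)).sum())        # integral over the bins

# Josef's example
dens9 = density_by_hand(np.arange(1, 10),
                        np.array([0.5, 1.5, 4.5, 7.5, 8.5, 9.5]))
print(dens9)

# Consistency check against the normed=True output quoted above
counts, _ = np.histogram(data, bins)
reported = counts / (counts * np.diff(bins)).sum()
print(reported)
```

With Nils' data this gives 0.25 in both bins, and with Josef's data it
gives 1/9 in every bin, which is what one would expect for equally
spaced points.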
Bruce