[Numpy-discussion] numpy histogram normed=True (bug / confusing behavior)

Nils Becker n.becker@amolf...
Sun Aug 29 08:21:55 CDT 2010

> On Sat, Aug 28, 2010 at 04:12, Zbyszek Szmek <zbyszek@in.waw.pl> wrote:
>> Hi,
>> On Fri, Aug 27, 2010 at 06:43:26PM -0600, Charles R Harris wrote:
>>>    On Fri, Aug 27, 2010 at 2:47 PM, Robert Kern <robert.kern@gmail.com>
>>>    wrote:
>>>      On Fri, Aug 27, 2010 at 15:32, David Huard <david.huard@gmail.com>
>>>      wrote:
>>>      > Nils and Joseph,
>>>      > Thanks for the bug report, this is now fixed in SVN (r8672).
>>>      While we're at it, can we change the name of the argument? "normed"
>>>      has caused so much confusion over the years. We could deprecate
>>>      normed=True in favor of pdf=True or density=True.
>> I think it might be a good moment to also include a different type of normalization:
>>       n = n / n.sum()
>> i.e. the frequency of counts in each bin. This one is of course very simple to calculate
>> by hand, but very common. I think it would be useful to have this normalization
>> available too. [http://www.itl.nist.gov/div898/handbook/eda/section3/histogra.htm]
> My feeling is that this is trivial to do "by hand". I do not see a
> reason to add an option to histogram() to do this.

+1 for not silently changing the behavior of normed=True. (I'm one of
the people who have worked around it).

One argument in favor of offering both normalization styles, 'frequency'
and 'density', is that the documentation would automatically become very
clear: a user sees all the options, and there is little chance of a
misunderstanding. Of course, a sentence like "If you want frequency
normalization, divide the unnormalized counts by their sum: with
counts, edges = histogram(data), use counts / counts.sum()" would
also make things clear, without adding the frequency option.
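
To make the difference between the two normalizations concrete, here is a
minimal sketch. The data and bin edges are made up for illustration, and the
density=True keyword is assumed to be available (it is in recent NumPy
releases; at the time of this thread it was only a proposed name). With
unequal bin widths the two results differ, which is where the confusion
tends to arise.

```python
import numpy as np

# Illustrative data and deliberately unequal bin edges (made up for this sketch).
data = np.array([0.5, 1.5, 1.5, 2.5, 3.5, 3.5, 3.5, 5.0])
edges = np.array([0.0, 1.0, 2.0, 4.0, 6.0])

# Raw, unnormalized counts per bin.
counts, edges = np.histogram(data, bins=edges)
widths = np.diff(edges)

# 'frequency' normalization: fraction of samples in each bin; sums to 1.
frequency = counts / counts.sum()

# 'density' normalization: additionally divide by the bin width, so that
# the integral over all bins, sum(density * widths), is 1.
density = counts / (counts.sum() * widths)

print(frequency)
print(density)
```

The frequencies sum to 1, while the densities only integrate to 1 once each
value is multiplied by its bin width. For equal-width bins the two differ by
a constant factor.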

