[Numpy-discussion] numpy histogram normed=True (bug / confusing behavior)

josef.pktd@gmai... josef.pktd@gmai...
Fri Aug 6 18:49:45 CDT 2010


On Fri, Aug 6, 2010 at 4:53 PM, Nils Becker <n.becker@amolf.nl> wrote:
> Hi again,
>
> first a correction: I posted
>
>> I believe np.histogram(data, bins, normed=True) effectively does :
>>>> np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]).
>>>>
>>>> However, it _should_ do
>>>> np.histogram(data, bins, normed=False) / bins_widths
>
> but there is a normalization missing; it should read
>
> I believe np.histogram(data, bins, normed=True) effectively does
> np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]) / data.sum()
>
> However, it _should_ do
> np.histogram(data, bins, normed=False) / bins_widths / data.sum()
>
> Bruce Southey replied:
>> As I recall, there are issues with this aspect.
>> Please search the discussion regarding histogram especially David
>> Huard's reply in this thread:
>> http://thread.gmane.org/gmane.comp.python.numeric.general/22445
> I think this discussion pertains to a switch in calling conventions
> which happened at the time. The last reply of D. Huard (to me) seems to
> say that they did not fix anything in the _old_ semantics, but that the
> new semantics is expected to work properly.
>
> I tried with an infinite bin:
> counts, dmy = np.histogram([1,2,3,4], [0.5,1.5,np.inf])
> counts
> array([1,3])
> ncounts, dmy = np.histogram([1,2,3,4], [0.5,1.5,np.inf], normed=1)
> ncounts
> array([0.,0.])
>
> this also does not make a lot of sense to me. A better result would be
> array([0.25, 0.]), since 25% of the points fall in the first bin; 75%
> fall in the second but are spread out over an infinite interval, giving
> 0. This is what my second proposal would give. I cannot find anything
> wrong with it so far...
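
A minimal sketch of the infinite-bin case quoted above, computed by hand
from the raw counts so that it does not depend on the normed keyword. It
assumes that the `data.sum()` in the proposal is meant as the number of
data points (written here as `counts.sum()`); the variable names are
illustrative only.

```python
import numpy as np

data = np.array([1, 2, 3, 4])
bins = np.array([0.5, 1.5, np.inf])

# Raw counts per bin (no normalization).
counts, edges = np.histogram(data, bins)

# Bin widths; the last bin is infinitely wide.
widths = np.diff(edges)

# Proposed normalization: each count divided by its own bin width and by
# the total number of points (assumption: this is what data.sum() meant).
proposed = counts / widths / counts.sum()

print(counts)    # counts per bin
print(proposed)  # piecewise density per bin
```

Under that reading, the first bin gets 0.25 and the infinitely wide bin gets 0, which is the result argued for in the quoted message.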

I didn't find any different information about the meaning of
normed=True on the mailing list or in the trac history. The relevant
code is:

    if normed:
        db = array(np.diff(bins), float)
        return n/(n*db).sum(), bins

This does not look like the correct piecewise density for unequal bin
sizes: every count is divided by the same constant, (n*db).sum(), rather
than by its own bin width.
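
To make the difference concrete, here is a small sketch with two bins of
unequal width. The sample and bin edges are made up for illustration, and
the names `as_in_snippet` and `per_bin` are hypothetical. The first
expression mirrors the normalization in the snippet above; the second
divides each count by its own bin width and by the number of points.

```python
import numpy as np

# Hypothetical sample, evenly spread over [0, 4].
data = np.array([0.5, 1.5, 2.5, 3.5])
bins = np.array([0.0, 1.0, 4.0])        # bin widths 1 and 3

n, edges = np.histogram(data, bins)     # raw counts: [1, 3]
db = np.diff(edges).astype(float)       # bin widths: [1., 3.]

# Normalization as in the snippet: one common divisor for all bins.
as_in_snippet = n / (n * db).sum()

# Piecewise density: each count divided by its own width and by N.
per_bin = n / db / n.sum()

print(as_in_snippet)  # [0.1 0.3]
print(per_bin)        # [0.25 0.25]
```

Both results integrate to one over the bins, but only the second is flat for this evenly spread sample. With equal bin widths the two expressions coincide, which would explain why the problem does not show up in that case.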

Thanks, Nils, for pointing this out; I had only tried equal bin sizes for a
histogram distribution.

Josef





>
> Cheers, Nils

