[SciPy-user] Pb with numpy.histogram

David Huard david.huard@gmail....
Thu Sep 27 12:28:21 CDT 2007


Hi LB,

I think histogram has had this odd behavior since the Numeric era, and a
lot of code may break if we fix it. Basically, histogram discards values
below the range as outliers but puts values above the range into the
last bin.
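
To make the asymmetry concrete, here is a minimal sketch that emulates the
behavior described above. It assumes the bins are defined only by their lower
edges and that the last bin is open-ended. The helper name
legacy_like_histogram and the sample data are made up for illustration; this
is not NumPy's actual source. Current NumPy releases behave differently
(np.histogram ignores values on both sides of the range), which the last
lines show for comparison.

```python
import numpy as np

def legacy_like_histogram(a, nbins, value_range):
    """Hypothetical helper emulating the behavior described in this thread."""
    a = np.sort(np.asarray(a, dtype=float).ravel())
    lo, hi = value_range
    # Only the lower edge of each bin is used; the last bin has no upper bound.
    lower_edges = np.linspace(lo, hi, nbins, endpoint=False)
    # Position of the first sample >= each lower edge.
    idx = a.searchsorted(lower_edges)
    # Everything from the last lower edge to the end falls into the last bin.
    idx = np.concatenate([idx, [a.size]])
    counts = np.diff(idx)
    return counts, lower_edges

# Illustrative data: two values below the range (3, 12) and two above it.
data = np.array([1.0, 2.5, 3.0, 4.0, 7.5, 11.9, 12.5, 13.0])

counts, lower_edges = legacy_like_histogram(data, 3, (3, 12))
print(counts, counts.sum())   # low outliers are dropped, high ones kept

# Current NumPy: values outside the range are ignored on both sides.
modern_counts, modern_edges = np.histogram(data, bins=3, range=(3, 12))
print(modern_counts, modern_counts.sum())
```

In the emulation, the two samples below 3 vanish from the total while the two
samples above 12 are counted in the last bin, which is the same pattern as the
497 out of 500 reported below.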

I generally use my own histogramming routines; I could send them your way
if you're interested.
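
As an illustration only (these are not the routines mentioned above, which are
not included in this message), a small wrapper can make the handling of
outliers explicit by counting them separately. The function name
histogram_with_outliers and the sample data are hypothetical; the sketch
assumes a current NumPy, where np.histogram ignores values outside the range.

```python
import numpy as np

def histogram_with_outliers(a, nbins, value_range):
    """Hypothetical wrapper: histogram plus explicit counts of outliers."""
    a = np.asarray(a, dtype=float).ravel()
    lo, hi = value_range
    # Count samples outside the range instead of silently dropping them.
    below = int(np.count_nonzero(a < lo))
    above = int(np.count_nonzero(a > hi))
    # np.histogram only counts samples with lo <= x <= hi.
    counts, edges = np.histogram(a, bins=nbins, range=(lo, hi))
    return counts, edges, below, above

# Illustrative data: two values below the range (3, 12) and two above it.
data = np.array([1.0, 2.5, 3.0, 4.0, 7.5, 11.9, 12.5, 13.0])
counts, edges, below, above = histogram_with_outliers(data, 3, (3, 12))

# Every sample is accounted for: in a bin, below the range, or above it.
total = int(counts.sum()) + below + above
print(counts, below, above, total)
```

Returning the outlier counts alongside the histogram means the total always
adds up to the number of samples, so nothing is lost without notice.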

David


2007/9/27, LB <berthe.loic@gmail.com>:
>
>    Hi,
>
> I've got strange results with numpy.histogram:
>
> Here is its docstring:
> """
> Help on function histogram in module numpy.lib.function_base:
>
> histogram(a, bins=10, range=None, normed=False)
>     Compute the histogram from a set of data.
>
>     :Parameters:
>       - `a` : array
>         The data to histogram. n-D arrays will be flattened.
>       - `bins` : int or sequence of floats, optional
>         If an int, then the number of equal-width bins in the given range.
>         Otherwise, a sequence of the lower bound of each bin.
>       - `range` : (float, float), optional
>         The lower and upper range of the bins. If not provided, then (a.min(),
>         a.max()) is used. Values outside of this range are allocated to the
>         closest bin.
>       - `normed` : bool, optional
>         If False, the result array will contain the number of samples in each bin.
>         If True, the result array is the value of the probability *density*
>         function at the bin normalized such that the *integral* over the range
>         is 1. Note that the sum of all of the histogram values will not usually
>         be 1; it is not a probability *mass* function.
>
>     :Returns:
>       - `hist` : array (n,)
>         The values of the histogram. See `normed` for a description of the
>         possible semantics.
>       - `lower_edges` : float array (n,)
>         The lower edges of each bin.
> """
>
> and here is a snippet of code:
> >>> r = random.normal(8, 2, 500)
> >>> r.min(), r.max()
> (1.164117097856284, 13.069426390055149)
> >>> ra
> (3, 12)
> >>> pdf, xpdf = histogram(r, nbins, range=ra, normed=False)
> >>> pdf
> array([ 1,  6,  5,  8, 30, 39, 53, 55, 61, 50, 45, 42, 32, 26, 17, 27])
> >>> pdf.sum()
> 497
>
> It seems I've lost 3 of my 500 random numbers!
>
> >>> r[ r>= ra[1]]
> array([ 12.00676288,  12.8381615 ,  12.48380931,  12.55392835,
>         12.26153469,  12.92869504,  12.58290343,  12.03782311,
>         13.06942639,  12.06375346,  12.02970414,  12.53556779,
>         12.54203654,  12.02611864,  12.85113934,  12.64692817])
>
> >>> r[ r<= ra[0]]
> array([ 1.1641171 ,  2.85873306,  2.92046745])
>
> So the number of missing values matches the number of samples below the
> range given to histogram.
> This smells like a bug to me.
> Is there something I've misunderstood about how to use
> numpy.histogram?
>
> For information
> >>> numpy.__version__
> '1.0.2'
>
> Regards,
>
> --
> LB
>