[Numpy-discussion] Histogram bin definition

David Huard david.huard@gmail....
Wed Jul 16 13:14:56 CDT 2008


Hi Stefan,

It's designed this way. The main reason is that the default bin edges are
generated using

linspace(a.min(), a.max(), bins + 1)

when bins is an integer.
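
As a quick check of how the default edges are built, the sketch below compares
the edges returned by `np.histogram` with an explicit `linspace` call. It
assumes a recent NumPy release, where the `new` keyword no longer exists; the
sample data and variable names are made up for illustration.

```python
import numpy as np

# Illustrative data; any 1-D array with distinct min and max would do.
a = np.array([0.5, 1.0, 2.5, 3.0, 4.0])
n_bins = 4

counts, edges = np.histogram(a, bins=n_bins)

# n_bins bins need n_bins + 1 edges, spanning a.min() to a.max().
expected_edges = np.linspace(a.min(), a.max(), n_bins + 1)
same_edges = np.allclose(edges, expected_edges)

print(len(edges), same_edges)
```

Note that an integer bin count of n produces n + 1 edges, with the first and
last edges sitting exactly on the data minimum and maximum.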

If we left the rightmost edge open, the histogram of a 100-item array would
typically count only 99 values, because the maximum value sits exactly on the
last edge and would be treated as an outlier. I thought the least surprising
behavior was to make sure that all items are counted.
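
To make the 99-versus-100 point concrete, here is a minimal sketch that bins
the same data twice: once with strictly half-open bins, emulated with
`np.searchsorted`, and once with `np.histogram`. The half-open emulation is my
own illustration, not code from NumPy, and it assumes a recent NumPy release.

```python
import numpy as np

a = np.arange(100.0)  # 100 items: 0.0, 1.0, ..., 99.0
n_bins = 10
edges = np.linspace(a.min(), a.max(), n_bins + 1)

# Strictly half-open bins [left, right): find the bin index of each item.
idx = np.searchsorted(edges, a, side="right") - 1
# The maximum equals the last edge, so its index is n_bins (out of range).
inside = idx < n_bins
half_open_counts = np.bincount(idx[inside], minlength=n_bins)

# np.histogram closes the last bin, so the maximum is counted.
closed_counts, _ = np.histogram(a, bins=edges)

print(half_open_counts.sum())  # 99
print(closed_counts.sum())     # 100
```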

The other reason has to do with backward compatibility: I tried to avoid
breakage for the simplest use case.

`histogram(r, bins=10)` yields the same thing as `histogram(r, bins=10,
new=True)`

We could avoid special-casing the last edge by defining the edges as
linspace(a.min(), a.max()+delta, bins + 1), but people would wonder why the
right edge is 3.000001 instead of 3.
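
A rough sketch of that padded alternative, with every bin strictly half-open,
might look like the following. The value of `delta` is arbitrary and chosen
only for illustration, and the `np.searchsorted` binning is again my own
emulation rather than anything NumPy does internally.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
n_bins = 3
delta = 1e-6  # arbitrary padding, for illustration only

# Push the last edge slightly past the maximum.
padded_edges = np.linspace(a.min(), a.max() + delta, n_bins + 1)

# Every bin is half-open [left, right); the maximum now falls inside the last one.
idx = np.searchsorted(padded_edges, a, side="right") - 1
counts = np.bincount(idx, minlength=n_bins)

print(padded_edges[-1])  # slightly larger than a.max()
print(counts.sum())      # all three items are counted
```

All items are counted without a special case, at the cost of a right edge that
no longer matches the data maximum.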

Cheers,

David






2008/7/16 Stéfan van der Walt <stefan@sun.ac.za>:

> Hi all,
>
> I am busy documenting `histogram`, and the definition of a "bin"
> eludes me.  Here is the behaviour that troubles me:
>
> >>> np.histogram([1,2,1], bins=[0, 1, 2, 3], new=True)
> (array([0, 2, 1]), array([0, 1, 2, 3]))
>
> From this result, it seems as if a bin is defined as the half-open
> interval [left_edge, right_edge).
>
> Now, look at what happens in the following case:
>
> >>> np.histogram([1,2,3], bins=[0,1,2,3], new=True)
> (array([0, 1, 2]), array([0, 1, 2, 3]))
>
> Here, the last bin is defined by the closed interval [left_edge,
> right_edge]!
>
> Is this a bug, or a design consideration?
>
> Regards
> Stéfan