[Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram
Hans Meine
meine@informatik.uni-hamburg...
Tue Apr 8 03:00:19 CDT 2008
On Monday, 07 April 2008 14:34:08, Hans Meine wrote:
> On Saturday, 05 April 2008 21:54:27, Anne Archibald wrote:
> > There's also a fourth option - raise an exception if any points are
> > outside the range.
>
> +1
>
> I think this should be the default. Otherwise, I tend towards "exclude",
> in order to have comparable bin sizes (when plotting, I always find peaks
> at the ends annoying); this could also be called "clip" BTW.
>
> But really, an exception would follow the Zen: "In the face of ambiguity,
> refuse the temptation to guess." And with a kwarg: "Explicit is better
> than implicit."
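For comparison, here is a minimal pure-Python sketch of the two behaviors favored in the quoted text: raising an exception on any out-of-range value, or excluding such values. The function `strict_histogram` and its `outliers` keyword are hypothetical and not part of NumPy. The sketch assumes bins are given by their full edges, each bin half-open except the last one, which also includes its right edge.

```python
from bisect import bisect_right


def strict_histogram(values, edges, outliers="raise"):
    """Hypothetical sketch: count `values` into bins defined by `edges`.

    Bins are [edges[i], edges[i+1]); the last bin also includes edges[-1].
    outliers="raise"   -> ValueError for any value outside [edges[0], edges[-1]]
    outliers="exclude" -> such values are silently dropped
    """
    if outliers not in ("raise", "exclude"):
        raise ValueError("outliers must be 'raise' or 'exclude'")
    counts = [0] * (len(edges) - 1)
    lo, hi = edges[0], edges[-1]
    for v in values:
        if v < lo or v > hi:
            if outliers == "raise":
                raise ValueError(f"value {v!r} outside range [{lo}, {hi}]")
            continue  # "exclude": drop the outlier
        i = bisect_right(edges, v) - 1  # last edge <= v
        counts[min(i, len(counts) - 1)] += 1  # v == hi goes into the last bin
    return counts


print(strict_histogram(range(10), [2, 4, 6], outliers="exclude"))  # [2, 3]
```

With the default `outliers="raise"`, the same call fails because 0, 1, 7, 8 and 9 lie outside [2, 6].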
When I posted this, I had not fully thought it through; as David (and
Tommy) pointed out, this API does not fit well with the existing `bins`
option, especially when a sequence of bin bounds is given. (I guess I was
mostly thinking of the special case of discrete values with 1:1 bins, as is
typical for uint8 data.)
Thus, I would like to withdraw my opinion above and instead state that I
find the current API as clear as it gets. If you want to exclude values,
simply pass an additional right bound; to include low outliers, passing
-inf as an additional left bound seems to do the trick. This could
possibly be added to the documentation, though.
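The two tricks can be sketched as follows, assuming the semantics described in this post: a sequence `bins` gives the left bin edges, values below the first edge are not counted, and the last bin extends to +inf. `left_edge_histogram` is a hypothetical pure-Python stand-in, not NumPy's implementation.

```python
from bisect import bisect_right


def left_edge_histogram(values, left_edges):
    """Hypothetical sketch of left-edge binning.

    `left_edges` are the lower bounds of the bins; the last bin is unbounded
    on the right, and values below left_edges[0] are not counted.
    """
    counts = [0] * len(left_edges)
    for v in values:
        i = bisect_right(left_edges, v) - 1  # index of last left edge <= v
        if i >= 0:  # i == -1 means v lies below the first edge
            counts[i] += 1
    return counts


data = range(10)
print(left_edge_histogram(data, [2, 3, 4]))  # [1, 1, 6]: 4..9 pile up in the last bin
# Exclude high outliers: add a right bound (5) and drop the extra, unbounded bin.
print(left_edge_histogram(data, [2, 3, 4, 5])[:-1])  # [1, 1, 1]
# Include low outliers: prepend -inf as an extra left bound.
print(left_edge_histogram(data, [float("-inf"), 2, 3, 4]))  # [2, 1, 1, 6]
```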
The only problematic aspect I see is the `normed` argument. Currently, the
rightmost bin always has infinite width, but normalization does not treat it that way:
In [1]: from numpy import *
In [2]: histogram(arange(10), [2,3,4], normed = True)
Out[2]: (array([ 0.1, 0.1, 0.6]), array([2, 3, 4]))
Even worse, adding an infinite bin on the left drives all normalized
values to zero (I understand why, technically, but it looks very
undesirable to me):
In [3]: histogram(arange(10), [-inf, 2,3,4], normed = True)
Out[3]: (array([ 0., 0., 0., 0.]), array([-Inf, 2., 3., 4.]))
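Both outputs above are consistent with dividing every count by the number of input values times the width of the first bin only. With a -inf left edge, that width becomes infinite, so every entry collapses to 0. The sketch below is a hypothetical pure-Python reconstruction of that arithmetic, inferred from the printed results rather than taken from NumPy's source.

```python
from bisect import bisect_right


def left_edge_histogram_normed(values, left_edges):
    """Hypothetical reconstruction: left-edge bins (last bin unbounded),
    normalized by len(values) * (width of the FIRST bin only)."""
    values = list(values)
    counts = [0] * len(left_edges)
    for v in values:
        i = bisect_right(left_edges, v) - 1  # last left edge <= v
        if i >= 0:  # values below the first edge are not counted
            counts[i] += 1
    width = left_edges[1] - left_edges[0]  # becomes inf if left_edges[0] is -inf
    return [c / (len(values) * width) for c in counts]


print(left_edge_histogram_normed(range(10), [2, 3, 4]))                # [0.1, 0.1, 0.6]
print(left_edge_histogram_normed(range(10), [float("-inf"), 2, 3, 4]))  # [0.0, 0.0, 0.0, 0.0]
```

Dividing each count by its own bin width instead would keep the finite bins meaningful, but it would give the unbounded rightmost bin a density of 0 even though that bin holds most of the values in this example.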
--
Ciao, / /
/--/
/ / ANS