[Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram
Mon Apr 7 15:26:29 CDT 2008
On Apr 7, 2008, at 4:14 PM, LB wrote:
> +1 for axis and +1 for a keyword to define what to do with values
> outside the range.
> For the keyword, rather than 'outliers', I would propose 'discard' or
> 'exclude', because it could be used to describe the four
> possibilities:
> - discard='low' => values lower than the range are discarded,
> values higher are added to the last bin
> - discard='up' => values higher than the range are discarded,
> values lower are added to the first bin
> - discard='out' => values out of the range are discarded
> - discard=None => values outside of this range are allocated to
> the closest bin
> For the default behavior, in most cases I think the sum of the bin
> populations should equal the size of the original data, so
> I would prefer discard=None. But I'm also okay with discard='low' in
> order not to break older code, if this is clearly stated.
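The four options proposed above could be sketched roughly as follows. This is only an illustration of the proposal, not an existing NumPy feature: the function name `histogram_discard` and the `discard` and `value_range` parameters are hypothetical, and the sketch assumes the bins are equal-width bins spanning an explicit range.

```python
import numpy as np

def histogram_discard(a, bins, value_range, discard=None):
    """Hypothetical sketch of the proposed `discard` keyword.

    discard='low' : drop values below the range, pile higher ones into the last bin
    discard='up'  : drop values above the range, pile lower ones into the first bin
    discard='out' : drop values on both sides of the range
    discard=None  : keep everything, assigning outsiders to the closest bin
    """
    if discard not in (None, "low", "up", "out"):
        raise ValueError("discard must be None, 'low', 'up' or 'out'")
    a = np.asarray(a, dtype=float).ravel()
    lo, hi = value_range
    if discard in ("low", "out"):
        a = a[a >= lo]          # throw away values below the range
    if discard in ("up", "out"):
        a = a[a <= hi]          # throw away values above the range
    # Whatever is still outside the range goes to the nearest edge bin.
    a = np.clip(a, lo, hi)
    counts, edges = np.histogram(a, bins=bins, range=(lo, hi))
    return counts, edges

data = [-1, 0, 0.5, 1.5, 2, 3]
for mode in (None, "low", "up", "out"):
    counts, _ = histogram_discard(data, bins=2, value_range=(0, 2), discard=mode)
    print(mode, counts.tolist(), counts.sum())
```

With discard=None the counts sum to the size of the input, which is the property argued for above; the other three modes lose exactly the values that fall on the discarded side.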
It seems that people in this discussion are forgetting that the bins
are actually defined by the lower boundaries supplied, such that
bins = [1,3,5]
actually currently means
bin1 -> 1 to 2.99999...
bin2 -> 3 to 4.99999...
bin3 -> 5 to inf
(of course in version 1.0.1 the documentation is inconsistent with the
behavior as described by the original poster). This definition of bins
makes it hard to exclude values, as it forces the user to give an extra
value in the bin definition, i.e. the bins statement above would then give
only two bins while supplying three values. That seems confusing to me.
I am not sure what the right approach is, but currently, using range will
clip the values that fall outside the range the user wants.
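To make the lower-boundary convention described above concrete, here is a small sketch that emulates it with sorting and searchsorted. It is written for illustration only and is not claimed to be the code in any NumPy release; the name `lower_edge_histogram` is made up for this example.

```python
import numpy as np

def lower_edge_histogram(a, bins):
    """Count values per bin when `bins` lists only the lower boundaries.

    Bin i covers [bins[i], bins[i+1]); the last bin covers [bins[-1], inf).
    Values below bins[0] are not counted anywhere.
    """
    a = np.sort(np.asarray(a).ravel())
    bins = np.asarray(bins)
    # Position of the first element >= each lower boundary.
    idx = a.searchsorted(bins, side="left")
    # Close the last bin at the end of the data, i.e. at +inf.
    idx = np.concatenate([idx, [a.size]])
    return np.diff(idx)

data = [0.5, 1, 2, 3, 4.9, 5, 7, 100]
print(lower_edge_histogram(data, [1, 3, 5]).tolist())   # [2, 2, 3]
```

Three boundaries give three bins here, and 7 and 100 both land in the open-ended last bin. To keep them out, one would have to pass a fourth value and ignore the final count, which is the extra-value problem mentioned above.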