[Numpy-discussion] ticket #605
Wed Apr 9 12:24:39 CDT 2008
On Wed, Apr 9, 2008 at 7:01 AM, David Huard <email@example.com> wrote:
> Hello Jarrod and co.,
> here is my personal version of the histogram saga.
> The current version of histogram puts in the rightmost bin all values
> larger than range, but does not put in the leftmost bin all values smaller
> than bin[0], e.g.
> In : histogram([1,2,3,4,5,6], bins=3, range=[2,5])
> Out: (array([1, 1, 3]), array([ 2., 3., 4.]))
> It discards 1, but puts 2 in the first bin, 3 in the second bin, and 4,5,6
> in the third bin. Also, the docstring says that outliers are put in the
> closest bin, which is false. Another point to consider is normalization.
> Currently, the normalization factor is db=bin[1]-bin[0]. Of course, if the
> bins are not equally spaced, this will yield a spurious density. Also, I'd
> argue that since the rightmost bin covers the space from bin[-1] to
> infinity, its density should always be zero.
> Now if someone wants to explain all that in the docstring, that's fine by
> me. I fully understand the need to avoid breaking people's code. I simply
> hope that in the next big release, this behavior can be changed to something
> that is simpler: bins are the bin edges (instead of the left edges), and
> everything outside the edges is ignored. This would be a nice occasion to
> add an axis keyword and possibly weights, and would make histogram
> consistent with histogramdd. I'm willing to implement those changes, but I
> don't know how to do so without breaking histogram's behavior.
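
To make the difference concrete, here is a minimal sketch contrasting the two
conventions described above. Both function names are hypothetical and neither
is NumPy's actual implementation; treating the last bin as closed on the right
in the edge-based version is an assumption of this sketch, not something
stated in the message.

```python
import numpy as np

def histogram_left_edges(a, left_edges):
    """Emulate the behavior described above (hypothetical helper).

    `left_edges` are left bin edges: values below the first edge are
    dropped, values above the last edge all land in the last bin.
    """
    a = np.asarray(a, dtype=float)
    left_edges = np.asarray(left_edges, dtype=float)
    idx = np.searchsorted(left_edges, a, side="right") - 1
    idx = idx[idx >= 0]  # discard values left of the first edge
    return np.bincount(idx, minlength=len(left_edges))

def histogram_edges(a, edges, density=False):
    """Sketch of the proposal (hypothetical helper).

    `edges` holds every bin edge, so there are len(edges) - 1 bins and
    values outside [edges[0], edges[-1]] are ignored.
    """
    a = np.asarray(a, dtype=float)
    edges = np.asarray(edges, dtype=float)
    inside = a[(a >= edges[0]) & (a <= edges[-1])]
    idx = np.searchsorted(edges, inside, side="right") - 1
    # Assumption: a value equal to the last edge belongs to the last bin.
    idx = np.minimum(idx, len(edges) - 2)
    counts = np.bincount(idx, minlength=len(edges) - 1)
    if density:
        # Divide by each bin's own width, not just the first bin's width.
        return counts / (counts.sum() * np.diff(edges))
    return counts

data = [1, 2, 3, 4, 5, 6]
old_counts = histogram_left_edges(data, [2, 3, 4])   # 1 dropped; 4, 5, 6 share a bin
new_counts = histogram_edges(data, [2, 3, 4, 5])     # 1 and 6 both ignored
```

With per-bin widths in the normalization, the density integrates to one even
when the bins are unequally spaced, which a single db factor cannot guarantee.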
Here's one way, which is more or less what the core Python developers tend
to do to avoid breaking things.
1. Choose a new name for histogram with the desired behavior.
'histogram1D' for example.
2. Add the function with the new behavior to major release X and
modify the old 'histogram' to produce a PendingDeprecationWarning (which by
default does nothing; you need to change the warning filter to see it).
3. In major release X+1, change the PendingDeprecationWarning to a
DeprecationWarning. Now people will start to see warnings when they use
'histogram'.
4. In major release X+2, rip out histogram.
So, if you got the new version into 1.1, in 1.2 it would start complaining
when you used histogram and in 1.3 histogram would be gone, but the new
version would be in its place. In this way, there's no point where the
behavior of histogram just changes subtly; since it disappears one is forced
to figure out where it went and implement appropriate changes in one's code.
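
The staged warnings in steps 2 and 3 could look roughly like the sketch
below. The decorator name `staged_deprecation` and both function bodies are
hypothetical stand-ins; only the `warnings` calls and the two warning
categories come from the standard library.

```python
import bisect
import functools
import warnings

def staged_deprecation(category, replacement):
    """Wrap a function so each call emits a warning of `category` (hypothetical helper)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                category,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator

def histogram1D(values, edges):
    """Stand-in for the new function: half-open bins between consecutive edges."""
    return [sum(lo <= v < hi for v in values) for lo, hi in zip(edges, edges[1:])]

# Release X: the old name keeps its old behavior but warns "pendingly".
# Release X+1 would pass DeprecationWarning here instead.
@staged_deprecation(PendingDeprecationWarning, "histogram1D")
def histogram(values, left_edges):
    """Stand-in for the old function: left edges, open-ended last bin."""
    counts = [0] * len(left_edges)
    for v in values:
        i = bisect.bisect_right(left_edges, v) - 1
        if i >= 0:
            counts[i] += 1
    return counts

# PendingDeprecationWarning is ignored by default, so enable it explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = histogram([1, 2, 3, 4, 5, 6], [2, 3, 4])
```

Moving from step 2 to step 3 is then a one-word change in the decorator
argument, and step 4 amounts to deleting the decorated function.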
> I just got Bruce's reply, so sorry for the overlap.
> 2008/4/9, Jarrod Millman <firstname.lastname@example.org>:
> > Hello,
> > I just turned this one into a blocker for now. There has been a very
> > long and good discussion about this ticket:
> > http://projects.scipy.org/scipy/numpy/ticket/605
> > Could someone (David?, Bruce?) briefly summarize the problem and the
> > current proposed solution for us again? Let's agree on the problem
> > and the solution. I want to have something similar to what is
> > written about median for this release:
> > http://projects.scipy.org/scipy/numpy/milestone/1.0.5
> > I agree with David's sentiment: "This issue has been raised a number
> > of times since I follow this ML. It's not the first time I've proposed
> > patches, and I've already documented the weird behavior only to see
> > the comments disappear after a while. I hope this time some kind of
> > agreement will be reached."
> > If you give me the short summary I will make sure Travis or Eric
> > respond (and I will put it in the release notes).
> > Thanks,
> > --
> > Jarrod Millman
> > Computational Infrastructure for Research Labs
> > 10 Giannini Hall, UC Berkeley
> > phone: 510.643.4014
> > http://cirl.berkeley.edu/