[Numpy-discussion] numpy release

Tommy Grav tgrav@mac....
Thu Apr 24 09:58:08 CDT 2008


I think a long-term strategy needs to be adopted for histogram.
Right now there is great confusion about what the "bins" keyword
does. It is currently defined as the lower edge of each bin, meaning
that the last bin is open ended and [-inf,bin0> does not exist. While
this may not be the right thing to fix in 1.1.0, I would really like to
see it fixed somewhere down the line.
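
To make the "lower edge" reading concrete, here is a minimal pure-Python sketch of the semantics as I describe them above. It is not NumPy's code; the function name histogram_lower_edges and the sample data are made up for illustration.

```python
from bisect import bisect_right

def histogram_lower_edges(data, lower_edges):
    """Count data when each entry of lower_edges is the LOWER edge of a bin.

    Hypothetical helper illustrating the semantics described in this message:
    the last bin is open ended, and nothing below the first edge is counted.
    """
    counts = [0] * len(lower_edges)
    for x in data:
        i = bisect_right(lower_edges, x) - 1  # index of the last edge <= x
        if i >= 0:                            # x below the first edge: no bin exists
            counts[i] += 1
    return counts

data = [-5.0, 0.5, 1.5, 2.5, 100.0]
counts = histogram_lower_edges(data, [0.0, 1.0, 2.0])
print(counts)  # [1, 1, 2]
```

Note that -5.0 is silently dropped while 100.0 lands in the last bin, so the counts sum to 4 for 5 data points.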


On Apr 24, 2008, at 10:28 AM, Pauli Virtanen wrote:

> Wed, 23 Apr 2008 16:20:41 -0400, David Huard wrote:
>> I haven't found a way to fix histogram reliably without breaking the
>> current behavior. There is a patch attached to the ticket, if the
>> decision is to break histogram.
>
> Summary of the facts (again...):
>
>  a) histogram's docstring does not match its behavior wrt
>     discarding data

This is an easy fix and should definitely go into 1.1.0 :)

>  b) given variable-width bins, histogram(..., normed=True)
>     the results are wrong

Also a quick fix that should be part of 1.1.0.
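
As a rough sketch of what I would expect a correct normalization to do with variable-width bins: divide each count by the total count times that bin's width, so that the histogram integrates to 1. This assumes "normed" is meant as a probability density; the helper density_from_counts and the numbers are illustrative only, not the patch on the ticket.

```python
def density_from_counts(counts, edges):
    """Turn per-bin counts into a density using each bin's own width.

    Hypothetical helper; edges has one more entry than counts.
    """
    total = sum(counts)
    widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
    return [c / (total * w) for c, w in zip(counts, widths)]

edges = [0.0, 1.0, 3.0, 7.0]   # bin widths 1, 2 and 4
counts = [2, 4, 2]
density = density_from_counts(counts, edges)

# Integrate the density over the bins; this should come out as 1.
area = sum(d * (hi - lo) for d, lo, hi in zip(density, edges[:-1], edges[1:]))
print(density, area)
```

Dividing by a single common width instead would only give the same answer when all bins happen to be equally wide.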

>  c) it might make more sense to handle discarding data in some
>     other way than what histogram does now

I would like to see this, but it does not have to happen in 1.1.0 :)

> I think there are now a couple of choices what to do with this:
>
>  A) Change the semantics of histogram function. Old code using histogram
> will just simply break, maybe in mysterious ways

Not really a satisfactory approach. Personally I don't mind, even though
it would break some code of mine: I would rather see a better function and
have to make some code changes than keep the current confusion. Other
people will likely disagree.

> B) Rename the bins parameter to bin_edges or something else, so that
> any old code using histogram immediately raises an exception that is
> easily understood.

With this approach, bin_edges would contain one more value than bins,
since the right edge of the last bin has to be defined.
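
A small sketch of what I mean, in pure Python. The function name histogram_with_edges is hypothetical, and closing the last bin on the right is my own assumption, not something decided in this thread.

```python
from bisect import bisect_right

def histogram_with_edges(data, bin_edges):
    """Count data in bins given by ALL edges, including the rightmost one.

    Hypothetical helper: N + 1 edges define N bins. Values outside
    [bin_edges[0], bin_edges[-1]] are not counted. The last bin is treated
    as closed on the right (an assumption made for this sketch).
    """
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    for x in data:
        if x < bin_edges[0] or x > bin_edges[-1]:
            continue
        # A value equal to the last edge goes into the last bin.
        i = min(bisect_right(bin_edges, x) - 1, n_bins - 1)
        counts[i] += 1
    return counts

bin_edges = [0.0, 1.0, 2.0, 3.0]
edge_counts = histogram_with_edges([-5.0, 0.5, 1.5, 2.5, 3.0, 100.0], bin_edges)
print(edge_counts)                      # [1, 1, 2]
print(len(bin_edges), len(edge_counts))  # 4 3
```

Here both -5.0 and 100.0 fall outside the edges, so outliers are treated the same way on either side.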

> C) Create a new parameter with more sensible behavior and a name
> different from "bins", and deprecate (at least giving sequences to) the
> "bins" parameter: put up a DeprecationWarning if the user does this, but
> still produce the same results as the old histogram. This way the user
> can forward-port her code at leisure.

I think this is probably the best approach to accommodate everyone.
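
Something along these lines, as a very rough pure-Python sketch. The function histogram_sketch, the parameter name bin_edges, and the half-open treatment of the new bins are all assumptions for illustration; only the sequence form of "bins" is handled here.

```python
import warnings
from bisect import bisect_right

def histogram_sketch(data, bins=None, bin_edges=None):
    """Hypothetical illustration of option C, not a proposed implementation."""
    if bin_edges is not None:
        # New behavior: N + 1 edges define N half-open bins [lo, hi).
        counts = [0] * (len(bin_edges) - 1)
        for x in data:
            if bin_edges[0] <= x < bin_edges[-1]:
                counts[bisect_right(bin_edges, x) - 1] += 1
        return counts
    if bins is None:
        raise TypeError("either bins or bin_edges is required")
    # Old behavior, kept unchanged but flagged: bins are lower edges and
    # the last bin is open ended.
    warnings.warn("sequence 'bins' is deprecated; use 'bin_edges'",
                  DeprecationWarning, stacklevel=2)
    counts = [0] * len(bins)
    for x in data:
        i = bisect_right(bins, x) - 1
        if i >= 0:
            counts[i] += 1
    return counts

sample = [0.5, 1.5, 9.0]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_counts = histogram_sketch(sample, bins=[0.0, 1.0])
new_counts = histogram_sketch(sample, bin_edges=[0.0, 1.0, 2.0])
print(old_counts, new_counts, len(caught))  # [1, 2] [1, 1] 1
```

Old code keeps getting the old numbers (with 9.0 counted in the open-ended last bin) plus a warning, while code that switches to the new keyword gets the new behavior.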

> So which one (or something else) do we choose for 1.1.0?
>
> -- 
> Pauli Virtanen

Cheers
    Tommy

