[SciPy-Dev] `a` and `b` attributes of the stats distributions

josef.pktd@gmai... josef.pktd@gmai...
Sun Jul 7 17:08:15 CDT 2013


On Sun, Jul 7, 2013 at 5:18 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
>
>
>
> On Sun, Jul 7, 2013 at 8:36 PM, Warren Weckesser
> <warren.weckesser@gmail.com> wrote:
>>
>> I've been working on a pull request for one of the distributions in the
>> stats module, and as often happens, this has led to the urge to do some
>> more clean up.
>>
>> The distributions defined in stats/distributions.py have attributes `a`
>> and `b` that give the lower and upper ends of the support of the
>> distribution (i.e. outside of [a, b], the PDF is 0).  A problem with this
>> API is that for many distributions, the support depends on the parameters.
>> Currently, this is handled in many of the distributions by modifying self.a
>> and self.b in the _argcheck() method (I guess under the assumption that any
>> method that needs `a` and `b` will have called `_argcheck()` at some point).

That's the assumption. For distributions that change a or b, we
cannot call the private ._xxx methods directly, while for the t and
normal distributions it's safe, and I do it pretty often.
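
To make the problem concrete, here is a toy sketch of that pattern. It
is not the scipy code: `ToyGenPareto` and its methods are made-up
stand-ins, and the only things taken as given are the generalized
Pareto CDF and its support (upper bound -1/c when c < 0).

```python
import math


class ToyGenPareto:
    """Hypothetical stand-in that mimics _argcheck() mutating self.b."""

    def __init__(self):
        self.a = 0.0
        self.b = math.inf

    def _argcheck(self, c):
        # Side effect: the support is stored on the shared instance.
        self.b = math.inf if c >= 0 else -1.0 / c
        return True

    def _cdf(self, x, c):
        # Private method: trusts that self.a and self.b are already correct.
        if x <= self.a:
            return 0.0
        if x >= self.b:
            return 1.0
        if c == 0:
            return 1.0 - math.exp(-x)
        return 1.0 - (1.0 + c * x) ** (-1.0 / c)

    def cdf(self, x, c):
        # Public method: refreshes the bounds before delegating.
        if not self._argcheck(c):
            return math.nan
        return self._cdf(x, c)


dist = ToyGenPareto()
dist.cdf(0.25, -2.0)         # leaves dist.b == 0.5 behind
stale = dist._cdf(1.0, 1.0)  # uses the bounds left over from c = -2
fresh = dist.cdf(1.0, 1.0)   # bounds refreshed for c = 1
```

In this toy, `stale` comes out as 1.0 because x = 1 lies above the
leftover b = 0.5, while `fresh` is the correct 0.5.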

>>
>> This leads to stateful behavior such as:
>>
>>     In [1]: from scipy.stats import genpareto
>>
>>     In [2]: genpareto.b     # Initially b is inf.
>>     Out[2]: inf
>>
>>     In [3]: genpareto.cdf(0, -2)
>>     Out[3]: 0.0
>>
>>     In [4]: genpareto.b     # Now b is 0.5 (as a scalar array)
>>     Out[4]: array(0.5)

Ugly, but it's correct: the code needs to call _argcheck to set the
correct bounds.

Carrying over state from previous calls is pretty bad for debugging,
but all the public methods do it correctly.

>>
>>     In [5]: genpareto.cdf(0, 1)
>>     Out[5]: 0.0
>>
>>     In [6]: genpareto.b
>>     Out[6]: array(inf)      # b is back to inf, but as a scalar array.
>>
>> This API is ugly, and I'd like to fix it.
>>
>> To start, I think there should be a method `_support`:
>>
>>     def _support(self, *args):
>>         """Return a tuple (a, b) giving the support of the distribution."""
>>         ...
>
>
> The private methods shouldn't use ``*args``. Better would be to follow the
> pattern of pdf & co: a public method support() which defaults to -inf/inf,
> and an optional private method _support() which takes the correct fixed
> shape parameters.
>
>
>>
>> All the distributions would be modified to call `self._support(...)` to
>> get `a` and `b` instead of using the attributes `self.a` and `self.b`.  (The
>> method could be public--I just defaulted to private while working out the
>> appropriate API.)
>>
>> It would be nice to get rid of `a` and `b` entirely, but they are
>> currently public attributes, so I don't think we can simply remove them.
>
>
> Backwards compatibility does seem to be a problem here. Not sure what the
> best solution is.
>
> Another issue is that self.a and self.b are used inside private methods,
> which themselves may not have the correct info to call support() again.
> Probably all fixable, but could be some work. The code will also become a
> little more verbose.

My preferred solution to the state problem would be to create new
instances each time; then we don't have to worry about carrying over
and cleaning up state all the time.
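
A minimal sketch of what I mean by new instances, under the assumption
that the bounds are computed once in the constructor. The class and
function names here are hypothetical, not an existing scipy API.

```python
import math


class GenParetoInstance:
    """Hypothetical per-call instance: bounds are fixed at construction."""

    def __init__(self, c):
        self.c = c
        self.a = 0.0
        self.b = math.inf if c >= 0 else -1.0 / c

    def cdf(self, x):
        if x <= self.a:
            return 0.0
        if x >= self.b:
            return 1.0
        if self.c == 0:
            return 1.0 - math.exp(-x)
        return 1.0 - (1.0 + self.c * x) ** (-1.0 / self.c)


def genpareto_cdf(x, c):
    # A fresh object per call, so nothing survives to the next call.
    return GenParetoInstance(c).cdf(x)
```

The result of `genpareto_cdf(1.0, 1.0)` no longer depends on whether a
call with c = -2 happened before it. The price is one object creation
per call.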

My first impression is that `_support` would not help much. Isn't it
similar to what _argcheck currently does? We still have to store the
information as attributes somewhere, or recalculate the support each
time.

And then every piece of code that needs to know a and b has to get
them from somewhere: either you hand them around as additional
arguments, or every function or method has to call `_support`, which
is similar to what the public methods currently do with _argcheck
(the private methods do their work after the public methods have
called _argcheck and set a and b if necessary).


What would be useful is a public (top-level, generic) function
`support_bounds` (or something like that) that the user can call to get
the support of the distribution (but it won't change the internal
headaches).

It could just call _argcheck and return a and b, but translated by
loc and scale, since a and b are the bounds of the standardized
distribution (loc=0, scale=1).
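
Roughly like the following sketch. The signature is only an
assumption, and `standard_support` is a hypothetical helper that
stands in for whatever _argcheck would set for a generalized Pareto
shape c.

```python
import math


def standard_support(c):
    """Bounds (a, b) of the standardized (loc=0, scale=1) distribution."""
    return 0.0, (math.inf if c >= 0 else -1.0 / c)


def support_bounds(c, loc=0.0, scale=1.0):
    """Bounds of the support after shifting by loc and scaling by scale."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    a, b = standard_support(c)
    # An infinite bound stays infinite under a positive scale and finite loc.
    return loc + scale * a, loc + scale * b


bounds = support_bounds(-2.0, loc=1.0, scale=2.0)
```

For c = -2 the standardized support is [0, 0.5], so with loc=1 and
scale=2 the user would see (1.0, 2.0).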

Josef


>
> Ralf
>
>