[SciPy-Dev] `a` and `b` attributes of the stats distributions

josef.pktd@gmai...
Tue Jul 9 09:10:54 CDT 2013


On Mon, Jul 8, 2013 at 4:05 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
>
>
>
> On Mon, Jul 8, 2013 at 12:08 AM, <josef.pktd@gmail.com> wrote:
>>
>> On Sun, Jul 7, 2013 at 5:18 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
>> >
>> > On Sun, Jul 7, 2013 at 8:36 PM, Warren Weckesser <warren.weckesser@gmail.com> wrote:
>> >>
>> >> All the distributions would be modified to call `self._support(...)` to
>> >> get `a` and `b` instead of using the attributes `self.a` and `self.b`.
>> >> (The method could be public; I just defaulted to private while working
>> >> out the appropriate API.)
>> >>
>> >> It would be nice to get rid of `a` and `b` entirely, but they are
>> >> currently public attributes, so I don't think we can simply remove them.
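The `_support(...)` idea can be sketched as follows, assuming a hypothetical class `ToyDist` (not part of SciPy) whose support [0, c] depends on a shape parameter `c`. The bounds are recomputed from the arguments on every call, so nothing is written to `self.a` or `self.b`:

```python
import math


class ToyDist:
    """Hypothetical distribution with shape parameter c > 0.

    pdf(x; c) = 2*x / c**2 on [0, c], so the upper bound of the
    support depends on the shape parameter.
    """

    def _support(self, c):
        # Recompute the bounds from the shape parameter on every call
        # instead of reading (and mutating) self.a / self.b.
        return 0.0, float(c)

    def _cdf(self, x, c):
        a, b = self._support(c)
        if x <= a:
            return 0.0
        if x >= b:
            return 1.0
        return (x / c) ** 2

    def cdf(self, x, c):
        if not c > 0:
            return math.nan  # invalid shape parameter
        return self._cdf(x, c)


dist = ToyDist()
print(dist.cdf(1.0, c=2.0))  # 0.25
print(dist.cdf(1.0, c=4.0))  # 0.0625
print(dist.cdf(5.0, c=4.0))  # 1.0
```

Because no attributes are mutated, calls with different `c` cannot interfere with each other; the cost is that every method needing the bounds has to call `_support` again.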
>> >
>> >
>> > Backwards compatibility does seem to be a problem here. Not sure what the
>> > best solution is.
>> >
>> > Another issue is that self.a and self.b are used inside private methods,
>> > which themselves may not have the correct info to call support() again.
>> > Probably all fixable, but it could be some work. The code will also
>> > become a little more verbose.
>>
>> My preferred solution to the state problem would be to create new
>> instances each time; then we don't have to worry about carrying over
>> and cleaning up state all the time.
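A minimal sketch of the state problem and of the new-instance alternative. Both classes are hypothetical toys (not SciPy code): `StatefulToy` stores a shape-dependent bound on the instance, the way `_argcheck` can set `a` and `b`, while `BoundToy` fixes the parameter at construction, so each parameter value gets its own object:

```python
class StatefulToy:
    """Hypothetical toy mimicking the stateful pattern: checking the
    shape parameter stores a shape-dependent bound on the instance."""

    a, b = 0.0, 1.0  # defaults for the standardized support

    def _argcheck(self, c):
        valid = c > 0
        if valid:
            self.b = float(c)  # side effect: instance state changes
        return valid

    def _upper(self):
        # Private helper that trusts whatever self.b currently holds.
        return self.b

    def upper(self, c):
        self._argcheck(c)
        return self._upper()


class BoundToy:
    """Hypothetical alternative: one new instance per parameter value."""

    def __init__(self, c):
        if not c > 0:
            raise ValueError("c must be positive")
        self.a, self.b = 0.0, float(c)

    def upper(self):
        return self.b


shared = StatefulToy()
shared.upper(3.0)
# An invalid c leaves b untouched, so the helper returns a leftover bound.
print(shared.upper(-1.0))  # 3.0

x, y = BoundToy(3.0), BoundToy(5.0)  # creating y cannot affect x
print(x.upper(), y.upper())  # 3.0 5.0
```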
>
>
> If we could start from scratch, that would indeed be the way to go. I don't
> think it's that much of a problem that we have to go and break all code
> using the instances now, though. So that leaves us with incremental
> improvements.
>
>>
>>
>> My first impression is that `_support` would not help much; isn't it
>> similar to what _argcheck currently does? We still have to store the
>> information as attributes somewhere, or we have to recalculate the
>> support each time.
>
>
> I think the latter was the idea - recalculating is cheap.

Nothing is cheap if you need to do it a few million times:
http://mail.scipy.org/pipermail/scipy-dev/2008-November/010331.html

Currently, we check and set a, b once in the main methods, and
.a and .b are then used in the individual calculations.

For example, .a and .b are used inside a brentq loop:
https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1079

But doing a search, I didn't find many cases where .a and .b are used in
second-layer generic functions; I had thought there were more.
(If only pdf is given, then cdf uses integrate.quad with .a and .b, and
ppf uses brentq, which needs to call cdf repeatedly: a double loop with
.a and .b in both loops, IIRC.
I assume a call to numpy.vectorize will also evaluate it again and
again for each element.)
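To make the cost concrete, here is a sketch of a pdf-only distribution whose cdf is a numerical integral over the support and whose ppf inverts that cdf by root finding, counting how often the bounds are recomputed. It assumes a hypothetical `PdfOnlyToy` class; a plain midpoint rule stands in for `integrate.quad` and plain bisection stands in for `brentq`, so it shows the nested-loop structure, not SciPy's actual code:

```python
class PdfOnlyToy:
    """Hypothetical distribution that only defines a pdf.

    pdf(x; c) = 2*x / c**2 on [0, c]. cdf and ppf are derived
    generically, so the bounds are needed inside nested loops.
    """

    def __init__(self):
        self.support_calls = 0  # how often the bounds are recomputed

    def _support(self, c):
        self.support_calls += 1
        return 0.0, c

    def _pdf(self, x, c):
        return 2.0 * x / c**2

    def _cdf(self, x, c, n=200):
        # Midpoint rule over [a, min(x, b)] (stand-in for integrate.quad).
        a, b = self._support(c)
        upper = min(max(x, a), b)
        h = (upper - a) / n
        return h * sum(self._pdf(a + (i + 0.5) * h, c) for i in range(n))

    def _ppf(self, q, c, tol=1e-10):
        # Bisection on cdf(x) - q (stand-in for optimize.brentq);
        # every step calls _cdf, which recomputes the bounds again.
        lo, hi = self._support(c)
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if self._cdf(mid, c) < q:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)


dist = PdfOnlyToy()
x = dist._ppf(0.25, c=2.0)
print(round(x, 6))         # 1.0, since cdf(x) = (x / c)**2
print(dist.support_calls)  # 1 for ppf plus one per bisection step
```

A single ppf call already recomputes the bounds once per bisection step; if it is evaluated element by element over a large array, that count is multiplied by the number of elements.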

Possibly a bug: https://github.com/scipy/scipy/issues/2622

Josef

>
>>
>> And then every piece of code that needs to know a and b needs to find
>> them: either you hand them around as additional arguments, or every
>> function or method needs to call `_support`, which is similar to what
>> the public methods currently do with _argcheck.
>> (And the private methods do their work after the public methods have
>> called _argcheck and set a and b if necessary.)
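The first option, handing the bounds around as additional arguments, might look like the sketch below. `ExplicitBoundsToy` is hypothetical (support [0, c], cdf (x / c)**2); its public method computes `a` and `b` once and passes them down, so the private helpers never read `self.a` or `self.b`:

```python
class ExplicitBoundsToy:
    """Hypothetical: public methods compute the bounds once and pass
    them down, so private helpers never consult instance state."""

    def _support(self, c):
        return 0.0, float(c)

    def _cdf(self, x, c, a, b):
        # Bounds arrive as arguments; no shared state is read.
        if x <= a:
            return 0.0
        if x >= b:
            return 1.0
        return ((x - a) / (b - a)) ** 2

    def _sf(self, x, c, a, b):
        return 1.0 - self._cdf(x, c, a, b)

    def sf(self, x, c):
        a, b = self._support(c)  # computed once per public call
        return self._sf(x, c, a, b)


dist = ExplicitBoundsToy()
print(dist.sf(1.0, c=2.0))  # 0.75
```

This avoids both shared state and repeated recomputation, at the price of an extra `a, b` pair in every private signature.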
>>
>>
>> What would be useful is a public (top-level generic) function
>> `support_bounds` (or something like that) that the user can use to get
>> the support of the distribution. (But it won't change the internal
>> headaches.)
>>
>> It could just call _argcheck and return a and b, but translated by
>> loc and scale, since a and b are the bounds of the standardized
>> distribution (loc=0, scale=1).
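A rough sketch of the proposed top-level function, assuming (as in the hypothetical toy class below) that `_argcheck` may update `b` as a side effect; the name `support_bounds` is only the placeholder from the proposal. The standardized bounds are mapped through x = loc + scale * y:

```python
import math


class ToyDist:
    """Hypothetical distribution mimicking the current pattern: the
    standardized support is [a, b], and _argcheck may update b."""

    a, b = 0.0, math.inf

    def _argcheck(self, c):
        valid = c > 0
        if valid:
            self.b = float(c)  # side effect: shape-dependent upper bound
        return valid


def support_bounds(dist, *args, loc=0.0, scale=1.0):
    """Hypothetical helper: support of the loc/scale-transformed
    distribution, derived from the standardized bounds a and b."""
    if not (dist._argcheck(*args) and scale > 0):
        return math.nan, math.nan
    # a and b describe the standardized distribution (loc=0, scale=1);
    # with scale > 0, x = loc + scale * y maps [a, b] onto
    # [loc + scale * a, loc + scale * b] in the same order.
    return loc + scale * dist.a, loc + scale * dist.b


print(support_bounds(ToyDist(), 2.0, loc=1.0, scale=3.0))  # (1.0, 7.0)
print(support_bounds(ToyDist(), -1.0))  # (nan, nan)
```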
>
>
> That could indeed be useful, it's simple to add.
>
> Ralf
>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>

