[SciPy-Dev] scipy.stats: algorithm to for ticket 1493

josef.pktd@gmai... josef.pktd@gmai...
Mon Apr 23 15:04:57 CDT 2012


On Mon, Apr 23, 2012 at 2:58 PM, nicky van foreest <vanforeest@gmail.com> wrote:
>>> for xa, xb it doesn't matter whether they are larger or smaller than
>>>> zero, so I don't think we need a special check
>
> I think it does, for suppose that in the algo left = xa = 0.5 (because
> the user has been fiddling with xa) and cdf(xa) > q. Then  setting
> left = 2*left will only worsen the problem. Or do I miss something?

True, however I don't think we have any predefined xa and xb that both
are strictly positive or negative values.
pareto is the only distribution bounded away from zero that I know and
it has xa = -10

>
>>>> it looks good in a few more example cases.
>
> I found another small bug, please see the included code.

later today

>
>>>>
>>>> The difficult cases will be where cdf also doesn't exist and we need
>>>> to get it through integrate.quad, but I don't remember which
>>>> distribution is a good case.
>
> This case is harder indeed. (I assume you mean by 'not exist' that
> there is no closed form expression for the cdf, like the normal
> distribution). Computing the ppf would involve calling quad a lot of
> times. This is wasteful especially since the computation of cdf(b)
> includes the computation of cdf(a) for a < b, supposing that quad runs
> from -np.inf to b. We could repair this by computing cdf(b) = cdf(a) +
> quad(f, a, b), assuming that cdf(a) has been computed already.
> (perhaps I am not clear enough here. If so, let me know.)

not exists = not defined as _cdf method  could also be scipy.special
if there are no closed form expressions

quad should run from dist.a to x, I guess, dist.a might be -inf

>
>>> I just think that we are not able to reach the q=0, q=1 boundaries,
>>> since for some distributions we will run into other numerical
>>> problems. And I'm curious how far we can get with this.
>
> I completely missed to include a test on the obvious cases q >= 1. -
> np.finfo(float).eps and q <= np.finfo(float).eps. It is now in the
> attached file.

>>> findppf(stats.expon, 1e-30)
-6.3593574850511882e-13

lower bound q can be small and won't run into problems with being 0,
until 1e-300?

The right answer should be dist.b for q=numerically 1, lower support
point is dist.a but I don't see when we would need it.

>
>>> Similarly, I don't know whether the default xa and xb are good. I
>>> changed them for a few distributions, but only where I saw obvious
>>> improvements.
>
> I also have no clue what would be good values in general. The choices
> seems reasonable from a practical point of view...
>
>>>> Note: I removed the scale in your example, because internal _ppf works
>>>> on the standard distribution, loc=0, scale=1. loc and scale are added
>>>> generically in .ppf
>
> Thanks. I included also **kwds so that I can pass scale = 10 or
> something like this. Once all works as it should, I'll try to convert
> the code such that it fits nicely in distributions.py.

with self instead of dist, it should already have the signature about
right, no **kwds I assume

>
> The simultaneous updating of left and right in the previous algo is
> wrong. Suppose for instance that cdf(left) < cdf(right) < q. Then both
> left and right would `move to the left'. This is clearly wrong. The
> included code should be better.

would move to the *right* ?

I thought the original was a nice trick, we can shift both left and
right since we know it has to be in that direction, the cut of range
cannot contain the answer.

Or do I miss the point?

>
> With regard to the values of xb and xa. Can a `ordinary' user change
> these? If so, then the ppf finder should include some protection in my
> opinion. If not, the user will get an error that brentq has not the
> right limits, but this error might be somewhat unexpected. (What has
> brentq to do with finding the ppf?) Of course, looking at the code
> this is clear, but I expect most users will no do so.

I don't think  `ordinary' users should touch xa, xb, but they could.
Except for getting around the limitation in this ticket there is no
reason to change xa, xb, so we could make them private _xa, _xb
instead.


>
> The code contains two choices about how to handle xa and xb. Do you
> have any preference?

I don't really like choice 1, because it removes the use of the
predefined xa, xb. On the other hand, with this extension, xa and xb
wouldn't be really necessary anymore.

another possibility would be to try except brentq with xa, xb first
and get most cases, and switch to your version if needed. I'm not sure
xa, xb are defined well enough that it's worth to go this route,
though.

Thanks for working on this,

Josef

>
> Thanks for your feedback. Very helpful.
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>


More information about the SciPy-Dev mailing list