[SciPy-Dev] scipy.stats: algorithm to for ticket 1493
Wed Apr 25 14:49:12 CDT 2012
On Wed, Apr 25, 2012 at 3:21 PM, nicky van foreest <firstname.lastname@example.org> wrote:
>>>>>> The difficult cases will be where cdf also doesn't exist and we need
>>>>>> to get it through integrate.quad, but I don't remember which
>>>>>> distribution is a good case.
>>> This case is harder indeed. (I assume you mean by 'not exist' that
>>> there is no closed form expression for the cdf, like the normal
>>> distribution). Computing the ppf would involve calling quad a lot of
>>> times. This is wasteful especially since the computation of cdf(b)
>>> includes the computation of cdf(a) for a < b, supposing that quad runs
>>> from -np.inf to b. We could repair this by computing cdf(b) = cdf(a) +
>>> quad(f, a, b), assuming that cdf(a) has been computed already.
>>> (perhaps I am not clear enough here. If so, let me know.)
>> not exists = not defined as _cdf method could also be scipy.special
>> if there are no closed form expressions
> I see, sure.
>>>>> I just think that we are not able to reach the q=0, q=1 boundaries,
>>>>> since for some distributions we will run into other numerical
>>>>> problems. And I'm curious how far we can get with this.
>>> I completely missed to include a test on the obvious cases q >= 1. -
>>> np.finfo(float).eps and q <= np.finfo(float).eps. It is now in the
>>> attached file.
>>>>> findppf(stats.expon, 1e-30)
> This result shows actually that xa and xb are necessary to include in
> the specification of the distribution. The exponential distribution is
> (usually) defined only on [0, \infty) not on the negative numbers. The
> result above is negative though. This is of course a simple
> consequence of calling brentq. From a user's perspective, though, I
> would become very suspicious about this negative result.
good argument to clean up xa, xb
>> The right answer should be dist.b for q=numerically 1, lower support
>> point is dist.a but I don't see when we would need it.
> I agree, provided xa and xb are always properly defined. But then,
> (just to be nitpicking), the definition of expon does not set xa and
> xb explicitly. Hence xa = -10, and this is somewhat undesirable, given
> the negative value above.
>>> The simultaneous updating of left and right in the previous algo is
>>> wrong. Suppose for instance that cdf(left) < cdf(right) < q. Then both
>>> left and right would `move to the left'. This is clearly wrong. The
>>> included code should be better.
>> would move to the *right* ?
>> I thought the original was a nice trick, we can shift both left and
>> right since we know it has to be in that direction, the cut of range
>> cannot contain the answer.
>> Or do I miss the point?
> No, you are right. When I wrote this at first, I also thought about
> the point you bring up here. Then, I was somewhat dissatisfied with
> calling the while loop twice (suppose the left bound requires
> updating, then certainly the second while loop (to update the right
> bound) is unnecessary, and calling cdf(right) is useless). While
> trying to fix this, I forgot about my initial ideas...
>>> With regard to the values of xb and xa. Can a `ordinary' user change
>>> these? If so, then the ppf finder should include some protection in my
>>> opinion. If not, the user will get an error that brentq has not the
>>> right limits, but this error might be somewhat unexpected. (What has
>>> brentq to do with finding the ppf?) Of course, looking at the code
>>> this is clear, but I expect most users will no do so.
>> I don't think `ordinary' users should touch xa, xb, but they could.
>> Except for getting around the limitation in this ticket there is no
>> reason to change xa, xb, so we could make them private _xa, _xb
> I think that would be better. Thus, the developer that subclasses
> rv_continuous should set _xa and _xb properly.
>>> The code contains two choices about how to handle xa and xb. Do you
>>> have any preference?
>> I don't really like choice 1, because it removes the use of the
>> predefined xa, xb. On the other hand, with this extension, xa and xb
>> wouldn't be really necessary anymore.
> In view of your example with findppf(expon(1e-30)) I prefer to use _xa and _xb.
>> another possibility would be to try except brentq with xa, xb first
>> and get most cases, and switch to your version if needed. I'm not sure
>> xa, xb are defined well enough that it's worth to go this route,
> I think that this makes the most sense. The definition of the class
> should include sensible values of xa and xb.
> All in all, I would like to make the following proposal to resolve the
> ticket in a generic way.
> 1) xa and xb should become private class members _xa and _xb
> 2) _xa and _xb should be given proper values in the class definition,
> e.g. expon._xa = 0 and expon._xb = 30., since exp(-30) = 9.35e-14.
> 3) given a quantile q in the ppf function, include a test on _cdf(_xa)
> <= q <= _cdf(_xb). If this fails, return an exception with some text
> that tells that either _cdf(_xa) > q or _cdf(_xb) < q.
> Given your comments I actually favor all this searching for left and
> right not that much anymore. It is generic, but it places the
> responsibility of good code at the wrong place.
3) I prefer your expanding the search to raising an exception to the
user. Note also that your 3) is inconsistent with 1). If a user
visible exception is raised, then the user needs to change xa or xb,
so it shouldn't be private. That's the current situation (except for a
more cryptic message).
2) I'm all in favor, especially for one-side bound distributions,
where it should be easy to go through those. There might be a few
where the bound moves with the shape, but the only one I remember is
genextreme and that has an explicit _ppf
So I would prefer 1), 2) and your new enhanced generic _ppf
> SciPy-Dev mailing list
More information about the SciPy-Dev