[SciPy-User] scipy.stats.beta does not handle small parameters

josef.pktd@gmai... josef.pktd@gmai...
Sat Apr 24 03:07:09 CDT 2010


On Tue, Apr 20, 2010 at 9:52 AM,  <josef.pktd@gmail.com> wrote:
> On Mon, Apr 19, 2010 at 9:49 AM, John Reid <j.reid@mail.cryst.bbk.ac.uk> wrote:
>> I've been using scipy.stats.beta and I've noticed it returns NaN when
>> the parameters are small. I've tried using the same parameters in R code
>> and it handles them just fine. R also handles parameters up to 17 orders
>> of magnitude smaller. Is there any documentation on which parameter
>> ranges are acceptable? Can I expect similar results with other
>> distributions? Should I file a bug report?
>
> General answer:
> Many distributions have numerical problems in the boundary range of
> the valid distribution parameters, and there are no systematic tests
> for these. In the last two years, we improved the numerical precision
> in many of these cases. However, in other cases, working around
> numerical precision problems is too much effort, too difficult or
> would require too many special cases, so it's not really worth it for
> the "common" usage.
> Filing a bug report is always good, (and even better with a patch), in
> the worst case it adds to documenting the limitation of the current
> implementation.
>
> In the beta.rvs case the relevant code is in numpy.random
>    def _rvs(self, a, b):
>        return mtrand.beta(a,b,self._size)
>
> which means digging through the pyrex or c code to find the source of
> the nans. Most of the other code in distributions.beta is from
> scipy.special (either c or fortran code).
>
> I just tried your first example (the following is also intended as
> documentation for myself)
>
>>>> alpha, beta
> (7.1e-008, 4.2199999999999999e-007)
>
> This is very close to a two point distribution, cdf in scipy and R aggree
>>>> stats.beta.cdf(1e-8, alpha, beta)
> 0.85598265330617773
>>>> r.pbeta(1e-8, alpha, beta)[0]
> 0.85598265330617762
>>>> stats.beta.sf(1-1e-8, alpha, beta)
> 0.14401510767081682
>>>> 1-r.pbeta(1-1e-8, alpha, beta)[0]
> 0.14401510767081682
>>>> stats.beta.cdf(1e-8, alpha, beta) + stats.beta.sf(1-1e-8, alpha, beta)
> 0.99999776097699455
>
> quantile/ppf: R and scipy disagree on what almost zero
>
>>>> r.qbeta(0.5, alpha, beta)[0]
> 1.0547236079445126e-230
>>>> stats.beta.ppf(0.5, alpha, beta)
> 4.9406564584124654e-324
>>>> r.qbeta(0.85, alpha, beta)[0]
> 1.7763568394002505e-015
>>>> stats.beta.ppf(0.85, alpha, beta)
> 0.0
>
> pdf looks ok
>>>> r.dbeta(1e-8, alpha, beta)[0]
> 6.0774768992486097
>>>> stats.beta.pdf(1e-8, alpha, beta)
> 6.0774768992486035
>>>> r.dbeta(1-1e-12, alpha, beta)[0]
> 60775.483679644421
>>>> stats.beta.pdf(1-1e-12, alpha, beta)
> 60775.483679644392
>
> ppf cdf roundtripping looks ok in one direction
>
>>>> stats.beta.ppf(stats.beta.cdf(0.5, alpha, beta), alpha, beta)
> 0.49999999974425491
>>>> r.qbeta(r.pbeta(0.5, alpha, beta)[0], alpha, beta)[0]
> 0.50000000006892631
>
> roundtripping doesn't work in R nor in scipy in the other direction
>>>> r.qbeta(r.qbeta(0.5, alpha, beta)[0], alpha, beta)[0]
> 4.0301331108853522e-308
>>>> stats.beta.cdf(stats.beta.ppf(0.5, alpha, beta), alpha, beta)
> 0.85593853078304538
>
> #check roundtripping formula is correct
>>>> stats.norm.cdf(stats.norm.ppf(0.5, alpha, beta), alpha, beta)
> 0.5
>>>> stats.norm.cdf(stats.norm.ppf(0.25, alpha, beta), alpha, beta)
> 0.25
>
> rvs in R looks ok
>>>> rrvs = r.rbeta(1000, alpha, beta)
>>>> rrvsp = [rrvs[i] for i in range(1000)]
>>>> (np.array(rrvsp)<0.5).mean()
> 0.85499999999999998
>>>> (np.array(rrvsp)>0.5).mean()
> 0.14499999999999999
>
> but not in numpy
>>>> np.random.beta(alpha, beta, size=4)
> array([ NaN,  NaN,  NaN,  NaN])
>
> So it might be worth a look into numpy.random.beta to see what causes the nan.
> beta.ppf and qbeta both have problems in the neighborhood of zero (and one ?)
>
> Josef

dirichlet has the same problem

>>> drvs = np.random.dirichlet([7.10e-05, 4.22e-04],size=100)
>>> np.isnan(drvs[:,0]).sum()
68
>>> (drvs[:,0]<1e-4).sum()
27
>>> (drvs[:,0]>1-1e-4).sum()
5
>>> 27/(27.+5)
0.84375
>>> drvs = np.random.dirichlet([7.10e-14, 4.22e-13],size=100)
>>> np.isnan(drvs[:,0]).sum()
100

Josef

>
>
>> Here's the code I used:
>> import scipy.stats as S, numpy as N
>> from rpy2.robjects import r
>>
>> alpha, beta = 0.0710, 0.4222
>> for i in xrange(20):
>>     x_from_scipy = S.beta.rvs(alpha, beta)
>>     x_from_R = r.rbeta(1, alpha, beta)
>>     print 'Alpha=%.2e; Beta=%.2e; scipy.stats.beta.rvs=%.2e;
>> R.rbeta=%.2e' % (alpha, beta, x_from_scipy, x_from_R[0])
>>     alpha /= 10.
>>     beta /= 10.
>>
>> and the output from it:
>> Alpha=7.10e-02; Beta=4.22e-01; scipy.stats.beta.rvs=2.75e-11;
>> R.rbeta=6.60e-02
>> Alpha=7.10e-03; Beta=4.22e-02; scipy.stats.beta.rvs=3.73e-84;
>> R.rbeta=4.50e-124
>> Alpha=7.10e-04; Beta=4.22e-03; scipy.stats.beta.rvs=1.00e+00;
>> R.rbeta=1.00e+00
>> Alpha=7.10e-05; Beta=4.22e-04; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-313
>> Alpha=7.10e-06; Beta=4.22e-05; scipy.stats.beta.rvs=nan; R.rbeta=1.00e+00
>> Alpha=7.10e-07; Beta=4.22e-06; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-315
>> Alpha=7.10e-08; Beta=4.22e-07; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-316
>> Alpha=7.10e-09; Beta=4.22e-08; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-317
>> Alpha=7.10e-10; Beta=4.22e-09; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-318
>> Alpha=7.10e-11; Beta=4.22e-10; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-319
>> Alpha=7.10e-12; Beta=4.22e-11; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-320
>> Alpha=7.10e-13; Beta=4.22e-12; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-321
>> Alpha=7.10e-14; Beta=4.22e-13; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-322
>> Alpha=7.10e-15; Beta=4.22e-14; scipy.stats.beta.rvs=nan; R.rbeta=1.00e+00
>> Alpha=7.10e-16; Beta=4.22e-15; scipy.stats.beta.rvs=nan; R.rbeta=1.00e+00
>> Alpha=7.10e-17; Beta=4.22e-16; scipy.stats.beta.rvs=nan; R.rbeta=0.00e+00
>> Alpha=7.10e-18; Beta=4.22e-17; scipy.stats.beta.rvs=nan; R.rbeta=1.00e+00
>> Alpha=7.10e-19; Beta=4.22e-18; scipy.stats.beta.rvs=nan; R.rbeta=0.00e+00
>> Alpha=7.10e-20; Beta=4.22e-19; scipy.stats.beta.rvs=nan; R.rbeta=0.00e+00
>> Alpha=7.10e-21; Beta=4.22e-20; scipy.stats.beta.rvs=nan; R.rbeta=0.00e+00
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>


More information about the SciPy-User mailing list