[SciPy-User] scipy.stats.beta does not handle small parameters

josef.pktd@gmai... josef.pktd@gmai...
Tue Apr 20 08:52:02 CDT 2010

On Mon, Apr 19, 2010 at 9:49 AM, John Reid <j.reid@mail.cryst.bbk.ac.uk> wrote:
> I've been using scipy.stats.beta and I've noticed it returns NaN when
> the parameters are small. I've tried using the same parameters in R code
> and it handles them just fine. R also handles parameters up to 17 orders
> of magnitude smaller. Is there any documentation on which parameter
> ranges are acceptable? Can I expect similar results with other
> distributions? Should I file a bug report?

General answer:
Many distributions have numerical problems in the boundary range of
the valid distribution parameters, and there are no systematic tests
for these. In the last two years, we improved the numerical precision
in many of these cases. However, in other cases, working around
numerical precision problems is too much effort, too difficult or
would require too many special cases, so it's not really worth it for
the "common" usage.
Filing a bug report is always good, (and even better with a patch), in
the worst case it adds to documenting the limitation of the current

In the beta.rvs case the relevant code is in numpy.random
    def _rvs(self, a, b):
        return mtrand.beta(a,b,self._size)

which means digging through the pyrex or c code to find the source of
the nans. Most of the other code in distributions.beta is from
scipy.special (either c or fortran code).

I just tried your first example (the following is also intended as
documentation for myself)

>>> alpha, beta
(7.1e-008, 4.2199999999999999e-007)

This is very close to a two point distribution, cdf in scipy and R aggree
>>> stats.beta.cdf(1e-8, alpha, beta)
>>> r.pbeta(1e-8, alpha, beta)[0]
>>> stats.beta.sf(1-1e-8, alpha, beta)
>>> 1-r.pbeta(1-1e-8, alpha, beta)[0]
>>> stats.beta.cdf(1e-8, alpha, beta) + stats.beta.sf(1-1e-8, alpha, beta)

quantile/ppf: R and scipy disagree on what almost zero

>>> r.qbeta(0.5, alpha, beta)[0]
>>> stats.beta.ppf(0.5, alpha, beta)
>>> r.qbeta(0.85, alpha, beta)[0]
>>> stats.beta.ppf(0.85, alpha, beta)

pdf looks ok
>>> r.dbeta(1e-8, alpha, beta)[0]
>>> stats.beta.pdf(1e-8, alpha, beta)
>>> r.dbeta(1-1e-12, alpha, beta)[0]
>>> stats.beta.pdf(1-1e-12, alpha, beta)

ppf cdf roundtripping looks ok in one direction

>>> stats.beta.ppf(stats.beta.cdf(0.5, alpha, beta), alpha, beta)
>>> r.qbeta(r.pbeta(0.5, alpha, beta)[0], alpha, beta)[0]

roundtripping doesn't work in R nor in scipy in the other direction
>>> r.qbeta(r.qbeta(0.5, alpha, beta)[0], alpha, beta)[0]
>>> stats.beta.cdf(stats.beta.ppf(0.5, alpha, beta), alpha, beta)

#check roundtripping formula is correct
>>> stats.norm.cdf(stats.norm.ppf(0.5, alpha, beta), alpha, beta)
>>> stats.norm.cdf(stats.norm.ppf(0.25, alpha, beta), alpha, beta)

rvs in R looks ok
>>> rrvs = r.rbeta(1000, alpha, beta)
>>> rrvsp = [rrvs[i] for i in range(1000)]
>>> (np.array(rrvsp)<0.5).mean()
>>> (np.array(rrvsp)>0.5).mean()

but not in numpy
>>> np.random.beta(alpha, beta, size=4)
array([ NaN,  NaN,  NaN,  NaN])

So it might be worth a look into numpy.random.beta to see what causes the nan.
beta.ppf and qbeta both have problems in the neighborhood of zero (and one ?)


> Here's the code I used:
> import scipy.stats as S, numpy as N
> from rpy2.robjects import r
> alpha, beta = 0.0710, 0.4222
> for i in xrange(20):
>     x_from_scipy = S.beta.rvs(alpha, beta)
>     x_from_R = r.rbeta(1, alpha, beta)
>     print 'Alpha=%.2e; Beta=%.2e; scipy.stats.beta.rvs=%.2e;
> R.rbeta=%.2e' % (alpha, beta, x_from_scipy, x_from_R[0])
>     alpha /= 10.
>     beta /= 10.
> and the output from it:
> Alpha=7.10e-02; Beta=4.22e-01; scipy.stats.beta.rvs=2.75e-11;
> R.rbeta=6.60e-02
> Alpha=7.10e-03; Beta=4.22e-02; scipy.stats.beta.rvs=3.73e-84;
> R.rbeta=4.50e-124
> Alpha=7.10e-04; Beta=4.22e-03; scipy.stats.beta.rvs=1.00e+00;
> R.rbeta=1.00e+00
> Alpha=7.10e-05; Beta=4.22e-04; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-313
> Alpha=7.10e-06; Beta=4.22e-05; scipy.stats.beta.rvs=nan; R.rbeta=1.00e+00
> Alpha=7.10e-07; Beta=4.22e-06; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-315
> Alpha=7.10e-08; Beta=4.22e-07; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-316
> Alpha=7.10e-09; Beta=4.22e-08; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-317
> Alpha=7.10e-10; Beta=4.22e-09; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-318
> Alpha=7.10e-11; Beta=4.22e-10; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-319
> Alpha=7.10e-12; Beta=4.22e-11; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-320
> Alpha=7.10e-13; Beta=4.22e-12; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-321
> Alpha=7.10e-14; Beta=4.22e-13; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-322
> Alpha=7.10e-15; Beta=4.22e-14; scipy.stats.beta.rvs=nan; R.rbeta=1.00e+00
> Alpha=7.10e-16; Beta=4.22e-15; scipy.stats.beta.rvs=nan; R.rbeta=1.00e+00
> Alpha=7.10e-17; Beta=4.22e-16; scipy.stats.beta.rvs=nan; R.rbeta=0.00e+00
> Alpha=7.10e-18; Beta=4.22e-17; scipy.stats.beta.rvs=nan; R.rbeta=1.00e+00
> Alpha=7.10e-19; Beta=4.22e-18; scipy.stats.beta.rvs=nan; R.rbeta=0.00e+00
> Alpha=7.10e-20; Beta=4.22e-19; scipy.stats.beta.rvs=nan; R.rbeta=0.00e+00
> Alpha=7.10e-21; Beta=4.22e-20; scipy.stats.beta.rvs=nan; R.rbeta=0.00e+00
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

More information about the SciPy-User mailing list