[Scipy-tickets] [SciPy] #869: stats.invnorm.cdf returns NaNs with low mu values
SciPy
scipy-tickets@scipy....
Mon Feb 16 00:06:20 CST 2009
#869: stats.invnorm.cdf returns NaNs with low mu values
-------------------------+--------------------------------------------------
Reporter: jpaalasm | Owner: somebody
Type: defect | Status: new
Priority: normal | Milestone: 0.8.0
Component: scipy.stats | Version: devel
Severity: normal | Resolution:
Keywords: |
-------------------------+--------------------------------------------------
Comment (by josefpktd):
The problem is that the invnorm._cdf has a term that is inf * 0.0 in this
case. I verified the expression for the cdf and the pdf with Johnson,
Kotz, Balakrishnan and they are correct. The standard use seems to be more
for large mu
{{{
x = np.arange(0.0015, 0.0025, 0.0001)
mu = 0.002
>>> np.exp(2.0/mu)*stats.norm.cdf(-fac*(xin+mu)/mu)
array([ NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN])
>>> np.exp(2.0/mu)
1.#INF
>>> stats.norm.cdf(-fac*(xin+mu)/mu)
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
}}}
I don't see an obvious solution around this problem.
Note: numerical integration works, veccdf is the internal method to
calculate the cdf by integrating the pdf:
{{{
>>> stats.invnorm.veccdf(xin,0.002)
array([ 6.202e-11, 3.197e-07, 1.492e-04, 9.765e-03, 1.303e-01,
5.089e-01,
8.673e-01, 9.844e-01, 9.992e-01, 1.000e+00])
}}}
If I take only the first term of invnorm._cdf then it is relatively close
to the integrated pdf from veccdf
{{{
>>> stats.norm.cdf(fac*(xin-mu)/mu)
array([ 5.412e-11, 2.867e-07, 1.374e-04, 9.211e-03, 1.257e-01,
5.000e-01,
8.624e-01, 9.835e-01, 9.991e-01, 1.000e+00])
}}}
However, I don't know whether the inf*0.0 term actually goes to zero fast
enough for this to be a good approximation.
Also, as a consequence of the nans in the cdf for this case, ppf and sf,
isf, also will not work correctly.
As mu goes to zero, the distribution becomes degenerate, a spike at zero.
So, numerical problem will (always) remain for some small enough mu.
Scale is the second parameter for this distribution, but I haven't seen a
way how it would help in this case. According to JKB: if X is distributed
IG(mu,lambda) then a*X is distributed IG(a*mu, a*lambda).
So, unless someone finds a numerical solution to the inf*0.0 problem, the
only solution I see, is to switch to numerical integration of the pdf for
small mu.
But, is there really a use case for this? Or shall we just document that
although theoretically the distribution only requires mu>0, the numerical
calculation requires a larger mu.
--
Ticket URL: <http://scipy.org/scipy/scipy/ticket/869#comment:1>
SciPy <http://www.scipy.org/>
SciPy is open-source software for mathematics, science, and engineering.
More information about the Scipy-tickets
mailing list