[Scipy-tickets] [SciPy] #869: stats.invnorm.cdf returns NaNs with low mu values

SciPy scipy-tickets@scipy....
Mon Feb 16 00:06:20 CST 2009


#869: stats.invnorm.cdf returns NaNs with low mu values
-------------------------+--------------------------------------------------
 Reporter:  jpaalasm     |        Owner:  somebody
     Type:  defect       |       Status:  new     
 Priority:  normal       |    Milestone:  0.8.0   
Component:  scipy.stats  |      Version:  devel   
 Severity:  normal       |   Resolution:          
 Keywords:               |  
-------------------------+--------------------------------------------------
Comment (by josefpktd):

 The problem is that the invnorm._cdf has a term that is inf * 0.0 in this
 case. I verified the expression for the cdf and the pdf with Johnson,
 Kotz, Balakrishnan and they are correct. The standard use seems to be more
 for large mu

 {{{
 x = np.arange(0.0015, 0.0025, 0.0001)
 mu = 0.002

 >>> np.exp(2.0/mu)*stats.norm.cdf(-fac*(xin+mu)/mu)
 array([ NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN])
 >>> np.exp(2.0/mu)
 1.#INF
 >>> stats.norm.cdf(-fac*(xin+mu)/mu)
 array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
 }}}

 I don't see an obvious solution around this problem.

 Note: numerical integration works, veccdf is the internal method to
 calculate the cdf by integrating the pdf:

 {{{
 >>> stats.invnorm.veccdf(xin,0.002)
 array([  6.202e-11,   3.197e-07,   1.492e-04,   9.765e-03,   1.303e-01,
 5.089e-01,
          8.673e-01,   9.844e-01,   9.992e-01,   1.000e+00])
 }}}
 If I take only the first term of invnorm._cdf then it is relatively close
 to the integrated pdf from veccdf

 {{{
 >>> stats.norm.cdf(fac*(xin-mu)/mu)
 array([  5.412e-11,   2.867e-07,   1.374e-04,   9.211e-03,   1.257e-01,
 5.000e-01,
          8.624e-01,   9.835e-01,   9.991e-01,   1.000e+00])
 }}}

 However, I don't know whether the inf*0.0 term actually goes to zero fast
 enough for this to be a good approximation.

 Also, as a consequence of the nans in the cdf for this case, ppf and sf,
 isf, also will not work correctly.

 As mu goes to zero, the distribution becomes degenerate, a spike at zero.
 So, numerical problem will (always) remain for some small enough mu.

 Scale is the second parameter for this distribution, but I haven't seen a
 way how it would help in this case. According to JKB: if X is distributed
 IG(mu,lambda) then a*X is distributed IG(a*mu, a*lambda).

 So, unless someone finds a numerical solution to the inf*0.0 problem, the
 only solution I see, is to switch to numerical integration of the pdf for
 small mu.

 But, is there really a use case for this? Or shall we just document that
 although theoretically the distribution only requires mu>0, the numerical
 calculation requires a larger mu.

-- 
Ticket URL: <http://scipy.org/scipy/scipy/ticket/869#comment:1>
SciPy <http://www.scipy.org/>
SciPy is open-source software for mathematics, science, and engineering.


More information about the Scipy-tickets mailing list