[SciPy-dev] fix for ncx2 bug?

Robert Kern robert.kern@gmail....
Wed Jun 17 16:12:41 CDT 2009


On Wed, Jun 17, 2009 at 14:00, Neal Becker<ndbecker2@gmail.com> wrote:
> josef.pktd@gmail.com wrote:
>
>> On Wed, Jun 17, 2009 at 1:53 PM, Neal Becker<ndbecker2@gmail.com> wrote:
>>> josef.pktd@gmail.com wrote:
>>>
>>>> On Wed, Jun 17, 2009 at 1:08 PM, Neal Becker<ndbecker2@gmail.com> wrote:
>>>>> josef.pktd@gmail.com wrote:
>>>>>
>>>>>> On Wed, Jun 17, 2009 at 10:45 AM, Neal Becker<ndbecker2@gmail.com>
>>>>>> wrote:
>>>>>>> As reported earlier, there is a bug in ncx2 that causes a strange
>>>>>>> discontinuity.
>>>>>>>
>>>>>>> Is there a patch available?
>>>>>>>
>>>>>>
>>>>>> It's a ticket, but it could take some time until someone finds all the
>>>>>> imprecision for "outlier" cases in scipy.special.
>>>>>>
>>>>>> I'm not sure these functions are even designed to have a high
>>>>>> precision, because for distributions such as chisquare or ncx2, that
>>>>>> are primarily used for test statistics, it make only sense to report a
>>>>>> few digits.
>>>>>>
>>>>>> I would have recommended ncx2.veccdf for your case, but a quick check
>>>>>> showed that over a large range ncx2.veccdf and ncx2.cdf only agree up
>>>>>> to 1e-8. For ncx2.cdf close to one, there might also be a precision
>>>>>> loss.
>>>>>>
>>>>>> What's your use case that you need ncx2 at high precision?
>>>>>>
>>>>>> Josef
>>>>>
>>>>> I'm using this to compute what's often called "Receiver Operating
>>>>> Characteristic", which is a signal detection problem.  You may see over
>>>>> some range that probability of miss or false detection is quite low.
>>>>> Probably you don't really care about accuracy here, but it's making the
>>>>> plots look bad.
>>>>
>>>> If speed doesn't matter too much, you can still use veccdf, I don't
>>>> think it's more accurate but it is smooth, no discontinuities and it
>>>> also goes to 1 in the upper tail.
>>>>
>>>> veccdf is a private method, that avoids the argument check, but as
>>>> long as you call it with valid arguments you get good solutions, and
>>>> it is vectorized (based on np.vectorize). The generic methods are
>>>> pretty heavily tested, so besides slower speed from the numerical
>>>> integration, I wouldn't expect any problems.
>>>>
>>>> In contrast to central chisquare, I didn't see any formulas that would
>>>> allow a direct calculation of the ncx2.cdf as function of some other
>>>> special functions.
>>>>
>>>> Josef
>>>
>>> I'm no expert on this.  I do see 2 refs:
>>>
>>> Continuous Univariate Distributions, Vol 2, 2nd edition, Johnson, Kotz,
>>> Balakrishnan has an entire chapter on the subject.
>>>
>>>
> http://www.boost.org/doc/libs/1_39_0/libs/math/doc/sf_and_dist/html/math_toolkit/dist/dist_ref/dists/nc_chi_squared_dist.html
>>>
>>> The code says:
>>> //
>>> // Computes the complement of the Non-Central Chi-Square
>>> // Distribution CDF by summing a weighted sum of complements
>>> // of the central-distributions.  The weighting factor is
>>> // a Poisson Distribution.
>>> //
>>> // This is an application of the technique described in:
>>> //
>>> // Computing discrete mixtures of continuous
>>> // distributions: noncentral chisquare, noncentral t
>>> // and the distribution of the square of the sample
>>> // multiple correlation coeficient.
>>> // D. Benton, K. Krishnamoorthy.
>>> // Computational Statistics & Data Analysis 43 (2003) 249 - 267
>>>
>>>
>>> I see that the 1st ref above also talks about the fact that cdf on ncx2
>>> can be obtained as weighted sum of central distributions, but it also
>>> goes on to give other methods of calculation.
>>>
>>
>> The problem is that the sum has an infinite number of terms. I looked
>> at JKB vol.2 but they only discuss the central chisquare distribution
>> extensively, but I didn't see much for non-central chi-square.
>>
>> Wikipidia as the infinite series expression for the ncx2.cdf
>>  http://en.wikipedia.org/wiki/Noncentral_chi-square_distribution
>>
>>
>> But I'm not a numerical precision person, and I usually don't worry
>> about numerical, floating point precision of convergence of a series
>> with an infinite number of terms. My preferred bugfix in this would be
>> special.hyp2f1.  But, I don't have the background and I prefer to
>> chase the "big bugs", 1e10 and 1e1 instead of 1e-8 and 1e-15.
>>
>> The infinite mixture might be relatively easy to calculate for
>> standard cases, but it might be difficult to maintain a high precision
>> when users start to plug in "funny" parameters and/or expect it to
>> work for every tail case.
>>
>> Josef
>
> How about the boost reference?  This is an existence proof of code that is
> supposed to be good quality.  Refer to non_central_chi_squared_cdf in
> boost/math/distributions/non_central_chi_squared.hpp.

The code we have is also supposed to be good quality. Does the boost
code claim that it will be good in the particular domain of your
interest? Even good quality numerical software sometimes have gaps,
particularly way out in the tails of CDFs.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


More information about the Scipy-dev mailing list