[SciPy-dev] fix for ncx2 bug?
Wed Jun 17 13:22:54 CDT 2009
On Wed, Jun 17, 2009 at 1:53 PM, Neal Becker<email@example.com> wrote:
> firstname.lastname@example.org wrote:
>> On Wed, Jun 17, 2009 at 1:08 PM, Neal Becker<email@example.com> wrote:
>>> firstname.lastname@example.org wrote:
>>>> On Wed, Jun 17, 2009 at 10:45 AM, Neal Becker<email@example.com>
>>>>> As reported earlier, there is a bug in ncx2 that causes a strange
>>>>> Is there a patch available?
>>>> It's a ticket, but it could take some time until someone finds all the
>>>> imprecision for "outlier" cases in scipy.special.
>>>> I'm not sure these functions are even designed to have a high
>>>> precision, because for distributions such as chisquare or ncx2, that
>>>> are primarily used for test statistics, it make only sense to report a
>>>> few digits.
>>>> I would have recommended ncx2.veccdf for your case, but a quick check
>>>> showed that over a large range ncx2.veccdf and ncx2.cdf only agree up
>>>> to 1e-8. For ncx2.cdf close to one, there might also be a precision
>>>> What's your use case that you need ncx2 at high precision?
>>> I'm using this to compute what's often called "Receiver Operating
>>> Characteristic", which is a signal detection problem. You may see over
>>> some range that probability of miss or false detection is quite low.
>>> Probably you don't really care about accuracy here, but it's making the
>>> plots look bad.
>> If speed doesn't matter too much, you can still use veccdf, I don't
>> think it's more accurate but it is smooth, no discontinuities and it
>> also goes to 1 in the upper tail.
>> veccdf is a private method, that avoids the argument check, but as
>> long as you call it with valid arguments you get good solutions, and
>> it is vectorized (based on np.vectorize). The generic methods are
>> pretty heavily tested, so besides slower speed from the numerical
>> integration, I wouldn't expect any problems.
>> In contrast to central chisquare, I didn't see any formulas that would
>> allow a direct calculation of the ncx2.cdf as function of some other
>> special functions.
> I'm no expert on this. I do see 2 refs:
> Continuous Univariate Distributions, Vol 2, 2nd edition, Johnson, Kotz,
> Balakrishnan has an entire chapter on the subject.
> The code says:
> // Computes the complement of the Non-Central Chi-Square
> // Distribution CDF by summing a weighted sum of complements
> // of the central-distributions. The weighting factor is
> // a Poisson Distribution.
> // This is an application of the technique described in:
> // Computing discrete mixtures of continuous
> // distributions: noncentral chisquare, noncentral t
> // and the distribution of the square of the sample
> // multiple correlation coeficient.
> // D. Benton, K. Krishnamoorthy.
> // Computational Statistics & Data Analysis 43 (2003) 249 - 267
> I see that the 1st ref above also talks about the fact that cdf on ncx2 can
> be obtained as weighted sum of central distributions, but it also goes on to
> give other methods of calculation.
The problem is that the sum has an infinite number of terms. I looked
at JKB vol.2 but they only discuss the central chisquare distribution
extensively, but I didn't see much for non-central chi-square.
Wikipidia as the infinite series expression for the ncx2.cdf
But I'm not a numerical precision person, and I usually don't worry
about numerical, floating point precision of convergence of a series
with an infinite number of terms. My preferred bugfix in this would be
special.hyp2f1. But, I don't have the background and I prefer to
chase the "big bugs", 1e10 and 1e1 instead of 1e-8 and 1e-15.
The infinite mixture might be relatively easy to calculate for
standard cases, but it might be difficult to maintain a high precision
when users start to plug in "funny" parameters and/or expect it to
work for every tail case.
More information about the Scipy-dev