[SciPy-dev] stats - kstest
Travis Oliphant
oliphant at ee.byu.edu
Mon Jul 19 11:25:32 CDT 2004
Robert Kern wrote:
> Manuel Metz wrote:
>
>> Hi,
>> hopefully I'm in the right place to raise my suggestion.
>>
>> As far as I understand the "kstest" from the book "Numerical Recipes
>> in C++" (Chapt. 14.3, Kolmogorov-Smirnov Test), the kstest algorithm
>> is not correctly implemented in SciPy (or in NR?). I think the error
>> is in the second-to-last line of kstest():
>>
>> >>> D = max(abs(cdfvals - sb.arange(1.0,N+1)/N))
>>
>> In comparison from NR:
>>
>> >>> double en = data.size();
>> >>> double d = 0.0, fo = 0.0, fn, ff, dt;
>> >>> for (j = 0; j < data.size(); j++) {
>> >>>     fn = (j+1)/en;
>> >>>     ff = func( data[j] );
>> >>>     dt = max( fabs(fo-ff), fabs(fn-ff) );
>> >>>     if (dt > d) d = dt;
>> >>>     fo = fn;
>> >>> }
>>
>> So the main difference is that in the NR algorithm, "D" is
>> calculated as the maximum distance D = max |S_N(x) - P(x)| by
>> measuring the distances from the step function S_N(x) to both the
>> upper AND the lower side of P(x), while the SciPy routine only
>> calculates the distance to the upper side.
>>
>> Am I right that the error is in the SciPy algorithm? If so, could
>> anyone correct it in the next release of SciPy?
>
>
> Yes, I believe you are correct.
>
I reviewed what was done again and now believe we were correct. The
distribution being used in kstest is the one-sided Kolmogorov
distribution, KS+. Because this is the distribution used, the test is
done with a one-sided statistic.
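For concreteness, here is a sketch of how the one-sided statistic D+
differs from the two-sided statistic D of Numerical Recipes (this uses
present-day numpy/scipy spellings; the normal sample and variable names
are purely illustrative):

```python
import numpy as np
from scipy import stats

# Illustrative sample: 100 draws from a standard normal, sorted.
rng = np.random.default_rng(0)
data = np.sort(rng.standard_normal(100))
N = len(data)

# Hypothesized CDF evaluated at the ordered sample points.
cdfvals = stats.norm.cdf(data)

# One-sided statistic D+: empirical CDF above the hypothesized CDF.
Dplus = (np.arange(1.0, N + 1) / N - cdfvals).max()

# The other one-sided statistic D-: hypothesized CDF above the
# empirical step function (using the lower step values (i-1)/N).
Dminus = (cdfvals - np.arange(0.0, N) / N).max()

# Two-sided statistic, as computed in Numerical Recipes.
D = max(Dplus, Dminus)
```

The two-sided D is always at least as large as either one-sided
statistic, which is why the two tests can give different answers.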
SciPy only has an approximate two-sided statistic which is valid for
large N. We do not have it wrapped in a kstest-like command, but the
distribution is available as kstwobign.
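Under the assumption that one has already computed the two-sided D as
above, the asymptotic p-value can be read off kstwobign directly (a
sketch; the limiting distribution is evaluated at sqrt(N)*D):

```python
import numpy as np
from scipy import stats

# Illustrative sample and two-sided statistic, as in the sketch above.
rng = np.random.default_rng(1)
data = np.sort(rng.standard_normal(200))
N = len(data)
cdfvals = stats.norm.cdf(data)
D = max((np.arange(1.0, N + 1) / N - cdfvals).max(),
        (cdfvals - np.arange(0.0, N) / N).max())

# Large-N approximation: sqrt(N)*D converges to the Kolmogorov
# distribution, available in scipy.stats as kstwobign.
pvalue = stats.kstwobign.sf(np.sqrt(N) * D)
```

The approximation is only valid for large N; for small samples a
different (exact) computation of the p-value would be needed.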
We could modify kstest or make a new command for the two-sided test.
Questions and/or comments welcome.
-Travis O.