[SciPy-dev] kstest is reporting wrong p-value ??

Robert Kern robert.kern@gmail....
Thu Nov 27 00:43:30 CST 2008


On Thu, Nov 27, 2008 at 00:02,  <josef.pktd@gmail.com> wrote:
> Looking again at ticket 395 about the Kolmogorov-Smirnov test, I'm
> quite sure the kstest is wrong.
>
> The current implementation uses absolute value of the deviation,
> therefore it is a two sided test. A one-sided test takes either max or
> min of the deviations (not of absolute deviations). However, the test
> distribution that is used to calculate the p-value is ksone, the
> distribution for the one-sided Kolmogorov-Smirnov test. So, the
> reported p-value should be off by approximately one half, or maybe
> double (?).

No, it's only slightly off (but you are correct that it is off). The
names "one-sided" and "two-sided" don't really correspond with the
usual meaning for generic hypothesis tests. Rather, they describe the
different statistics and their distributions. There are two different
kinds of "one-sided" K-S statistics, one that uses the greatest signed
difference between the ECDF and the CDF, and one that uses the
greatest signed difference between the CDF and the ECDF. Note the
order of subtraction. Both statistics are positive values, and both follow the same
"one-sided K-S distribution". The "two-sided K-S statistic" is the
maximum of both variants of the one-sided statistic. Its distribution
is close to the one-sided distribution, but is difficult to compute.
The K-S hypothesis test can be conducted with any of these, and can be
either one-sided (e.g. "is the fit poor?") or two-sided (e.g. "is the
fit either too poor or too good to be true?") in the conventional
hypothesis-testing sense. kstest() implements a one-sided test
using the "one-sided K-S distribution" but incorrectly uses the
"two-sided K-S statistic".

Is that a clear explanation?
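
To make the three statistics concrete, here is a minimal sketch in plain
Python. The function names (ks_statistics, one_sided_sf_asymptotic,
two_sided_sf_asymptotic) are made up for illustration and are not part of
scipy. The p-values use the textbook large-sample approximations,
exp(-2*n*d^2) for the one-sided distribution and the alternating Kolmogorov
series for the two-sided one, as stand-ins for whatever scipy.stats computes
internally, so treat the numbers as rough, especially for small n.

```python
import math
import random


def ks_statistics(sample, cdf):
    """Return (D_plus, D_minus, D) for a sample against a continuous CDF.

    D_plus  : greatest amount by which the ECDF exceeds the CDF
    D_minus : greatest amount by which the CDF exceeds the ECDF
    D       : the "two-sided" statistic, max(D_plus, D_minus)
    """
    xs = sorted(sample)
    n = len(xs)
    cdf_vals = [cdf(x) for x in xs]
    # The ECDF jumps from i/n to (i+1)/n at the i-th sorted point (0-based),
    # so both sides of each jump have to be checked.
    d_plus = max((i + 1) / n - f for i, f in enumerate(cdf_vals))
    d_minus = max(f - i / n for i, f in enumerate(cdf_vals))
    return d_plus, d_minus, max(d_plus, d_minus)


def one_sided_sf_asymptotic(d, n):
    """Large-sample approximation to P(D_plus > d): exp(-2 n d^2)."""
    return math.exp(-2.0 * n * d * d)


def two_sided_sf_asymptotic(d, n, terms=100):
    """Large-sample approximation to P(D > d) via the Kolmogorov series."""
    t = n * d * d
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t)
                for k in range(1, terms + 1))
    # Clamp, since the truncated series is unreliable when t is tiny.
    return min(1.0, max(0.0, 2.0 * total))


def std_normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    random.seed(0)  # arbitrary seed, only so that reruns agree
    sample = [random.gauss(0.0, 1.0) for _ in range(50)]
    d_plus, d_minus, d = ks_statistics(sample, std_normal_cdf)
    n = len(sample)
    print("D+ =", d_plus, " D- =", d_minus, " D =", d)
    # The mismatch described above: two-sided statistic D, one-sided distribution.
    print("D with one-sided distribution:", one_sided_sf_asymptotic(d, n))
    # The matched pairings, for comparison.
    print("D with two-sided distribution:", two_sided_sf_asymptotic(d, n))
    print("D+ with one-sided distribution:", one_sided_sf_asymptotic(d_plus, n))
```

Running it on a few samples shows how far apart the mismatched and matched
p-values are for a given D and n.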

> There was a discussion in
> http://projects.scipy.org/pipermail/scipy-dev/2004-July/002181.html
> about this, but I'm not sure that conclusion is correct

You are correct. The terminology was tripping me up at the time, too.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

