[SciPy-dev] kstest is reporting wrong p-value ??

josef.pktd@gmai... josef.pktd@gmai...
Thu Nov 27 11:30:24 CST 2008


I compared with R in more detail:

Conclusions for small samples:

* stats.kstest() with fewer than 10 observations is pretty wrong
* its calculation of D differs quite a bit from R and Matlab (those two
give the same numbers)
* the exact method in R agrees with stats.ksone.sf(D,n)*2 up to
4 decimals (note: times 2)
* the asymptotic distribution in R (not using exact) agrees with
stats.kstwobign.sf(D*sqrt(n)) to more than 7 decimals
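
For reference, a minimal pure-Python sketch of the usual two-sided
one-sample statistic, D = max(D+, D-), against the standard normal. This
is not the code in stats.kstest() or in R; the helper names and the
sample values are made up for illustration. It only shows why D+ and D-
both matter: in a small sample either one can be the larger.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(sample, cdf=norm_cdf):
    """Return (D+, D-, D) for a one-sample KS test against `cdf`.

    D+ measures how far the empirical CDF rises above the hypothesised
    CDF, D- how far it falls below; the two-sided D is their maximum.
    """
    xs = sorted(sample)
    n = len(xs)
    # Empirical CDF just after the i-th order statistic is (i + 1) / n.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # Empirical CDF just before the i-th order statistic is i / n.
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return d_plus, d_minus, max(d_plus, d_minus)


# Hypothetical small sample, chosen only for illustration.
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]
d_plus, d_minus, d = ks_statistic(sample)
print("D+ =", d_plus, "D- =", d_minus, "D =", d)
```

With D in hand, the two scipy expressions above would be evaluated as
stats.ksone.sf(d, n)*2 and stats.kstwobign.sf(d*math.sqrt(n)).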

For larger samples, I tried 100 normally distributed random variables:
stats.kstest() still gives the wrong D and p-value, but the difference
is not as large as in small samples.

With a sample of 1000 normal rvs, the D of stats.kstest() and of R are
essentially identical, but the p-value reported by stats.kstest() is
half of the one in R:

>>> xxrl = stats.norm.rvs(size=1000)
>>> resultrl = ksfn(xxrl, 'pnorm', exact=True)   # R's ks.test, called through rpy
>>> resultrl['p.value']
0.2419499342788699
>>> resultrl['statistic']['D']
0.032317405617139472
>>> stats.kstest(xxrl,'norm')
(0.032317405617139472, 0.12118954799968018)
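
The factor of two can be checked by hand from the numbers in this
session. The sketch below is pure Python and is not scipy's or R's code;
kolmogorov_sf is a hypothetical helper that sums the standard series for
the limiting Kolmogorov distribution. Doubling the stats.kstest()
p-value lands close to R's exact value, which is consistent with a
one-sided p-value being reported, although that reading is an inference
from the numbers rather than from the source.

```python
import math


def kolmogorov_sf(x, terms=100):
    """Survival function of the limiting Kolmogorov distribution.

    Q(x) = 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 * k**2 * x**2).
    The series converges quickly for moderate x; it is not suitable
    for x very close to 0.
    """
    if x <= 0:
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                for k in range(1, terms + 1))
    return 2.0 * total


# Values copied from the session above.
n = 1000
d = 0.032317405617139472
p_scipy = 0.12118954799968018   # reported by stats.kstest()
p_r = 0.2419499342788699        # reported by R with exact = True

doubled = 2.0 * p_scipy                       # two-sided guess
asymptotic = kolmogorov_sf(d * math.sqrt(n))  # large-sample two-sided p-value

print("R exact      :", p_r)
print("2 * kstest   :", doubled)
print("asymptotic   :", asymptotic)
print("ratio R/scipy:", p_r / p_scipy)
```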


So, stats.kstest() definitely needs to be fixed.

Josef

