[SciPy-user] new Kolmogorov-Smirnov test

josef.pktd@gmai... josef.pktd@gmai...
Wed Dec 3 14:15:18 CST 2008


On Wed, Dec 3, 2008 at 2:49 PM, Jarrod Millman <millman@berkeley.edu> wrote:
> On Wed, Dec 3, 2008 at 11:43 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
>>> def ks_2samp(data1, data2):
>>>    """ Computes the Kolmogorov-Smirnov statistic on 2 samples.  Modified
>>>    from Numerical Recipes in C, page 493.  Returns KS D-value, prob.  Not
>>>    ufunc-like.
>>
>> Wait - really?  We can't use Numerical Recipes code, it has strict and
>> incompatible licensing...  If it's in there it really has to come out
>> as fast as possible.
>
> http://www.nr.com/licenses/redistribute.html
>
> --
> Jarrod Millman
> Computational Infrastructure for Research Labs
> 10 Giannini Hall, UC Berkeley
> phone: 510.643.4014
> http://cirl.berkeley.edu/

The algorithm is essentially a single loop that calculates the distance
measure, so I would assume that something this simple cannot be
copyright protected. For efficiency, though, it might be better anyway
to come up with a vectorized version similar to kstest.
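
A rough sketch of what such a vectorized version could look like is below.
It is only an illustration, not the code in scipy.stats: the function name
ks_2samp_stat is made up here, it assumes NumPy is available, and it
returns only the distance D (the largest gap between the two empirical
CDFs), not the probability.

```python
import numpy as np


def ks_2samp_stat(data1, data2):
    """Two-sample K-S distance D, computed without an explicit Python loop.

    Hypothetical helper for illustration; returns the statistic only.
    """
    data1 = np.sort(np.asarray(data1, dtype=float))
    data2 = np.sort(np.asarray(data2, dtype=float))
    n1 = len(data1)
    n2 = len(data2)
    # The empirical CDFs are step functions that only change at observed
    # points, so it is enough to compare them at the pooled observations.
    data_all = np.concatenate([data1, data2])
    # side="right" counts the observations <= x, i.e. the ECDF at x.
    cdf1 = np.searchsorted(data1, data_all, side="right") / n1
    cdf2 = np.searchsorted(data2, data_all, side="right") / n2
    return np.max(np.abs(cdf1 - cdf2))
```

The sorting and the two searchsorted calls replace the element-by-element
walk through both samples, at the cost of some temporary arrays.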


about correctness:
=============

A quick Monte Carlo shows that the test is pretty accurate under the
null, even for small sample sizes. The power to reject when the
alternative is true is reasonably high only in larger samples.


Null correct
==================================================
Monte Carlo for K-S 2sample test (ks_2samp):
sample size = 100, 1000 replications
sample 1: normal distribution (loc=1.000000,scale=2.000000)
sample 2: normal distribution (loc=1.000000,scale=2.000000)
ks_2samp: proportion of rejection at 1% significance: 0.003
ks_2samp: proportion of rejection at 5% significance: 0.049
ks_2samp: proportion of rejection at 10% significance: 0.101

Null not true
==================================================
Monte Carlo for K-S 2sample test (ks_2samp):
sample size = 500, 1000 replications
sample 1: normal distribution (loc=0.000000,scale=1.000000)
sample 2: t distribution (dof=10, loc=0.000000,scale=1.000000)
ks_2samp: proportion of rejection at 1% significance: 0.253
ks_2samp: proportion of rejection at 5% significance: 0.71
ks_2samp: proportion of rejection at 10% significance: 0.88
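
For readers without the attachment, a Monte Carlo of this kind can be
sketched roughly as follows. This is not the attached script: the names
rejection_rates, normal_sampler and t_sampler are invented for the
example, the seed is arbitrary, and it assumes scipy.stats.ks_2samp
returns the D-value followed by the probability, as the docstring quoted
above says.

```python
import numpy as np
from scipy import stats


def rejection_rates(sampler1, sampler2, nobs=100, nrepl=1000,
                    alphas=(0.01, 0.05, 0.10), seed=0):
    """Share of replications in which ks_2samp rejects at each level.

    sampler1 and sampler2 are callables (n, rng) -> array of n draws.
    """
    rng = np.random.RandomState(seed)
    pvals = np.empty(nrepl)
    for i in range(nrepl):
        x = sampler1(nobs, rng)
        y = sampler2(nobs, rng)
        # ks_2samp returns (D, prob); keep only the probability.
        pvals[i] = stats.ks_2samp(x, y)[1]
    return {alpha: np.mean(pvals < alpha) for alpha in alphas}


def normal_sampler(n, rng):
    # Same parameters as the "Null correct" case: loc=1, scale=2.
    return rng.normal(loc=1.0, scale=2.0, size=n)


def t_sampler(n, rng):
    # t distribution with 10 degrees of freedom, loc=0, scale=1.
    return rng.standard_t(10, size=n)


# Size under the null: both samples come from the same distribution.
size = rejection_rates(normal_sampler, normal_sampler, nobs=100)
for alpha in sorted(size):
    print("proportion of rejection at %g%% significance: %s"
          % (100 * alpha, size[alpha]))
```

Passing two different samplers (for example a standard normal and
t_sampler) with a larger nobs gives the power instead of the size. The
exact proportions depend on the seed and on the SciPy version, so they
will not reproduce the numbers above digit for digit.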

Josef
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ks2_samp_MCtest.py
Url: http://projects.scipy.org/pipermail/scipy-user/attachments/20081203/6359e41d/attachment.pl 

