[Scipy-tickets] [SciPy] #395: kstest computes incorrect value

SciPy scipy-tickets@scipy....
Sat Apr 7 04:29:06 CDT 2007

#395: kstest computes incorrect value
 Reporter:  peridot  |       Owner:  somebody
     Type:  defect   |      Status:  new     
 Priority:  normal   |   Milestone:          
Component:  Other    |     Version:          
 Severity:  normal   |    Keywords:          
 The D value computed by the Kolmogorov-Smirnov test, in the case of a
 sample {x_i} and a distribution having CDF F, should be max |S(x)-F(x)|,
 where S(x) is the fraction of the x_i that are less than or equal to x,
 i.e. the empirical CDF of the sample (following Numerical Recipes in C).
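
 As a concrete reading of this definition, here is a minimal NumPy sketch of
 the empirical CDF S(x). The helper name `ecdf` is hypothetical and not part
 of SciPy. It counts the fraction of sample points less than or equal to x by
 calling `numpy.searchsorted` with `side="right"` on the sorted sample.

```python
import numpy as np

def ecdf(sample, x):
    """Hypothetical helper: S(x) = fraction of sample points <= x."""
    xs = np.sort(np.asarray(sample, dtype=float))
    # side="right" also counts points equal to x, so S is right-continuous
    return np.searchsorted(xs, x, side="right") / len(xs)

sample = [0.1, 0.4, 0.4, 0.9]
print(ecdf(sample, [0.0, 0.1, 0.4, 0.5, 1.0]))  # fractions 0, 1/4, 3/4, 3/4, 1
```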

 The routine kstest computes this incorrectly (although code to do it
 correctly is commented out, and there was some discussion of this issue on
 the mailing list).

 Rephrasing without the absolute value, the goal is to compute
 D = max( max(S(x)-F(x)), max(F(x)-S(x)) ), that is, the larger of how far S
 rises above F and how far F rises above S.

 Since we have a finite sample, sort it so that x_1 <= x_2 <= ... <= x_N. On
 each interval [x_i, x_(i+1)), S is constant and equal to S(x_i) = i/N, while
 F is increasing, so S(x)-F(x) is largest at the left-hand end of the
 interval, where F is lowest. Hence max(S(x)-F(x)) is the maximum over i of
 S(x_i)-F(x_i). With the sorted sample in an array A, this can be implemented
 as `amax((arange(N)+1.)/N-F(A))`.
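
 The one-sided statistic above can be sketched as follows, assuming `cdf` is
 a vectorized F that accepts a NumPy array. The function name `d_plus` and the
 Uniform(0, 1) CDF used in the example are illustrative choices, not SciPy
 API. Note that the sample is sorted first, since the argument depends on it.

```python
import numpy as np

def uniform_cdf(x):
    # F for Uniform(0, 1), used only as an example distribution
    return np.clip(x, 0.0, 1.0)

def d_plus(sample, cdf):
    """Largest amount by which the empirical CDF S exceeds F."""
    a = np.sort(np.asarray(sample, dtype=float))  # the argument relies on sorted data
    n = len(a)
    # On [x_i, x_(i+1)) S is constant at (i+1)/n (0-based i) while F only grows,
    # so S - F peaks at the left endpoint x_i.
    return np.amax((np.arange(n) + 1.0) / n - cdf(a))

print(d_plus([0.1, 0.2, 0.7], uniform_cdf))  # about 0.4667 (= 2/3 - 0.2, at x = 0.2)
```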

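 However, to compute max(F(x)-S(x)), note that on each interval
 [x_(i-1), x_i) the supremum is approached at the *right*-hand end, as x
 approaches x_i from below: F is highest there, but S has not yet jumped. So
 we should take the maximum over i of F(x_i)-S(x_i-), where S(x_i-) = (i-1)/N
 is the value of S just to the left of x_i. The implementation of this is
 `amax(F(A)-arange(N)/float(N))`, which is not equivalent to
 `amax(F(A)-(arange(N)+1.)/N)`.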
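
 To make the difference between the two expressions concrete, here is a small
 sketch under the same assumptions (a vectorized `cdf`, a Uniform(0, 1) F
 chosen only for illustration, and hypothetical function names). It computes
 the correct form and the non-equivalent form side by side.

```python
import numpy as np

def uniform_cdf(x):
    # F for Uniform(0, 1), used only as an example distribution
    return np.clip(x, 0.0, 1.0)

def d_minus(sample, cdf):
    """Largest amount by which F exceeds the empirical CDF S."""
    a = np.sort(np.asarray(sample, dtype=float))
    n = len(a)
    # Just left of x_i (0-based i) S equals i/n, and F(x) - i/n
    # approaches its supremum F(x_i) - i/n as x -> x_i from below.
    return np.amax(cdf(a) - np.arange(n) / float(n))

def d_minus_wrong(sample, cdf):
    # The non-equivalent form: subtracts S(x_i) *after* the jump at x_i
    a = np.sort(np.asarray(sample, dtype=float))
    n = len(a)
    return np.amax(cdf(a) - (np.arange(n) + 1.0) / n)

sample = [0.1, 0.2, 0.7]
print(d_minus(sample, uniform_cdf), d_minus_wrong(sample, uniform_cdf))  # 0.1 and about -0.2333
```

 Because the second form subtracts an extra 1/N from every term, it is always
 exactly 1/N smaller than the first, whatever the sample.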

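 Thus simply taking `amax(absolute(F(A)-(arange(N)+1.)/N))`, as the code
 currently does, gives the wrong answer: it measures F-S only after the jump
 at each x_i, so its F-S part comes out exactly 1/N too small, and D can be
 underestimated by up to 1/N (which is why NR don't do it that way).
 Fortunately the fix is easy; the right code is there but commented out.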
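
 Putting the two halves together, here is a minimal sketch of the corrected D
 next to the expression the ticket says kstest currently uses. It assumes a
 vectorized `cdf`; the function names and the Uniform(0, 1) example are
 illustrative only and are not the actual kstest source.

```python
import numpy as np

def uniform_cdf(x):
    # F for Uniform(0, 1), chosen only for illustration
    return np.clip(x, 0.0, 1.0)

def ks_d_corrected(sample, cdf):
    """D = max(D+, D-), following the Numerical Recipes approach."""
    a = np.sort(np.asarray(sample, dtype=float))
    n = len(a)
    f = cdf(a)
    d_plus = np.amax((np.arange(n) + 1.0) / n - f)   # S above F, checked at each x_i
    d_minus = np.amax(f - np.arange(n) / float(n))   # F above S, just left of each x_i
    return max(d_plus, d_minus)

def ks_d_current(sample, cdf):
    # The expression the ticket reports: compares F only with S(x_i), never its left limit
    a = np.sort(np.asarray(sample, dtype=float))
    n = len(a)
    return np.amax(np.absolute(cdf(a) - (np.arange(n) + 1.0) / n))

sample = [0.3, 0.5, 0.9]
print(ks_d_corrected(sample, uniform_cdf), ks_d_current(sample, uniform_cdf))  # 0.3 versus about 0.1667
```

 Here the largest gap is F(0.3) - S(0.3-) = 0.3 - 0, which the current
 expression never looks at, so it reports only about 0.1667.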

Ticket URL: <http://projects.scipy.org/scipy/scipy/ticket/395>
SciPy <http://www.scipy.org/>
SciPy is open-source software for mathematics, science, and engineering.
