[Scipy-tickets] [SciPy] #395: kstest computes incorrect value
SciPy
scipy-tickets@scipy....
Sat Apr 7 04:29:06 CDT 2007
#395: kstest computes incorrect value
---------------------+------------------------------------------------------
Reporter: peridot | Owner: somebody
Type: defect | Status: new
Priority: normal | Milestone:
Component: Other | Version:
Severity: normal | Keywords:
---------------------+------------------------------------------------------
The D value computed by the Kolmogorov-Smirnov test, in the case of a
sample {x_i} and a distribution having CDF F, should be max |S(x)-F(x)|,
where S(x) is the fraction of the x_i less than or equal to x, i.e. the
empirical CDF of the sample. (According to Numerical Recipes in C.)
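For concreteness, S(x) can be sketched as a step function in plain Python.
The helper name `empirical_cdf` and the three-point sample are illustrative
only and do not come from SciPy.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return S, where S(x) is the fraction of sample points <= x."""
    a = sorted(sample)
    n = len(a)
    if n == 0:
        raise ValueError("sample must not be empty")

    def S(x):
        # bisect_right counts how many sorted points are <= x.
        return bisect_right(a, x) / n

    return S


# Illustrative sample, not taken from the ticket.
S = empirical_cdf([0.8, 0.9, 0.95])
print(S(0.5), S(0.8), S(0.85), S(0.95))
```

S is constant between sample points and jumps by 1/N at each one, which is
why the maximum deviation from F only needs to be checked at the x_i.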
The routine kstest computes this incorrectly (although code to do it
correctly is commented out, and there was some discussion on the mailing
list of this issue).
Rephrasing without the absolute value, the goal is to compute max(
max(S(x)-F(x)), max(F(x)-S(x)) )
Since we have a finite sample, S is a step function: with the sample
sorted as x_1 <= ... <= x_N, S equals i/N on the interval [x_i, x_i+1).
Since F is nondecreasing, its lowest value on that interval occurs at the
left-hand end, so max(S(x)-F(x)) is max(i/N-F(x_i)). This can be
implemented as `amax((arange(N)+1.)/N-F(A))`, assuming A holds the sorted
sample.
However, to compute max(F(x)-S(x)), we should look just to the left of
each x_i, since the maximum value of F is now at the *right* side of the
interval [x_i-1, x_i). There S is still (i-1)/N while F (being continuous)
approaches F(x_i), which gives max(F(x_i)-(i-1)/N).
The implementation of this is `amax(F(A)-arange(N)/float(N))`, which is
not equivalent to `amax(F(A)-(arange(N)+1.)/N)`.
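Thus simply taking `amax(absolute(F(A)-(arange(N)+1.)/N))`, as the code
currently does, gives the wrong answer (which is why NR don't do it that
way): it understates the F(x)-S(x) deviation by 1/N, so the reported D is
too small whenever that deviation is the larger of the two. Fortunately
the fix is easy; the right code is there but commented out.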
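A minimal sketch of the difference, written in plain Python rather than the
array expressions above so that it runs without dependencies. The function
names, the uniform CDF, and the sample are assumptions chosen for
illustration; this is not the kstest source.

```python
def ks_statistic(sample, cdf):
    """Two-sided D = max(D+, D-) for a sample against a continuous CDF."""
    a = sorted(sample)
    n = len(a)
    if n == 0:
        raise ValueError("sample must not be empty")
    f = [cdf(x) for x in a]
    # D+: S has already stepped up to (i+1)/n at a[i] (0-based i).
    d_plus = max((i + 1) / n - f[i] for i in range(n))
    # D-: just below a[i], S is still i/n while F is close to F(a[i]).
    d_minus = max(f[i] - i / n for i in range(n))
    return max(d_plus, d_minus)


def ks_statistic_single_grid(sample, cdf):
    """The formula the ticket reports as wrong: only the (i+1)/n grid."""
    a = sorted(sample)
    n = len(a)
    if n == 0:
        raise ValueError("sample must not be empty")
    return max(abs(cdf(a[i]) - (i + 1) / n) for i in range(n))


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1] (illustrative choice)."""
    return min(max(x, 0.0), 1.0)


# Illustrative sample bunched near 1, so F(x)-S(x) dominates.
sample = [0.8, 0.9, 0.95]
d_correct = ks_statistic(sample, uniform_cdf)
d_single = ks_statistic_single_grid(sample, uniform_cdf)
print(d_correct, d_single)
```

Here the largest deviation is F(x)-S(x) just below the smallest sample
point, so the single-grid formula reports about 0.467 where the two-sided
computation gives 0.8.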
--
Ticket URL: <http://projects.scipy.org/scipy/scipy/ticket/395>