[SciPy-dev] Inclusion of Kuiper test in Scipy
Anne Archibald
aarchiba@physics.mcgill...
Mon Nov 2 08:50:34 CST 2009
Hi,
I have implemented a statistical test from the literature, the Kuiper
test, for my own work, but I think it might be worth including it in
Scipy itself. I'd like to hear other people's opinions, though, both
on what (if anything) should go into scipy, and on whether it needs
modification. The code is at:
http://github.com/aarchiba/kuiper
This code includes a number of things beyond the basic test, some or
all of which may not be worth including in Scipy. What's there:
The Kuiper test - analogous to the Kolmogorov-Smirnov test, this takes
either a sample and a callable CDF or two samples and returns an
abstract score and the probability that a score that large would have
arisen if the two arguments are from the same distribution. This test
is sensitive to somewhat different features of the distribution than
the K-S test, and, importantly, it is invariant under cyclic
permutation: that is, if the samples and the distribution are both taken
modulo (say) 1, then applying the same shift to both arguments leaves
the value unaffected.
Thus it is well suited to periodic distributions.
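
As an illustration of the one-sample case described above, here is a
minimal sketch in plain Python. It is not the code from the repository:
the names kuiper_statistic and kuiper_pvalue are hypothetical, and the
probability uses the asymptotic series given in Numerical Recipes, which
is only an approximation for small samples.

```python
import math

def kuiper_statistic(sample, cdf):
    """Kuiper statistic V = D+ + D- of a sample against a callable CDF."""
    xs = sorted(sample)
    n = len(xs)
    # Largest excursion of the empirical CDF above the model CDF...
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # ...and largest excursion below it.
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return d_plus + d_minus

def kuiper_pvalue(v, n, terms=100):
    """Approximate probability of a statistic at least as large as v.

    Assumption: asymptotic formula from Numerical Recipes; the cutoff at
    lam < 0.4 is where the series is numerically indistinguishable from 1.
    """
    lam = (math.sqrt(n) + 0.155 + 0.24 / math.sqrt(n)) * v
    if lam < 0.4:
        return 1.0
    total = sum((4 * j * j * lam * lam - 1) * math.exp(-2 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(max(2.0 * total, 0.0), 1.0)

# Shifting a sample modulo 1 leaves V unchanged for the uniform CDF.
phases = [0.05, 0.1, 0.4, 0.45, 0.5, 0.9]
shifted = [(x + 0.3) % 1.0 for x in phases]
v1 = kuiper_statistic(phases, lambda x: x)
v2 = kuiper_statistic(shifted, lambda x: x)
```

The invariance holds because a shift adds the same constant to D+ that it
subtracts from D-, so their sum does not change.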
The Z_m^2 test - a test for uniformity on [0,1) based on the first m
Fourier coefficients. Returns a score and the probability of a score
that large.
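
A rough sketch of how such a statistic can be computed, assuming the
usual definition Z_m^2 = (2/N) * sum over k=1..m of the squared cosine and
sine sums, and the large-sample result that it follows a chi-squared
distribution with 2m degrees of freedom under uniformity. The function
name zm2_test is hypothetical and the code is not taken from the
repository.

```python
import math

def zm2_test(phases, m):
    """Return (Z_m^2, approximate probability) for phases in [0, 1)."""
    n = len(phases)
    z = 0.0
    for k in range(1, m + 1):
        c = sum(math.cos(2 * math.pi * k * p) for p in phases)
        s = sum(math.sin(2 * math.pi * k * p) for p in phases)
        z += c * c + s * s
    z *= 2.0 / n
    # Chi-squared survival function for an even number (2m) of degrees of
    # freedom has a closed form: exp(-z/2) * sum_{i<m} (z/2)^i / i!
    half = z / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, m):
        term *= half / i
        total += term
    return z, math.exp(-half) * total

# Evenly spaced phases look perfectly uniform; identical phases do not.
z_flat, p_flat = zm2_test([i / 100 for i in range(100)], 2)
z_peak, p_peak = zm2_test([0.25] * 50, 1)
```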
The H test - a test that uses a data-dependent number of harmonics to
test for uniformity. Returns the score and the probability, and also
the number of harmonics that gave the most significant detection.
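
For reference, a minimal sketch of the score, assuming the definition
H = max over m of (Z_m^2 - 4m + 4) with m running from 1 to 20. The name
h_test and the max_harmonics parameter are hypothetical, and the
probability calculation is deliberately left out rather than guessed.

```python
import math

def h_test(phases, max_harmonics=20):
    """Return (H, m) where m is the number of harmonics that maximizes
    Z_m^2 - 4*m + 4. The significance of H is not computed here."""
    n = len(phases)
    z = 0.0                      # running value of Z_m^2
    best_h, best_m = -math.inf, 0
    for m in range(1, max_harmonics + 1):
        c = sum(math.cos(2 * math.pi * m * p) for p in phases)
        s = sum(math.sin(2 * math.pi * m * p) for p in phases)
        z += 2.0 / n * (c * c + s * s)
        h = z - 4 * m + 4        # penalty for each additional harmonic
        if h > best_h:
            best_h, best_m = h, m
    return best_h, best_m

# Evenly spaced phases carry no power in the first 20 harmonics.
h_flat, m_flat = h_test([i / 100 for i in range(100)])
# Identical phases carry equal power in every harmonic.
h_peak, m_peak = h_test([0.25] * 50)
```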
fold_intervals - a function to take a series of weighted intervals and
return the total exposure of each phase modulo 1. For testing for
uniformity when you have more data from some phases than others.
cdf_from_intervals - a function to construct a piecewise-linear CDF
from a set of exposures (as returned by the above function).
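
A sketch of the construction, assuming the exposures arrive as sorted
breakpoints from 0 to 1 and one constant exposure value per segment (a
hypothetical format chosen for this example). A piecewise-constant
exposure integrates to a piecewise-linear CDF, so it is enough to
accumulate the area of each segment and interpolate.

```python
from bisect import bisect_right

def cdf_from_intervals(breaks, totals):
    """Return a callable piecewise-linear CDF for the given exposure."""
    cumulative = [0.0]
    for a, b, t in zip(breaks, breaks[1:], totals):
        cumulative.append(cumulative[-1] + t * (b - a))
    norm = cumulative[-1]    # total exposure, assumed to be positive

    def cdf(x):
        if x <= breaks[0]:
            return 0.0
        if x >= breaks[-1]:
            return 1.0
        i = bisect_right(breaks, x) - 1          # segment containing x
        frac = (x - breaks[i]) / (breaks[i + 1] - breaks[i])
        value = cumulative[i] + frac * (cumulative[i + 1] - cumulative[i])
        return value / norm

    return cdf

# Three times as much exposure in the first half of the cycle.
cdf = cdf_from_intervals([0.0, 0.5, 1.0], [3.0, 1.0])
```

The resulting callable can be passed to a test that accepts a CDF, so
that uneven exposure is not mistaken for a real signal.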
histogram_intervals - A function to evaluate how much exposure each
histogram bin received, to allow testing for uniformity using a
histogram in the presence of non-uniform exposure.
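
The idea can be sketched as follows, assuming equal-width bins on [0, 1)
and the same hypothetical (breaks, totals) exposure format as a
piecewise-constant function of phase. Whether the repository returns
integrated or averaged exposure per bin is not stated above; this sketch
integrates.

```python
def histogram_intervals(n_bins, breaks, totals):
    """Integrate a piecewise-constant exposure over n_bins equal bins."""
    exposure = []
    for j in range(n_bins):
        lo, hi = j / n_bins, (j + 1) / n_bins
        acc = 0.0
        for a, b, t in zip(breaks, breaks[1:], totals):
            overlap = min(hi, b) - max(lo, a)   # shared length of bin and segment
            if overlap > 0:
                acc += t * overlap
        exposure.append(acc)
    return exposure

# Expected counts per bin are proportional to these values under the
# hypothesis that the underlying rate is uniform in phase.
per_bin = histogram_intervals(4, [0.0, 0.5, 1.0], [3.0, 1.0])
```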
There are also a couple of handy decorators in the test suite:
seed - set the random seed before running a test
double_check - for randomized tests: run once, and if it fails, run it again.
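
Decorators of this kind might look roughly like the sketch below. It is
an assumption-laden illustration rather than the code in the test suite:
it seeds Python's standard random module (the real one may well seed
NumPy's generator instead) and it treats AssertionError as the failure
signal.

```python
import functools
import random

def seed(n):
    """Decorator factory: seed the random generator before each call."""
    def decorator(test):
        @functools.wraps(test)
        def wrapper(*args, **kwargs):
            random.seed(n)
            return test(*args, **kwargs)
        return wrapper
    return decorator

def double_check(test):
    """Run a randomized test; if it fails, give it one more chance."""
    @functools.wraps(test)
    def wrapper(*args, **kwargs):
        try:
            return test(*args, **kwargs)
        except AssertionError:
            # A second failure propagates to the test runner.
            return test(*args, **kwargs)
    return wrapper

@seed(0)
def draw():
    return random.random()

calls = []

@double_check
def flaky():
    calls.append(1)
    assert len(calls) > 1    # fails on the first call only

flaky()
```

Retrying once squares the false-failure rate of a test that is expected
to fail occasionally by chance, while a genuinely broken test still
fails both times.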
All have tests and somewhat informative docstrings, but I suspect some
of them may be too specialized to be of much use. The Kuiper test
should have wide applicability; the Z_m^2 test and H test, not so
much, although they are handy when testing for periodicity. I'm not
sure the last batch of utility functions is general enough to be very
useful, but I needed them.
What do you think? How much of this would be useful in Scipy?
Thanks,
Anne