[SciPy-dev] Inclusion of Kuiper test in Scipy
Travis Oliphant
oliphant@enthought....
Wed Nov 18 22:45:16 CST 2009
On Nov 2, 2009, at 8:50 AM, Anne Archibald wrote:
> Hi,
>
> I have implemented a statistical test from the literature, the Kuiper
> test, for my own work, but I think it might be worth including it in
> Scipy itself. I'd like to hear other people's opinions, though, both
> on what (if anything) should go into scipy, and on whether it needs
> modification. The code is at:
>
> http://github.com/aarchiba/kuiper
>
> This code includes a number of things beyond the basic test, some or
> all of which may not be worth including in Scipy. What's there:
>
> The Kuiper test - analogous to the Kolmogorov-Smirnov test, this takes
> either a sample and a callable CDF or two samples and returns an
> abstract score and the probability that a score that large would have
> arisen if the two arguments are from the same distribution. This test
> is sensitive to somewhat different features of the distribution than
> the K-S test, and, importantly, it is invariant under cyclic
> permutation: that is, if all the samples and distribution are modulo
> (say) 1, then any shift in both arguments leaves the value unaffected.
> Thus it is well suited to periodic distributions.
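
For readers who have not met the statistic, here is a minimal sketch of the one-sample Kuiper statistic V = D+ + D- in plain Python. It is not the code from the repository above: the name kuiper_statistic and the example phases are made up for illustration, and the probability calculation is left out. It only shows the score and the shift invariance described in the quoted paragraph.

```python
def kuiper_statistic(samples, cdf):
    """Kuiper's V = D+ + D- for a sample against a callable CDF (sketch)."""
    # Sorting the CDF values is equivalent to sorting the samples,
    # because a CDF is non-decreasing.
    values = sorted(cdf(x) for x in samples)
    n = len(values)
    # Largest amount by which the empirical CDF exceeds the model CDF...
    d_plus = max((i + 1) / n - v for i, v in enumerate(values))
    # ...and the largest amount by which it falls below it.
    d_minus = max(v - i / n for i, v in enumerate(values))
    return d_plus + d_minus


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1)."""
    return x


# Hypothetical phases in [0, 1), and the same phases shifted cyclically.
phases = [0.05, 0.10, 0.12, 0.40, 0.93]
shifted = [(p + 0.3) % 1.0 for p in phases]

v = kuiper_statistic(phases, uniform_cdf)
v_shifted = kuiper_statistic(shifted, uniform_cdf)
print(v, v_shifted)
```

The two printed values should agree up to rounding. The K-S statistic, which uses max(D+, D-) instead of the sum, does not have this property in general.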
>
> The Z_m^2 test - a test for uniformity on [0,1) based on the first m
> Fourier coefficients. Returns a score and the probability of a score
> that large.
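
A sketch of the Z_m^2 score as it is usually defined, Z_m^2 = (2/n) * sum over k = 1..m of [(sum_i cos 2 pi k phi_i)^2 + (sum_i sin 2 pi k phi_i)^2]. The function name z2m is hypothetical and this is not the repository's implementation. The probability assumes the large-sample result that the score follows a chi-squared distribution with 2m degrees of freedom under uniformity; for an even number of degrees of freedom the tail probability has the closed form used below.

```python
import math


def z2m(phases, m=2):
    """Return (score, probability) for the Z_m^2 test on phases in [0, 1)."""
    n = len(phases)
    score = 0.0
    for k in range(1, m + 1):
        # Real and imaginary parts of the k-th empirical Fourier coefficient.
        c = sum(math.cos(2 * math.pi * k * p) for p in phases)
        s = sum(math.sin(2 * math.pi * k * p) for p in phases)
        score += c * c + s * s
    score *= 2.0 / n
    # Chi-squared tail probability with 2m degrees of freedom:
    # exp(-x/2) * sum_{j < m} (x/2)^j / j!
    half = score / 2.0
    prob = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(m))
    return score, prob


# Ten evenly spaced phases: the first two harmonics cancel, so no signal.
flat_score, flat_prob = z2m([i / 10 for i in range(10)], m=2)
# Ten identical phases: every harmonic adds coherently.
peaked_score, peaked_prob = z2m([0.25] * 10, m=2)
```

With n identical phases each harmonic contributes 2n to the score, so the second example gives 2 * m * n.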
>
> The H test - a test that uses a data-dependent number of harmonics to
> test for uniformity. Returns the score and the probability, and also
> the number of harmonics that gave the most significant detection.
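
A sketch of the H score under its usual definition, H = max over m of (Z_m^2 - 4m + 4), with m running from 1 to 20. The name h_statistic and the max_harmonics parameter are assumptions for illustration, not the repository's interface, and the probability of the score is left out.

```python
import math


def h_statistic(phases, max_harmonics=20):
    """Return (H, m) where m is the harmonic count that maximizes the score."""
    n = len(phases)
    z2 = 0.0  # running Z_m^2, built up one harmonic at a time
    best_h, best_m = -math.inf, 0
    for m in range(1, max_harmonics + 1):
        c = sum(math.cos(2 * math.pi * m * p) for p in phases)
        s = sum(math.sin(2 * math.pi * m * p) for p in phases)
        z2 += (2.0 / n) * (c * c + s * s)
        # Each extra harmonic is penalized by 4, so more harmonics are
        # used only when they add real power.
        h = z2 - 4 * m + 4
        if h > best_h:
            best_h, best_m = h, m
    return best_h, best_m


# Fifty evenly spaced phases: no power in any of the first 20 harmonics.
flat_h, flat_m = h_statistic([i / 50 for i in range(50)])
# Ten identical phases: all harmonics contribute, so the maximum is at m = 20.
peaked_h, peaked_m = h_statistic([0.25] * 10)
```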
>
> fold_intervals - a function to take a series of weighted intervals and
> return the total exposure of each phase modulo 1. For testing for
> uniformity when you have more data from some phases than others.
> cdf_from_intervals - a function to construct a piecewise-linear CDF
> from a set of exposures (as returned by the above function).
> histogram_intervals - a function to evaluate how much exposure each
> histogram bin received, to allow testing for uniformity using a
> histogram in the presence of non-uniform exposure.
>
> There are also a couple of handy decorators in the test suite:
>
> seed - set the random seed before running a test
> double_check - for randomized tests: run once, and if it fails, run
> it again.
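
A sketch of what two such decorators could look like. The signatures are guesses, not taken from the repository, and the standard-library random module is used here as an assumption; a NumPy-based test suite would more likely seed numpy.random.

```python
import functools
import random


def seed(n):
    """Decorator factory: reseed the random generator before each call."""
    def decorator(test):
        @functools.wraps(test)
        def wrapper(*args, **kwargs):
            random.seed(n)
            return test(*args, **kwargs)
        return wrapper
    return decorator


def double_check(test):
    """Run a randomized test once; if it fails, run it one more time."""
    @functools.wraps(test)
    def wrapper(*args, **kwargs):
        try:
            return test(*args, **kwargs)
        except AssertionError:
            # A second failure propagates to the test runner as usual.
            return test(*args, **kwargs)
    return wrapper


@seed(0)
def draw():
    return random.random()


calls = []


@double_check
def flaky():
    # Fails on the first call, passes on the second.
    calls.append(1)
    assert len(calls) > 1


flaky()
```

Note that stacking double_check outside seed would rerun the test with the same seed, so the retry only helps when the test is not seeded.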
>
> All have tests and somewhat informative docstrings, but I suspect some
> of them may be too specialized to be of much use. The Kuiper test
> should have wide applicability; the Z_m^2 test and H test, not so
> much, although they are handy when testing for periodicity. The last
> batch of utility functions I'm not sure are general enough to be very
> useful, but I needed them.
>
> What do you think? How much of this would be useful in Scipy?
I'm sure the Kuiper test is of interest and perhaps some of the others
as well. I was hoping Josef or Robert would chime in. I would be
interested in these additions, though I don't have time to review
them. If you could post them as a patch on the tracker that would
be great.
-Travis