# [Scipy-tickets] [SciPy] #1527: stats.kruskal not working as advertised

SciPy Trac scipy-tickets@scipy....
Thu Oct 6 08:05:32 CDT 2011

```#1527: stats.kruskal not working as advertised
-------------------------+--------------------------------------------------
 Reporter:  ckuster      |       Owner:  somebody
     Type:  defect       |      Status:  needs_review
 Priority:  normal       |   Milestone:  Unscheduled
Component:  scipy.stats  |     Version:  0.9.0
 Keywords:               |
-------------------------+--------------------------------------------------

Comment(by josefpktd):

As a comment on the quality of the asymptotic p-values in statistical tests
like this:

ckuster in https://github.com/scipy/scipy/pull/87#issuecomment-2309082
(the following is copied):

The p-values used in the scipy version of the non-parametric tests are
calculated using functions that are reasonable approximations for large
enough data sets.

For very small data sets, the correct p-value is pretty easy to find.
First, you calculate the statistic for the data you have, and then you
look at every possible combination of ranks and count how many of those
combinations have a statistic at least as extreme as the one for your
data.

{{{
Example:

data:   1,4,5/2,3       (statistic = 10/3)
comb:   1,2,3/4,5       (statistic = 2)
        1,2,4/3,5       (statistic = 7/3)
        1,2,5/3,4       (statistic = 8/3)
        1,3,4/2,5       (statistic = 8/3)
        1,3,5/2,4       (statistic = 3)
        1,4,5/2,3       (statistic = 10/3)
        2,3,4/1,5       (statistic = 3)
        2,3,5/1,4       (statistic = 10/3)
        2,4,5/1,3       (statistic = 11/3)
        3,4,5/1,2       (statistic = 4)

4 of the 10 possible rank combinations have a statistic of 10/3 or larger,
so the p-value for the data is 0.4.
}}}

Looking at all combinations gets too expensive well before the functional
approximation is valid. In this in-between range you can approximate the
p-value by repeatedly creating sets with the same number of data points as
in your experiment but with randomly assigned ranks. For each of these
random sets, you calculate the test statistic, and you take the p-value to
be the fraction of your random sets whose statistic is at least as extreme
as the one for your data.

The current implementation is fine for large non-normal data sets (which
is one of the reasons to use non-parametric tests), but gives the wrong
p-value for small data sets (the other reason).

"Emulated p-value" is probably not the correct terminology, but this
feature does exist in some statistics packages. I have used it in SPSS (or
PASW or whatever they call it now).

It is *very* slow, and will not return the same p-value every time
(because of the randomness), but the p-value you obtain is more accurate
than the result from the functional approximation.

--
Ticket URL: <http://projects.scipy.org/scipy/ticket/1527#comment:11>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.
```
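The exact enumeration described in the message can be sketched in a few lines. This is only an illustration, not ckuster's code; it uses the toy statistic from the worked example (every tabulated value is the mean rank of the first group), with made-up function names:

```python
from itertools import combinations
from fractions import Fraction


def mean_rank(group):
    # Toy statistic from the example: the mean rank of the first group.
    return Fraction(sum(group), len(group))


def exact_p_value(n1, n_total, observed):
    # Enumerate every way to assign n1 of the ranks 1..n_total to the first
    # group, and count how many assignments give a statistic at least as
    # extreme (here: as large) as the observed one.
    combos = list(combinations(range(1, n_total + 1), n1))
    hits = sum(1 for c in combos if mean_rank(c) >= observed)
    return Fraction(hits, len(combos))


# Data 1,4,5 / 2,3 from the example: statistic 10/3, and 4 of the
# C(5,3) = 10 rank assignments reach 10/3 or more.
print(exact_p_value(3, 5, Fraction(10, 3)))  # 2/5
```

Using `Fraction` keeps the count exact, matching the hand enumeration in the ticket rather than introducing floating-point comparisons.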
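The Monte Carlo variant for the in-between regime can likewise be sketched. This is a minimal illustration rather than a proposed patch: it assumes tie-free data (so ranks need no averaging), uses the standard Kruskal-Wallis H formula, and the function names are made up:

```python
import random


def kw_statistic(groups):
    # Kruskal-Wallis H for tie-free data:
    #   H = 12 / (N (N + 1)) * sum_i (R_i ** 2 / n_i) - 3 (N + 1)
    # where R_i is the rank sum and n_i the size of group i. The rounding
    # absorbs floating-point noise so that permutations with identical rank
    # configurations compare equal.
    pooled = sorted(x for g in groups for x in g)
    rank = {x: i + 1 for i, x in enumerate(pooled)}  # assumes no ties
    n = len(pooled)
    h = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return round(12.0 / (n * (n + 1)) * h - 3 * (n + 1), 9)


def permutation_p_value(groups, n_resamples=10_000, seed=0):
    # Repeatedly reassign the pooled observations at random to groups of the
    # original sizes, and report the fraction of resamples whose statistic
    # is at least as extreme as the observed one.
    rng = random.Random(seed)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    observed = kw_statistic(groups)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        resampled, start = [], 0
        for s in sizes:
            resampled.append(pooled[start:start + s])
            start += s
        if kw_statistic(resampled) >= observed:
            hits += 1
    return hits / n_resamples


# For the toy split 1,4,5 / 2,3 the exact tail probability of H over all 10
# rank assignments works out to 0.8 (H treats high and low rank sums
# symmetrically, unlike the one-sided mean-rank statistic in the example),
# so the estimate should land near 0.8.
print(permutation_p_value([[1, 4, 5], [2, 3]]))
```

As the ticket notes, this is slow and the result varies from run to run, but it converges on the exact permutation p-value as the number of resamples grows; newer SciPy releases also ship `scipy.stats.permutation_test`, which automates this kind of resampling.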