[Scipy-tickets] [SciPy] #1527: stats.kruskal not working as advertised

SciPy Trac scipy-tickets@scipy....
Thu Oct 6 08:05:32 CDT 2011


#1527: stats.kruskal not working as advertised
-------------------------+--------------------------------------------------
 Reporter:  ckuster      |       Owner:  somebody    
     Type:  defect       |      Status:  needs_review
 Priority:  normal       |   Milestone:  Unscheduled 
Component:  scipy.stats  |     Version:  0.9.0       
 Keywords:               |  
-------------------------+--------------------------------------------------

Comment(by josefpktd):

 As a comment on the quality of the asymptotic p-values in statistical
 tests like this one:

 ckuster wrote in https://github.com/scipy/scipy/pull/87#issuecomment-2309082
 (the following is copied):

 The p-values in the scipy versions of the non-parametric tests are
 computed from asymptotic approximations that are only reasonable for
 large enough data sets.
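
 For reference, this is roughly the call under discussion (the data
 values here are made up for illustration); the p-value it returns comes
 from a chi-squared approximation to the distribution of the H statistic:

 {{{
 #!python
 from scipy import stats

 # Two small toy samples; values are illustrative only.
 h, p = stats.kruskal([1.2, 3.4, 4.1], [2.5, 5.0])
 # p is an asymptotic (chi-squared) p-value, not an exact one.
 }}}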

 For very small data sets, the exact p-value is pretty easy to find.
 First, you calculate the statistic for the data you have; then you
 enumerate every possible combination of ranks and count how many of
 those combinations have a statistic at least as extreme as the observed
 one. (In the toy example below, the statistic is simply the mean rank of
 the first group; a code sketch follows the example.)

 {{{
 Example:

 data: 1,4,5/2,3       (statistic = 10/3)

 comb: 1,2,3/4,5       (statistic = 2)
       1,2,4/3,5       (statistic = 7/3)
       1,2,5/3,4       (statistic = 8/3)
       1,3,4/2,5       (statistic = 8/3)
       1,3,5/2,4       (statistic = 3)
       1,4,5/2,3       (statistic = 10/3)
       2,3,4/1,5       (statistic = 3)
       2,3,5/1,4       (statistic = 10/3)
       2,4,5/1,3       (statistic = 11/3)
       3,4,5/1,2       (statistic = 4)

 4 of the 10 possible rank combinations have a statistic of 10/3 or
 larger, so the p-value for the data is 0.4.
 }}}
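
 A minimal sketch of this enumeration (my code, not scipy's; it uses the
 example's toy statistic, the mean rank of the first group, but any test
 statistic can be plugged in the same way):

 {{{
 #!python
 from itertools import combinations

 def exact_pvalue(group1_ranks, n_total, statistic):
     """Fraction of all rank assignments whose statistic is at
     least as extreme as the observed one."""
     observed = statistic(group1_ranks)
     combos = list(combinations(range(1, n_total + 1),
                                len(group1_ranks)))
     hits = sum(1 for c in combos if statistic(c) >= observed)
     return hits / float(len(combos))

 def mean_rank(ranks):
     return sum(ranks) / float(len(ranks))

 # 4 of the 10 combinations have a mean rank of 10/3 or larger:
 print(exact_pvalue((1, 4, 5), 5, mean_rank))  # 0.4
 }}}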

 Enumerating all combinations becomes too expensive well before the
 asymptotic approximation becomes valid. In these in-between cases you
 can approximate the p-value by repeatedly creating sets with the same
 number of data points as in your experiment but with randomly assigned
 ranks. For each of these random sets you calculate the test statistic,
 and the p-value is then the fraction of random sets with a statistic at
 least as extreme as the observed one.
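
 A hedged sketch of that resampling approach applied to stats.kruskal
 (the function name and the default of 10000 resamples are my choices,
 not part of scipy):

 {{{
 #!python
 import numpy as np
 from scipy import stats

 def monte_carlo_pvalue(group1, group2, n_resamples=10000, seed=0):
     observed = stats.kruskal(group1, group2)[0]
     pooled = np.concatenate([group1, group2])
     n1 = len(group1)
     rng = np.random.RandomState(seed)
     hits = 0
     for _ in range(n_resamples):
         rng.shuffle(pooled)        # random rank assignment
         h = stats.kruskal(pooled[:n1], pooled[n1:])[0]
         hits += h >= observed
     # fraction of shuffled data sets at least as extreme as the data
     return hits / float(n_resamples)
 }}}

 With a fixed seed the sketch is reproducible while experimenting; left
 unseeded it shows the run-to-run variation mentioned below.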

 The current implementation is fine for large non-normal data sets (which
 is one of the reasons to use non-parametric tests), but gives the wrong
 p-value for small data sets (the other reason).

 "Emulated p-value" is probably not the correct terminology, but this
 feature does exist in some statistics packages. I have used it in SPSS (or
 PASW or whatever they call it now).

 It is *very* slow, and will not return the same p-value every time
 (because of the randomness), but the p-value you obtain is more accurate
 than the result from the asymptotic approximation.

-- 
Ticket URL: <http://projects.scipy.org/scipy/ticket/1527#comment:11>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.

