[Scipy-tickets] [SciPy] #1744: Wilcoxon signed rank test validation on textbook example fails

SciPy Trac scipy-tickets@scipy....
Fri Oct 12 05:07:14 CDT 2012


#1744: Wilcoxon signed rank test validation on textbook example fails
-------------------------+--------------------------------------------------
 Reporter:  chris1869    |       Owner:  somebody         
     Type:  defect       |      Status:  needs_review     
 Priority:  normal       |   Milestone:  StatisticsCleanup
Component:  scipy.stats  |     Version:  0.11.0           
 Keywords:  malfunction  |  
-------------------------+--------------------------------------------------

Comment(by chris1869):

 No, the textbook does not give a reference. I assume that this is a
 procedure they promote. I checked the textbook example with the mass
 library for R.


 {{{
 > library(MASS)
 > x =
 c(-4,-3,-3,-3,-3,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,2,3,4,4)
 > y = array(0, c(56))
 > wilcox.test(x, y, paired=TRUE)

         Wilcoxon signed rank test with continuity correction

 data:  x and y
 V = 327, p-value = 0.006516
 alternative hypothesis: true location shift is not equal to 0

 Warnmeldungen:
 1: In wilcox.test.default(x, y, paired = TRUE) :
   kann bei Bindungen keinen exakten p-Wert Berechnen
 2: In wilcox.test.default(x, y, paired = TRUE) :
   kann den exakten p-Wert bei Nullen nicht berechnen
 }}}

 Compared to the scipy output:
 (327, nan)
 Textbook methdod:
 (441, 0.003258)

 The problem in morestats.py is caused by lines 1288, 1289, if they are
 exchanged as shown below, the routine yields the same results as the
 R-procedure.

 {{{
 <-- se = math.sqrt(count*(count+1)*(2*count+1.0)/24)
 --> se = count*(count+1)*(2*count+1.0)
     if (len(r) != len(unique(r))): # handle ties in data
         replist, repnum = find_repeats(r)
         corr = 0.0
         for i in range(len(replist)):
             si = repnum[i]
             corr += 0.5*si*(si*si-1.0)
 <--     V = se*se - corr
 <--     se = sqrt((count*V - T*T)/(count-1.0))
 -->     se -= corr

 <-- z = (T - mn)/se
 --> se = sqrt(se / 24.)
 }}}

 Now the difference to the textbook is simply due to including the zero
 differences in the r_plus and r_minus sum. This raises T from 327 to 441.

-- 
Ticket URL: <http://projects.scipy.org/scipy/ticket/1744#comment:4>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.


More information about the Scipy-tickets mailing list