[SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)
Sturla Molden
sturla@molden...
Tue Mar 17 16:09:55 CDT 2009
> Let's use two identical rankings with a tie:
> A B C
> R1 = [1, 1, 2]
> R2 = [1, 1, 2]
Minitab says Kendall's tau is 0.67 in this case.
Looking at page 752 of Numerical Recipes, 3rd edition:
tau = (c - d)/(sqrt(c+d+ey)*sqrt(c+d+ex))
c = #concordant pairs
d = #discordant pairs
ey = #pairs with tied rank in y but not in x
ex = #pairs with tied rank in x but not in y
Here the pairs are:
(1,1) vs. (1,1) -> tie in x and tie in y
(1,1) vs. (2,2) -> concordant pair
(1,1) vs. (2,2) -> concordant pair
tau = (2 - 0) / (sqrt(2+0+0)*sqrt(2+0+0)) = 1
So from NR we are forced to conclude that tau is 1 in this case.
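For anyone who wants to check the counting by hand, here is a minimal sketch that tallies c, d, ex and ey exactly as defined above and plugs them into the formula. The function name tau_from_pair_counts is my own (hypothetical), it takes the square root of the product rather than the product of the square roots to avoid rounding noise, and it assumes at least one pair is untied in both lists.

```python
from itertools import combinations
from math import sqrt


def tau_from_pair_counts(x, y):
    """Tally c, d, ex, ey over all pairs and apply the formula above.

    Hypothetical helper for illustration only; not SciPy or NR code.
    """
    c = d = ex = ey = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                c += 1   # concordant pair
            else:
                d += 1   # discordant pair
        elif dx == 0 and dy != 0:
            ex += 1      # tied in x but not in y
        elif dy == 0 and dx != 0:
            ey += 1      # tied in y but not in x
        # a pair tied in both x and y is counted nowhere
    # sqrt(a*b) is mathematically the same as sqrt(a)*sqrt(b)
    tau = (c - d) / sqrt((c + d + ey) * (c + d + ex))
    return tau, (c, d, ex, ey)


tau, counts = tau_from_pair_counts([1, 1, 2], [1, 1, 2])
print(tau, counts)
```

With these definitions the (A, B) pair, which is tied in both lists, contributes to neither ex nor ey, so the counts come out as c = 2, d = 0, ex = 0, ey = 0.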
Sturla Molden
c = 2
d = 0
ex = 1
ey = 1
tau = 2/(sqrt(3)*sqrt(3)) = 2/3 = 0.666667
Which by the way is just what Minitab says.
Sturla Molden
> There are three pair combinations in these lists, namely: (A, B), (A, C)
> and (B, C). It is obvious that _one_ of these combinations has a tie for
> both lists (the (A,B) combination which is (1,1) for both R1 and R2).
> So, since there is one tie in both lists, we have T = U = 1
>
> We find that there are two concordant pairs in both lists (A, C) and
> (B,C) so P = 2. There are no discordant pairs, so Q = 0. With all
> variables given, we can now calculate Kendall's tau for R1 and R2:
>
> t = (2 - 0) / SQRT((2 + 0 + 1)*(2 + 0 + 1))
> t = 2 / SQRT(3*3)
> t = 2 / 3
> t = 0.6666666
>
> However, using scipy (svn HEAD) as follows:
>
> import scipy.stats.stats as s
> s.kendalltau([1,1,2], [1,1,2])
>
> Yields t = 1.0:
>
> (1.0, 0.11718509694604401)
>
> Which I believe is wrong (or at least: has no correction for ties, as is
> claimed in the source code). If there are three combinations and one of
> these is a tie, and the other two combinations are concordant, it makes
> sense that Kendall's tau-b should yield 2 / 3.
>
> The cause and fix
> -----------------
> Playing around with SciPy's code (and comparing it with my own), I believe
> I discovered a probable cause for this difference. Again, I used the
> implementation at the following URL:
> http://svn.scipy.org/svn/scipy/trunk/scipy/stats/stats.py
> (please take a look at the implementation first, otherwise you will not
> understand my explanation)
>
> In the 'kendalltau(x,y)' function we see a test for ties and an 'else'
> branch. In the 'else' branch the values of 'n1' and 'n2' are incremented
> if there is a tie (conforming to +T and +U in the formula given above).
> However, I believe that the 'if' conditions here are wrong:
> 1) Consider that if 'a1' has value '0' it is tied (the same goes for
> 'a2'). In the else branch I see:
>
> if a1:
>     n1 = n1 + 1
> if a2:
>     n2 = n2 + 1
>
> So here n1 and n2 are incremented when there is NO tie, rather than
> when there is a tie. This explains the different outcome. Translating
> this back to the formula gives me T = U = 0, which would yield:
>
> t = (2 - 0) / SQRT((2 + 0 + 0)*(2 + 0 + 0))
> t = 2 / SQRT(2*2)
> t = 2 / 2
> t = 1.0
>
> Which is indeed consistent with the SciPy outcome. Hence, I believe
> the solution is to correct the conditions in the if statements in
> the Kendall's tau function:
>
> if not a1:
>     n1 = n1 + 1
> if not a2:
>     n2 = n2 + 1
>
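To make the effect of the two conditions concrete, here is a minimal sketch of the loop as it is described in the quoted text. It is a reconstruction from that description, not a copy of the SciPy source: the name tau_loop and the flag count_when_tied are hypothetical, and it assumes that a1 and a2 are the pairwise differences in x and y and that pairs without any tie increment both n1 and n2.

```python
from math import sqrt


def tau_loop(x, y, count_when_tied):
    """Pairwise loop with a switchable condition in the 'else' branch.

    Hypothetical reconstruction for illustration; not the SciPy code.
    count_when_tied=False mimics 'if a1:' / 'if a2:'.
    count_when_tied=True mimics 'if not a1:' / 'if not a2:'.
    """
    n1 = n2 = 0
    s = 0  # concordant minus discordant pairs
    n = len(x)
    for j in range(n - 1):
        for k in range(j + 1, n):
            a1 = x[j] - x[k]
            a2 = y[j] - y[k]
            aa = a1 * a2
            if aa:
                # no tie in either list
                n1 += 1
                n2 += 1
                s += 1 if aa > 0 else -1
            else:
                # at least one of the lists has a tie for this pair
                inc1 = (a1 == 0) if count_when_tied else (a1 != 0)
                inc2 = (a2 == 0) if count_when_tied else (a2 != 0)
                if inc1:
                    n1 += 1
                if inc2:
                    n2 += 1
    return s / sqrt(n1 * n2)


print(tau_loop([1, 1, 2], [1, 1, 2], count_when_tied=False))
print(tau_loop([1, 1, 2], [1, 1, 2], count_when_tied=True))
```

For R1 = R2 = [1, 1, 2], the 'if a1:' form ends with n1 = n2 = 2 and gives 1.0, while the 'if not a1:' form ends with n1 = n2 = 3 and gives 2/3.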
> Closing
> -------
> Of course, my interpretation of Kendall's tau could be wrong. Since I
> cannot exclude that possibility, I would appreciate it if one of you could
> check and see if you reach the same conclusion. Maybe the base formula
> that SciPy uses is different.
>
> I have also compared your implementation to the one in the R project;
> however, their source code suggests that they do not adjust for
> ties (effectively implementing Kendall's tau-a).
>
> --
> With kind regards,
>
> Almer S. Tigelaar
> University of Twente
>