[SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)
Almer S. Tigelaar
almer@gnome....
Wed Mar 18 08:19:38 CDT 2009
Hello,
On Wed, 2009-03-18 at 13:11 +0100, Sturla Molden wrote:
> So it seems Hollander & Wolfe and Minitab say 0.67, whereas Numerical
> Recipes says 1.0. Intuitively a vector should be exactly correlated
> with itself, but I am inclined to trust Hollander & Wolfe more than
> Numerical Recipes.
Ah, I was under the impression you already checked Hollander & Wolfe.
Anyway, it seems my initial interpretation was right then. Repeating the
formula here (augmented) for future reference:
Kendall's tau-b (tie handling):
-------------------------------
Given two rankings R1 and R2, Kendall's tau-b is calculated by:
t = (P - Q) / SQRT((P + Q + T) * (P + Q + U))
where P is the number of concordant pairs, Q the number of discordant
pairs, T the number of ties in R1 and U the number of ties in R2.
[Ties are always counted regardless of whether they occur for the same
pair in R1 and R2 or different pairs]
-------------------------------
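The formula above can be sketched in a few lines of Python. This is only
an illustration of the counting convention stated in this message, not
the SciPy code; the function name and the count_joint_ties flag are
made up for this example. With the flag set to False, pairs tied in both
rankings are left out of T and U.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b_as_stated(r1, r2, count_joint_ties=True):
    """Tau-b as written above: (P - Q) / sqrt((P + Q + T) * (P + Q + U)).

    Hypothetical helper for illustration only. With count_joint_ties=True
    a pair tied in both rankings adds to both T and U; with False such a
    pair is skipped entirely.
    """
    if len(r1) != len(r2):
        raise ValueError("rankings must have the same length")
    p = q = t = u = 0
    # Visit every unordered pair of observations exactly once.
    for (a1, a2), (b1, b2) in combinations(zip(r1, r2), 2):
        tie1 = a1 == b1
        tie2 = a2 == b2
        if tie1 and tie2 and not count_joint_ties:
            continue
        if tie1:
            t += 1
        if tie2:
            u += 1
        if not tie1 and not tie2:
            if (a1 - b1) * (a2 - b2) > 0:
                p += 1  # concordant pair
            else:
                q += 1  # discordant pair
    denom = sqrt((p + q + t) * (p + q + u))
    # The value is undefined when a ranking has no usable pairs.
    return (p - q) / denom if denom else float("nan")
```

For the small input [1, 1, 2] compared with itself (an example chosen
here, not taken from the thread), this gives 2/3 when ties are always
counted and 1.0 when joint ties are dropped.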
Some tests I ran today with the R implementation of Kendall's Tau(-a)
and the original implementation in SciPy.stats.stats (Kendall's Tau-b)
suggest that if we do NOT count ties on the same pair (the current
situation in SciPy.stats.stats), Kendall's Tau-b effectively gives the
same outcomes as Kendall's Tau-a. This held for about 36 test cases.
This suggests that Kendall's Tau-b (tie correction) as currently
implemented in SciPy behaves like Kendall's Tau-a (no tie correction),
possibly because ties on identical pairs are left out of T and U above.
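For anyone who wants to repeat this kind of comparison, here is a
minimal sketch. It assumes Tau-a is (P - Q) divided by the total number
of pairs n(n-1)/2, and it models "not counting ties on the same pair"
by dropping jointly tied pairs from T and U. All function names are
hypothetical, and neither function is the actual R or SciPy code.
Whether the two agree on a given input is something to check case by
case.

```python
from itertools import combinations
from math import sqrt


def pair_counts(r1, r2):
    """Return (P, Q, ties only in r1, ties only in r2, joint ties)."""
    p = q = only1 = only2 = joint = 0
    for (a1, a2), (b1, b2) in combinations(zip(r1, r2), 2):
        tie1 = a1 == b1
        tie2 = a2 == b2
        if tie1 and tie2:
            joint += 1
        elif tie1:
            only1 += 1
        elif tie2:
            only2 += 1
        elif (a1 - b1) * (a2 - b2) > 0:
            p += 1  # concordant pair
        else:
            q += 1  # discordant pair
    return p, q, only1, only2, joint


def tau_a(r1, r2):
    """Tau-a: no tie correction; needs at least two observations."""
    p, q, *_ = pair_counts(r1, r2)
    n = len(r1)
    return (p - q) / (n * (n - 1) / 2)


def tau_b_without_joint_ties(r1, r2):
    """Tau-b with jointly tied pairs left out of T and U (assumed variant)."""
    p, q, only1, only2, _ = pair_counts(r1, r2)
    denom = sqrt((p + q + only1) * (p + q + only2))
    return (p - q) / denom if denom else float("nan")
```

Running both functions over the same set of test rankings and comparing
the outputs pair by pair would show where, if anywhere, they diverge.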
Unfortunately, I do not have the time right now to mathematically prove
(or disprove) the equivalence of Kendall's Tau-a and the current SciPy
implementation, but I thought it would be useful to mention these test
results.
--
With kind regards,
Almer S. Tigelaar
University of Twente