# [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)

Sturla Molden sturla@molden...
Thu Mar 19 08:30:25 CDT 2009

```So, it seems this version is faster and more memory efficient:

def kendalltau_fromct(table):
'''return tau-a, tau-b and tau-c from contingency table data

example for contingency table:

count = np.array([[10,  5,  2],
[ 9, 12, 16]])

'''

def _table(j,i):
return {
#'above' :       table[:j,i],
'below' :       table[j+1:,i],
#'left' :        table[j,:i],
'right' :       table[j,i+1:],
#'upper-left' :  table[:j,:i],
#'upper-right' : table[:j,i+1:],
'lower-left' :  table[j+1:,:i],
'lower-right' : table[j+1:,i+1:] }

D = table.shape[0]
C = table.shape[1]
P = 0
Q = 0
Tx = 0
Ty = 0
for i in range(C):
for j in range(D):

pivot = table[j,i] # use this as pivot
ct = _table(j,i)   # split remainder into 8 sections

# count concordant pairs
# -- multiply pivot with 'lower-right' and summate
P += np.sum(ct['lower-right'] * pivot)

# count disconcordant pairs
# -- multiply pivot with 'lower-left' and summate

Q += np.sum(ct['lower-left'] * pivot)

# count pairs tied in y
# -- multiply pivot with 'right' and summate

Ty += np.sum(ct['right'] * pivot)

# count pairs tied in x
# -- multiply pivot with 'below' and summate

Tx += np.sum(ct['below'] * pivot)

n = np.sum(table)
tau_a = 2*float(P-Q)/(n*(n-1))
tau_b = float(P-Q)/(sqrt((P+Q+Tx)*(P+Q+Ty)))
m = C if C < D else D
tau_c = (P-Q)*(2*m/float((m-1)*n**2))
return tau_a, tau_b, tau_c

If Fortran 95 is difficult to build and maintain, we can do this in
Cython as well. There is no need for function calls with array arguments
here, so Fortan has no advantage over Cython in this case. Something
like this:

import numpy as np
fom math import sqrt
cimport numpy as np
ctypedef np.int_t int_t
cimport cython

@cython.boundscheck(False)
def kendalltau_fromct(np.ndarray[int_t, ndim=2] table):
'''return tau-a, tau-b and tau-c from contingency table data

example for contingency table:

count = np.array([[10,  5,  2],
[ 9, 12, 16]])

'''

cdef int C, D, P, Q, Tx, Ty
cdef int i, j, ii, jj, m, n

D = <int> table.shape[0]
C = <int> table.shape[1]
P = 0
Q = 0
Tx = 0
Ty = 0
n = 0
for j in range(D):
for i in range(C):

pivot = table[j,i] # use this as pivot
n = n + pivot

# count concordant pairs
# -- multiply pivot with 'lower-right' and summate
for jj in range(j+1,D):
for ii in range(i+1,C):
P = P + table[jj,ii] * pivot

# count disconcordant pairs
# -- multiply pivot with 'lower-left' and summate
for jj in range(j+1,D):
for ii in range(i):
Q = Q + table[jj,ii] * pivot

# count pairs tied in y
# -- multiply pivot with 'right' and summate
for ii in range(i+1,C):
Ty = Ty + table[j,ii] * pivot

# count pairs tied in x
# -- multiply pivot with 'below' and summate
for jj in range(j+1,D):
Tx = Tx + table[jj,i] * pivot

tau_a = 2*float(P-Q)/(n*(n-1))
tau_b = float(P-Q)/(sqrt(float((P+Q+Tx)*(P+Q+Ty))))
m = C if C < D else D
tau_c = (P-Q)*(2*m/float((m-1)*n**2))
return tau_a, tau_b, tau_c

Now I am done Kendall's tau for sure. Pick whichever you like.

Sturla Molden

```