# [SciPy-User] calculating covariances fast (and accurate)

Sturla Molden sturla@molden...
Thu Jul 8 18:30:15 CDT 2010

```I needed to calculate covariances with rounding error correction. As
SciPy expose BLAS, I could use dger and dgemm. Here is the code:

import numpy as np
import scipy as sp
import scipy.linalg
from scipy.linalg.fblas import dger, dgemm

def mean_cov(X):
n,p = X.shape
m = X.mean(axis=0)
# covariance matrix with correction for rounding error
# S = (cx'*cx - (scx'*scx/n))/(n-1)
# Am Stat 1983, vol 37: 242-247.
cx = X - m
scx = cx.sum(axis=0)
scx_op = dger(-1.0/n,scx,scx)
S = dgemm(1.0, cx.T, cx.T, beta=1.0,
c=scx_op, trans_a=0, trans_b=1, overwrite_c=1)
S[:] *= 1.0/(n-1)
return m,S.T

Let's time this couple of times against NumPy:

if __name__ == '__main__':

from time import clock

n,p = 20000,1000
X = 2*np.random.randn(n,p) + 5

t0 = clock()
m,S = X.mean(axis=0), np.cov(X, rowvar=False, bias=0)
t1 = clock()
print t1-t0

t0 = clock()
m,S = mean_cov(X)
t1 = clock()
print t1-t0

L:\>meancov.py
7.39771102515
2.24604790004

L:\>meancov.py
16.1079984658
2.21100101726

That speaks for itself :-D

One important lesson from this: it does not help to write a function
like cov(X) in C, when we have access to optimized BLAS from Python. So
let this serve as a warning against using C istead of Python.

If we don't want to correct rounding error, dger goes out and it just
becomes:

def mean_cov(X):
n,p = X.shape
m = X.mean(axis=0)
cx = X - m
S = dgemm(1./(n-1), cx.T, cx.T, trans_a=0, trans_b=1)
return m,S.T

Sturla

P.S. I should perhaps mention that I use SciPy and Numpy linked with
MKL. Looking at Windows' task manager, it seems both np.cov and my code
saturate all four cores.

```