# [SciPy-User] Strange behaviour from corrcoef when calculating correlation-matrix in SciPy/NumPy.

Wed Mar 2 12:06:52 CST 2011

```
Dear SciPy users,

I have a matrix (or, more strictly speaking, an array),
and I want to calculate the correlation between each column
and every other column, i.e. to make a standard correlation matrix.

In Matlab, this is pretty straightforward,
and the results also reflect the mathematical convention
that corr(m) is just an abbreviated way of saying corr(m,m).

>> m = [ 1 2; -1 3; 0 4]
m =
     1     2
    -1     3
     0     4

>> corr(m)
ans =
    1.0000   -0.5000
   -0.5000    1.0000

>> corr(m,m)
ans =
    1.0000   -0.5000
   -0.5000    1.0000

However, the behaviour of SciPy/NumPy is quite different.
In those modules, corrcoef(m) is *not* the same as corrcoef(m,m).
Apparently, corrcoef(x,y) produces the result corrcoef(vstack((x,y))),
which strikes me as rather weird and inconsistent
with standard mathematical usage.

Here are some examples below.

Raj
------------------
In [1]: import scipy

In [2]: m = scipy.array([[ 1, 2],[ -1, 3],[ 0, 4]])

In [3]: m
Out[3]:
array([[ 1,  2],
       [-1,  3],
       [ 0,  4]])

In [4]: m.T
Out[4]:
array([[ 1, -1,  0],
       [ 2,  3,  4]])

In [5]: scipy.corrcoef(m)
Out[5]:
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [6]: scipy.corrcoef(m.T)
Out[6]:
array([[ 1. , -0.5],
       [-0.5,  1. ]])

# Note from Raj: that answer above, at least, matches what we'd want.
# But it still gives a different result from corrcoef(m.T,m.T)!

In [7]: scipy.corrcoef(m,m)
Out[7]:
array([[ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.]])

In [8]: scipy.corrcoef(m.T,m.T)
Out[8]:
array([[ 1. , -0.5,  1. , -0.5],
       [-0.5,  1. , -0.5,  1. ],
       [ 1. , -0.5,  1. , -0.5],
       [-0.5,  1. , -0.5,  1. ]])
```
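The outputs above follow from two documented properties of NumPy's `corrcoef` (which older SciPy releases re-exported as `scipy.corrcoef`). First, with the default `rowvar=True`, each row is treated as a variable. Second, a second argument `y` is appended as extra variables, so the result is the full correlation matrix of the stacked input. The sketch below checks both properties on the array from the session and shows one way to get a MATLAB-style `corr(x, y)`. It calls `numpy.corrcoef` directly. The helper `corr_xy` is hypothetical, written only for this illustration, and assumes observations in rows and variables in columns, as in MATLAB.

```python
import numpy as np

# The array from the session: 3 observations (rows) of 2 variables (columns).
m = np.array([[1, 2], [-1, 3], [0, 4]])

# With the default rowvar=True, passing y appends y's rows as extra variables,
# so corrcoef(x, y) equals corrcoef of the vertically stacked input.
assert np.allclose(np.corrcoef(m, m), np.corrcoef(np.vstack((m, m))))
assert np.allclose(np.corrcoef(m.T, m.T), np.corrcoef(np.vstack((m.T, m.T))))

# rowvar=False treats each COLUMN as a variable, which is what MATLAB's corr(m) does.
col_corr = np.corrcoef(m, rowvar=False)


def corr_xy(x, y):
    """Hypothetical helper mimicking MATLAB's corr(x, y) for column variables.

    x has shape (n, p) and y has shape (n, q), sharing the same n observations.
    Returns the (p, q) block of correlations between columns of x and columns of y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p = x.shape[1]
    # Put the columns of x and y side by side: an (n, p + q) array of variables.
    stacked = np.hstack((x, y))
    # Full (p + q) x (p + q) correlation matrix over all those columns.
    full = np.corrcoef(stacked, rowvar=False)
    # Keep only the cross block: x columns (rows) against y columns (columns).
    return full[:p, p:]


print(col_corr)               # approximately [[1, -0.5], [-0.5, 1]]
print(corr_xy(m, m))          # the same 2x2 matrix, like MATLAB's corr(m, m)
print(corr_xy(m, m[:, :1]))   # 2x1 block: each column of m against m's first column
```

Stacking and slicing keeps all of the numerical work inside `corrcoef`, at the cost of computing the x/x and y/y blocks that are then thrown away; for large inputs, standardizing the columns and multiplying the two blocks directly would avoid that extra work.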