[SciPy-User] Strange behaviour from corrcoef when calculating correlation-matrix in SciPy/NumPy.

josef.pktd@gmai... josef.pktd@gmai...
Wed Mar 2 13:06:23 CST 2011


On Wed, Mar 2, 2011 at 1:06 PM, Rajeev Raizada <rajeev.raizada@gmail.com> wrote:
> Dear SciPy users,
>
> I have a matrix (or, more strictly speaking, an array),
> and I want to calculate the correlation between each column
> and every other column, i.e. to make a standard correlation matrix.
>
> In Matlab, this is pretty straightforward,
> and the results also reflect the mathematical convention
> that corr(m) is just an abbreviated way of saying corr(m,m).
>
>>> m = [ 1 2; -1 3; 0 4]
> m =
>    1     2
>   -1     3
>    0     4
>
>>> corr(m)
> ans =
>   1.0000   -0.5000
>  -0.5000    1.0000
>
>>> corr(m,m)
> ans =
>   1.0000   -0.5000
>  -0.5000    1.0000
>
> However, the behaviour of SciPy/NumPy is quite different
> from what I had expected.
> In those modules, corrcoef(m) is *not* the same as corrcoef(m,m).
> Apparently, corrcoef(x,y) produces the result corrcoef(vstack((x,y))),
> which strikes me as rather weird, and inconsistent
> with standard mathematical usage.

np.cov and np.corrcoef both have a rowvar=0 option that gives the
"standard" result when the variables are in columns, so there is no
need to transpose.
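
A minimal sketch of this with the array from the original post (the
names c_rowvar and c_transposed are only for illustration). With
rowvar=False, which is the same as rowvar=0, each column is treated as
a variable, and the result should match transposing first:

```python
import numpy as np

# The 3x2 array from the original post: 3 observations, 2 variables.
m = np.array([[1, 2], [-1, 3], [0, 4]])

# Treat columns as variables and rows as observations.
c_rowvar = np.corrcoef(m, rowvar=False)

# Equivalent: transpose so that variables are in rows (the default layout).
c_transposed = np.corrcoef(m.T)

print(c_rowvar)
```

Both calls give the 2x2 matrix with -0.5 off the diagonal, the same
values as the Matlab corr(m) output quoted above.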

I also found it a bit strange that corrcoef(x,y) creates the stacked version.
scipy.stats.spearmanr inherits this behavior since I rewrote it.
scipy.stats.pearsonr hasn't been rewritten yet.

It didn't bug me enough to figure out whether there is a reason for
this stacking behavior or not.
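
To make the stacking concrete, here is a small sketch, assuming the
default rowvar layout (variables in rows). The slicing at the end is
only one possible way to pull out the x-versus-y block, which is the
part that corresponds to Matlab's corr(m,m); it is not something
corrcoef does for you:

```python
import numpy as np

# m.T from the original post: 2 variables in rows, 3 observations each.
x = np.array([[1, -1, 0], [2, 3, 4]])
y = x.copy()

# The two-argument call stacks y's variables below x's ...
two_arg = np.corrcoef(x, y)
# ... so it matches calling corrcoef on the explicitly stacked array.
stacked = np.corrcoef(np.vstack((x, y)))

# The top-right block holds the correlations of x's variables with y's.
n = x.shape[0]
cross = two_arg[:n, n:]

print(two_arg.shape)
print(cross)
```

The full result is 4x4, as in Out[8] of the original post, and the
cross block is the 2x2 matrix one would expect from corr(m,m).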

Josef


>
> Here are some examples, below.
>
> Raj
> ------------------
> In [1]: import scipy
>
> In [2]: m = scipy.array([[ 1, 2],[ -1, 3],[ 0, 4]])
>
> In [3]: m
> Out[3]:
> array([[ 1,  2],
>       [-1,  3],
>       [ 0,  4]])
>
> In [4]: m.T
> Out[4]:
> array([[ 1, -1,  0],
>       [ 2,  3,  4]])
>
> In [5]: scipy.corrcoef(m)
> Out[5]:
> array([[ 1.,  1.,  1.],
>       [ 1.,  1.,  1.],
>       [ 1.,  1.,  1.]])
>
> In [6]: scipy.corrcoef(m.T)
> Out[6]:
> array([[ 1. , -0.5],
>       [-0.5,  1. ]])
>
> # Note from Raj: that answer above, at least, matches what we'd want.
> # But it still gives a different result from corrcoef(m.T,m.T) !
>
> In [7]: scipy.corrcoef(m,m)
> Out[7]:
> array([[ 1.,  1.,  1.,  1.,  1.,  1.],
>       [ 1.,  1.,  1.,  1.,  1.,  1.],
>       [ 1.,  1.,  1.,  1.,  1.,  1.],
>       [ 1.,  1.,  1.,  1.,  1.,  1.],
>       [ 1.,  1.,  1.,  1.,  1.,  1.],
>       [ 1.,  1.,  1.,  1.,  1.,  1.]])
>
> In [8]: scipy.corrcoef(m.T,m.T)
> Out[8]:
> array([[ 1. , -0.5,  1. , -0.5],
>       [-0.5,  1. , -0.5,  1. ],
>       [ 1. , -0.5,  1. , -0.5],
>       [-0.5,  1. , -0.5,  1. ]])