# [Numpy-discussion] Question on lstsq and correlation coeff

josef.pktd@gmai... josef.pktd@gmai...
Wed Feb 25 18:08:48 CST 2009

```
On Wed, Feb 25, 2009 at 6:21 PM, Anthony Kong
<Anthony.Kong@macquarie.com> wrote:
> Hi, all,
>
> It is probably a newbie question.
>
> I am trying to use scipy/numpy in a financial context. I want to compute
> the correlation coeff of two series (returns vs index returns). I tried two
> approaches
>
> Firstly,
>
> from scipy.linalg import lstsq
> coeffs,a,b,c = lstsq(matrix, returns) # matrix contains index returns
>
> then I tried,
>
> import numpy as np
> cov = np.cov(idx1, returns)
> print cov.tolist()
> stddev_x = np.std(returns, ddof=1)
> stddev_y = np.std(idx1, ddof=1)
> print "cor = %s" % (cov.tolist()[:-1] /(stddev_x * stddev_y))
>
> They differ from each other.
>
> As you can see from the numpy example, I am trying to find cor coeff for a
> sample. (ddof=1)
>
> So, my question is: is the discrepancy caused by the fact that I am trying
> to use lstsq() on a 'sample population' (i.e. I am not regressing a full
> return series)? Is it correct to use lstsq() this way?
>

The most direct way is to calculate the correlation matrix with
numpy.corrcoef and use index [0,1] to get the coefficient:

numpy.corrcoef(x, y=None, rowvar=1, bias=0)

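np.cov, which you used, normalizes by N-1 by default (bias=0), the
same as np.std with ddof=1, so the two are consistent. Note that lstsq
returns the regression coefficients (the slope), not the correlation
coefficient, so the two results are not expected to match.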

Josef
```
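As a rough illustration of the reply above, the sketch below compares three routes on made-up data: `np.corrcoef` directly, the covariance divided by the two sample standard deviations, and a least-squares fit. The data, the seed, and all variable names other than `idx1` and `returns` are assumptions for the example, and it uses `np.linalg.lstsq` rather than the `scipy.linalg.lstsq` from the original question. It is meant only to show how the quantities relate, not to reproduce the poster's setup.

```python
import numpy as np

# Synthetic data for illustration only (not from the original post).
rng = np.random.default_rng(0)
idx1 = rng.normal(0.0, 0.01, size=250)                    # hypothetical index returns
returns = 0.8 * idx1 + rng.normal(0.0, 0.005, size=250)   # hypothetical asset returns

# Direct route: off-diagonal element of the 2x2 correlation matrix.
r_direct = np.corrcoef(idx1, returns)[0, 1]

# Manual route: np.cov normalizes by N-1 by default, so pair it with ddof=1.
cov_xy = np.cov(idx1, returns)[0, 1]
r_manual = cov_xy / (np.std(idx1, ddof=1) * np.std(returns, ddof=1))

# Regression route: with an intercept column, lstsq returns [intercept, slope].
A = np.column_stack([np.ones_like(idx1), idx1])
coeffs, residuals, rank, sv = np.linalg.lstsq(A, returns, rcond=None)
slope = coeffs[1]

# The slope equals cov(x, y) / var(x); rescaling by std(x) / std(y) gives r.
r_from_slope = slope * np.std(idx1, ddof=1) / np.std(returns, ddof=1)

print("corrcoef:          ", r_direct)
print("cov / (sx * sy):   ", r_manual)
print("lstsq slope:       ", slope)
print("slope * sx / sy:   ", r_from_slope)
```

The first, second, and fourth printed values should agree to floating-point precision, while the raw slope generally differs from them unless both series happen to have the same standard deviation.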