[SciPy-User] linear algebra: quadratic forms without linalg.inv
Mon Nov 2 10:31:16 CST 2009
> It really depends on the application. From the applications I know,
> pca is used for dimension reduction, when there are way too many
> regressors to avoid overfitting.
Too many regressors give you one or more tiny singular values in the
covariance matrix (X'X), which you use in:
betas = (X'X)**-1 * X' * y
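For reference, this estimator can be computed with NumPy without ever forming (X'X)**-1 explicitly. Solving the linear system (X'X) b = X'y gives the same betas more cheaply and more accurately. This is a minimal sketch: the function name `ols_normal_equations` and the simulated data are illustrative assumptions, not part of the original post.

```python
import numpy as np

def ols_normal_equations(X, y):
    """Return betas = (X'X)**-1 X'y by solving a linear system instead of inverting."""
    XtX = X.T @ X   # the p x p cross-product matrix X'X
    Xty = X.T @ y   # the right-hand side X'y
    return np.linalg.solve(XtX, Xty)

# Simulated, well-conditioned example data (hypothetical)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
true_betas = np.array([1.0, -2.0, 0.5])
y = X @ true_betas + 0.01 * rng.standard_normal(100)

betas = ols_normal_equations(X, y)
print(betas)  # close to true_betas
```

Note that `np.linalg.lstsq(X, y, rcond=None)` works on X directly. It avoids squaring the condition number, which starts to matter once X'X gets close to singular.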
So the inverse of X'X is heavily influenced by one or more of these
"singular values", even though they do not contribute significantly to X'X
itself. That is obviously ridiculous, because we want the factors that
determine X'X to determine the inverse, (X'X)**-1, as well. I.e. we want
the regression coefficients (betas) we estimate to be determined by the
same factors that determine X'X.
So we proceed by doing an SVD of X'X and throwing the offenders out. In
statistics, that is called "PCA". Small singular values in X'X are
known as "multicollinearity".
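The "SVD of X'X, throw the offenders out" step can be sketched in NumPy as a truncated pseudo-inverse. The cutoff rule (discard singular values below `rtol` times the largest one) and the name `truncated_betas` are assumptions for illustration; `np.linalg.pinv` applies a similar relative cutoff to small singular values. Used for regression like this, the approach is often called principal component regression.

```python
import numpy as np

def truncated_betas(X, y, rtol=1e-8):
    """Betas from a truncated SVD of X'X (hypothetical helper).

    Singular values below rtol * (largest singular value) are treated as
    zero: their reciprocals are dropped instead of being allowed to explode.
    Returns the betas and the number of singular values kept.
    """
    XtX = X.T @ X
    U, s, Vt = np.linalg.svd(XtX)        # XtX = U @ diag(s) @ Vt, s sorted descending
    keep = s > rtol * s[0]
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]          # 1/s[i] only for the singular values we keep
    XtX_pinv = (Vt.T * s_inv) @ U.T      # V @ diag(s_inv) @ U'
    return XtX_pinv @ (X.T @ y), int(keep.sum())

# Hypothetical data: the third regressor is an exact linear combination of the first two
rng = np.random.default_rng(1)
x1 = rng.standard_normal(200)
x2 = rng.standard_normal(200)
X = np.column_stack([x1, x2, x1 + x2])
y = 2.0 * x1 - 1.0 * x2 + 0.1 * rng.standard_normal(200)

betas, rank = truncated_betas(X, y)
print(rank)   # 2: the singular value of the redundant direction was dropped
print(betas)  # roughly [5/3, -4/3, 1/3], the minimum-norm coefficients for 2*x1 - x2
```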
When multicollinearity is present, numerical stability is the problem:
1 / s[i] becomes infinite for s[i] == 0, and thus the term 1 / s[i]
dominates (X'X)**-1 completely. But with s[i] == 0, s[i] does not even
contribute to X'X. So it makes sense to edit the too-small s[i] values
out, so that only the s[i] values that are important for X'X are used to
compute (X'X)**-1 and the betas. And that is what PCA does. Statistics
textbooks usually don't teach this; they just say "multicollinearity is
bad".
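A quick numerical check of the claim that a tiny s[i] barely matters for X'X but dominates (X'X)**-1. As a rough measure (an illustrative choice, not from the post), this compares each singular value's share of trace(X'X) = sum(s) with the share of 1/s[i] in trace((X'X)**-1) = sum(1/s). The nearly collinear third regressor is simulated.

```python
import numpy as np

# Hypothetical data: x3 is almost, but not exactly, x1 + x2
rng = np.random.default_rng(2)
n = 200
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
x3 = x1 + x2 + 1e-4 * rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])

XtX = X.T @ X
s = np.linalg.svd(XtX, compute_uv=False)   # singular values of X'X, largest first

# Each component's share of trace(X'X) = sum(s) and of trace((X'X)**-1) = sum(1/s)
share_in_XtX = s / s.sum()
share_in_inverse = (1.0 / s) / (1.0 / s).sum()
print(share_in_XtX)      # last entry is negligible
print(share_in_inverse)  # last entry is essentially 1
```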
Yes, PCA is used for "dimensionality reduction" and for avoiding
overfitting. But why is overfitting a problem anyway? And why does PCA
help? This is actually all entangled. The main issue is always that
1/s[i] is big when s[i] is small. Overfitting gives you a lot of these big
1/s[i] values, and then the betas you solve for do not reflect the signal
in X'X, so the model has no predictive power.
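The point about the betas not reflecting the signal can be illustrated by fitting the same nearly collinear design to two data sets that differ only in measurement noise. Keeping every 1/s[i] lets the noise along the nearly redundant direction blow up the betas, while dropping the smallest singular value gives stable coefficients. The data, the noise level and the helper names `betas_full` and `betas_truncated` are all hypothetical.

```python
import numpy as np

def betas_full(X, y):
    # Normal equations with every singular value of X'X kept
    return np.linalg.solve(X.T @ X, X.T @ y)

def betas_truncated(X, y, k):
    # Keep only the k largest singular values of X'X (hypothetical helper)
    U, s, Vt = np.linalg.svd(X.T @ X)
    return Vt[:k].T @ ((U[:, :k].T @ (X.T @ y)) / s[:k])

rng = np.random.default_rng(3)
n = 200
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
X = np.column_stack([x1, x2, x1 + x2 + 1e-6 * rng.standard_normal(n)])
signal = 2.0 * x1 - 1.0 * x2

# Two data sets that differ only in the measurement noise
y_a = signal + 0.1 * rng.standard_normal(n)
y_b = signal + 0.1 * rng.standard_normal(n)

full_a, full_b = betas_full(X, y_a), betas_full(X, y_b)
trunc_a, trunc_b = betas_truncated(X, y_a, k=2), betas_truncated(X, y_b, k=2)
print(full_a, full_b)    # all 1/s[i] kept: driven by the noise
print(trunc_a, trunc_b)  # smallest singular value dropped: both close to [5/3, -4/3, 1/3]
```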