[SciPy-User] linear algebra: quadratic forms without linalg.inv
Mon Nov 2 11:31:53 CST 2009
On 11/02/2009 10:31 AM, Sturla Molden wrote:
> email@example.com wrote:
>> It really depends on the application. From the applications I know,
>> pca is used for dimension reduction, when there are way too many
>> regressors to avoid overfitting.
> Too many regressors give you one or more tiny singular values in the
> covariance matrix (X'X), which you use in:
> betas = (X'X)**-1 * X' * y
> So the inverse of X'X is heavily influenced by one or more of these
> "singular values" that do not contribute significantly to X'X. That is
> obviously ridiculous, because we want the factors that determine X'X to
> determine the inverse, (X'X)**-1, as well. I.e. we want the coefficients
> (betas) we estimate to be determined by the same factors that determine
> X'X.
> So we proceed by doing SVD on X'X and throw the offenders out. And in
> statistics, that is called "PCA". And small singular values in X'X are
> known as "multicollinearity".
> When multicollinearity is present, numerical stability is the problem:
> 1 / s[i] becomes infinite for s[i] == 0, and thus s[i] dominates
> (X'X)**-1 completely. But with s[i] == 0, s[i] does not even contribute
> to X'X. So it makes sense to edit too-small s[i] values out, so that
> only the values of s[i] important for X'X are used to compute (X'X)**-1
> and betas. And that is what PCA does. Statistics textbooks usually don't
> teach this. They just say "multicollinearity is bad".
> Yes PCA is used for "dimensionality reduction" and avoiding overfitting.
> But why is overfitting a problem anyway? And why does PCA help? This is
> actually all entangled. The main issue is always that 1/s[i] is big when
> s[i] is small. Overfitting gives you a lot of these big 1/s values. And
> now the betas you solved for do not reflect the signal in X'X, so the
> model has no predictive power.
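The argument quoted above can be illustrated with a minimal sketch. It assumes NumPy and synthetic data; the variable names, the noise levels, and the cutoff `rcond` are hypothetical and not from the original message. It works with the SVD of X rather than X'X (the eigenvalues of X'X are then s**2), and compares betas computed from all singular values with betas computed after dropping the tiny ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): the third regressor is almost
# an exact copy of the first, so X'X is nearly singular.
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 1e-6 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + 2.0 * x2 + 0.1 * rng.normal(size=n)

# SVD of X: X = U * diag(s) * Vt, so X'X = V * diag(s**2) * V'.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# All singular values kept: algebraically the same as (X'X)**-1 * X' * y.
# The 1/s term for the tiny singular value amplifies the noise in y.
betas_full = Vt.T @ ((U.T @ y) / s)

# Drop singular values that are tiny relative to the largest one.
rcond = 1e-4  # hypothetical cutoff, chosen only for this example
keep = s > rcond * s[0]
betas_trunc = Vt[keep].T @ ((U[:, keep].T @ y) / s[keep])

print("singular values:", s)
print("betas, all s[i] kept:   ", betas_full)
print("betas, small s[i] dropped:", betas_trunc)
```

`np.linalg.lstsq(X, y, rcond=1e-4)` applies the same kind of relative cutoff internally, so in practice the truncation does not have to be written out by hand.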
Well, that is fine if you are doing feature extraction but not feature
selection. Most statistical problems involve feature selection, so
obviously it gets more space and time. Feature extraction has relatively
limited use in statistics (usually when 'black boxes' are useful),
so it is usually taught as an advanced topic.
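A small sketch may make the distinction concrete. It assumes NumPy and random synthetic data, and the selected column indices are a hypothetical choice, not the output of any selection procedure. Feature extraction via PCA builds new variables that are linear combinations of all original regressors, while feature selection keeps a subset of the original columns unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic design matrix (assumed for illustration): 100 rows, 4 regressors.
X = rng.normal(size=(100, 4))
Xc = X - X.mean(axis=0)  # center the columns before PCA

# Feature extraction: project onto the first two principal directions.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt[:2]            # each row mixes all four original regressors
scores = Xc @ loadings.T     # two new "black box" variables

# Feature selection: keep two of the original columns as they are.
selected_cols = [0, 2]       # hypothetical choice for this example
X_selected = X[:, selected_cols]

print("loadings of the extracted features:\n", loadings)
print("selected original columns:", selected_cols)
```

The extracted scores have no direct interpretation in terms of a single original regressor, which is the 'black box' aspect mentioned above; the selected columns keep their original meaning.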