[SciPy-User] linear algebra: quadratic forms without linalg.inv
josef.pktd@gmai...
Mon Nov 2 11:40:47 CST 2009
On Mon, Nov 2, 2009 at 11:26 AM, Souheil Inati <souheil.inati@nyu.edu> wrote:
>
> I have a strong opinion about this, and I am almost certainly in the
> minority, but my feeling is this: once you have ill-conditioning all
> bets are off.
>
> Once the problem is ill-conditioned, then there are an infinite number
> of solutions that match your data in a least-squares sense. You are
> then required to say something further about how you want to pick a
> particular solution from among the infinite number of equivalent
> solutions.
I think that's the point. However, the solution in economics is not to
replace the researcher's decision about which solution to pick with a
numerical procedure that silently selects one.
In statsmodels, I looked at the estimation results using pinv, which is
exactly svd plus throwing away tiny singular values (np.linalg.pinv).
The problem is that this provides a nice solution and doesn't ring
an alarm bell. I want to have exceptions or infinite standard errors for
the parameter estimates.
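
To make the "alarm bell" concrete, here is a minimal sketch of an OLS helper
that raises instead of quietly returning a pinv solution. The function name
ols_strict and the tolerance rcond are hypothetical, chosen for illustration;
this is not how statsmodels behaves.

```python
import numpy as np

def ols_strict(X, y, rcond=1e-10):
    """OLS that refuses to fit a rank-deficient design matrix.

    `rcond` is an illustrative relative tolerance, not a library default.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Singular values of X, largest first.
    s = np.linalg.svd(X, compute_uv=False)
    rank = int(np.sum(s > rcond * s[0]))
    if rank < X.shape[1]:
        raise np.linalg.LinAlgError(
            f"design matrix has rank {rank} < {X.shape[1]} columns; "
            "parameters are not identified"
        )
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Third column is the sum of the first two: perfect collinearity.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 2))
X_bad = np.column_stack([A, A[:, 0] + A[:, 1]])
y = A @ np.array([1.0, 2.0]) + 0.1 * rng.standard_normal(50)

# pinv returns finite coefficients and gives no warning.
beta_pinv = np.linalg.pinv(X_bad, rcond=1e-10) @ y
```

Calling ols_strict(X_bad, y) raises LinAlgError, while ols_strict(A, y) on
the full-rank columns returns the usual estimate.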
Handling multicollinearity has to be an explicit task and a conscious choice
by the researcher, e.g. I used Ridge Regression (Tikhonov), Bayesian priors,
reparameterization and variable selection in the past.
The choice of multicollinearity correction has to be reported in the results.
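
As one example of such an explicit choice, a minimal ridge (Tikhonov) sketch
follows. The helper name ridge and the value alpha=1.0 are assumptions for
illustration; the point is only that the penalty is a visible parameter that
can be reported alongside the estimates.

```python
import numpy as np

def ridge(X, y, alpha):
    """Ridge (Tikhonov) estimate: solve (X'X + alpha*I) b = X'y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(k), X.T @ y)

rng = np.random.default_rng(0)
x1 = rng.standard_normal(100)
x2 = x1 + 1e-6 * rng.standard_normal(100)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.standard_normal(100)

# alpha is chosen by the researcher and belongs in the reported results.
b_ridge = ridge(X, y, alpha=1.0)
```

With nearly identical columns, the penalty splits the effect almost evenly
between the two coefficients instead of letting them blow up in opposite
directions.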
If pinv (or svd) is blindly used, because there is no warning, then we will see
researchers presenting their "nice" parameter estimates, which completely
hide the fact that the parameters are actually not identified.
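
A small sketch of what "not identified" means here, using made-up data in
which the third regressor is exactly the sum of the first two: the pinv
estimate and a visibly different coefficient vector produce the same fitted
values, so the data cannot tell them apart.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 2))
X = np.column_stack([A, A[:, 0] + A[:, 1]])   # x3 = x1 + x2 exactly
y = rng.standard_normal(30)

# Minimum-norm least-squares solution, as pinv would report it.
beta_min_norm = np.linalg.pinv(X, rcond=1e-10) @ y

# X @ null_dir is zero because x1 + x2 - x3 = 0 by construction.
null_dir = np.array([1.0, 1.0, -1.0])
beta_other = beta_min_norm + 5.0 * null_dir

# Both coefficient vectors give the same fitted values.
same_fit = np.allclose(X @ beta_min_norm, X @ beta_other)
```

pinv simply reports the shortest of these equivalent vectors; nothing in the
data singles it out.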
I think I worry more about numerical precision and efficiency when the
multicollinearity is not yet so extreme that we have to drop (near)zero
eigenvalues.
On Mon, Nov 2, 2009 at 11:31 AM, Sturla Molden <sturla@molden.no> wrote:
> josef.pktd@gmail.com skrev:
>> It really depends on the application. From the applications I know,
>> pca is used for dimension reduction, when there are way too many
>> regressors to avoid overfitting.
>
> Too many regressors give you one or more tiny singular values in the
> covariance matrix (X'X), which you use in:
>
> betas = (X'X)**-1 * X' * y
>
> So the inverse of X'X is heavily influenced by one or more of these
> "singular values" that do not contribute significantly to X'X. That is
> obviously ridiculous, because we want the factors that determine X'X to
> determine the inverse, (X'X)**-1, as well. I.e. we want the regression
> coefficients (betas) we estimate to be determined by the same factors
> that determine X'X.
>
> So we proceed by doing SVD on X'X and throw the offenders out. And in
> statistics, that is called "PCA". And small singular values in X'X are
> known as "multicollinearity".
>
I think this applies to forecasting, but not when parameter estimates
and standard errors of the parameter estimates are the primary interest.
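
For reference, the truncation Sturla describes can be sketched as follows.
The function name truncated_betas and the tolerance rtol are hypothetical.
On exactly collinear data the fitted values agree with a full-rank fit on
the independent columns, which is why the procedure is fine for forecasting,
but the individual coefficients it returns are only one of many equivalent
choices.

```python
import numpy as np

def truncated_betas(X, y, rtol=1e-10):
    """betas = pinv(X'X) @ X'y, dropping singular values of X'X
    below rtol * (largest singular value). rtol is illustrative."""
    XtX = X.T @ X
    U, s, Vt = np.linalg.svd(XtX)
    keep = s > rtol * s[0]
    inv_s = np.zeros_like(s)
    inv_s[keep] = 1.0 / s[keep]          # 1/s[i] only for retained s[i]
    XtX_pinv = (Vt.T * inv_s) @ U.T      # V @ diag(inv_s) @ U'
    return XtX_pinv @ (X.T @ y), int(keep.sum())

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 2))
X = np.column_stack([A, A[:, 0] + A[:, 1]])   # exactly collinear third column
y = A @ np.array([1.0, -1.0]) + 0.1 * rng.standard_normal(40)

betas, n_kept = truncated_betas(X, y)

# Full-rank fit on the two independent columns, for comparison.
b_full, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
same_forecast = np.allclose(X @ betas, A @ b_full)
```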
>
> When multicollinearity is present, numerical stability is the problem:
>
> 1 / s[i] becomes infinite for s[i] == 0, and thus s[i] dominates
> (X'X)**-1 completely. But with s[i] == 0, s[i] does not even contribute
> to X'X. So it makes sense to edit too-small s[i] values out, so that
> only the values of s[i] important for X'X are used to compute (X'X)**-1
> and betas. And that is what PCA does. Statistics textbooks usually don't
> teach this. They just say "multicollinearity is bad".
>
> Yes, PCA is used for "dimensionality reduction" and avoiding overfitting.
> But why is overfitting a problem anyway? And why does PCA help? This is
> actually all entangled. The main issue is always that 1/s[i] is big when
> s[i] is small. Overfitting gives you a lot of these big 1/s values. And
> now the betas you solved for do not reflect the signal in X'X, so the
> model has no predictive power.
I'm not sure you need high multicollinearity to have overfitting.
Overfitting is still a problem after dropping the near zero singular
values, if many of the variables just capture variation in the past
data that doesn't really reflect the data generating process.
I think cross validation and parameter selection usually select
fewer variables than would be required for positive definiteness.
Josef
>
>
> Sturla