[SciPy-User] linear algebra: quadratic forms without linalg.inv
Bruce Southey
bsouthey@gmail....
Mon Nov 2 12:31:28 CST 2009
On 11/02/2009 11:40 AM, josef.pktd@gmail.com wrote:
> On Mon, Nov 2, 2009 at 11:26 AM, Souheil Inati<souheil.inati@nyu.edu> wrote:
>
>> I have a strong opinion about this, and I am almost certainly in the
>> minority, but my feeling is this: once you have ill-conditioning all
>> bets are off.
>>
>> Once the problem is ill-conditioned, then there are an infinite number
>> of solutions that match your data in a least-squares sense. You are
>> then required to say something further about how you want to pick a
>> particular solution from among the infinite number of equivalent
>> solutions.
>>
> I think, that's the point. However, the solution in economics is not to
> replace the decision about your solution by a numerical procedure
> that selects one for the researcher.
>
> In statsmodels, I looked at the estimation results using pinv, which is
> exactly svd plus throw away tiny singular values (np.linalg.pinv).
>
Please do not confuse SVD with pinv; they are not the same thing. SVD is
a matrix factorization, whereas pinv returns a Moore-Penrose inverse:
http://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse
np.linalg.pinv happens to be implemented using SVD, but that is not the
only way to compute a Moore-Penrose inverse.
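To make the distinction concrete, here is a minimal sketch. The matrix A
and the cutoff rcond are made-up illustration values, not anything taken
from statsmodels. It builds the Moore-Penrose inverse by hand from the
SVD, compares it with np.linalg.pinv, and then gets the same matrix
without any SVD by solving the normal equations, which works here only
because A has full column rank:

```python
import numpy as np

# Hypothetical 4x3 matrix with full column rank (illustration only).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 3.0]])

# Route 1: SVD, then invert only the singular values above a cutoff.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
rcond = 1e-15                       # assumed relative cutoff
keep = s > rcond * s.max()
s_inv = np.zeros_like(s)
s_inv[keep] = 1.0 / s[keep]
pinv_svd = (Vt.T * s_inv) @ U.T     # V diag(1/s) U^T

# Route 2: the library routine.
pinv_np = np.linalg.pinv(A)

# Route 3: no SVD at all; valid only because A has full column rank.
pinv_normal = np.linalg.solve(A.T @ A, A.T)

same_as_numpy = np.allclose(pinv_svd, pinv_np)
same_as_normal = np.allclose(pinv_svd, pinv_normal)

# The four Penrose conditions that define the Moore-Penrose inverse.
G = pinv_svd
penrose_ok = (np.allclose(A @ G @ A, A)
              and np.allclose(G @ A @ G, G)
              and np.allclose((A @ G).T, A @ G)
              and np.allclose((G @ A).T, G @ A))

print(same_as_numpy, same_as_normal, penrose_ok)
```

All three routes should agree up to rounding. The SVD is just one
convenient way to get there.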
> The problem is that this provides a nice solution and doesn't ring
> an alarm bell, I want to have exceptions or infinite standard errors for
> the parameter estimates.
> Handling multicollinearity has to be an explicit task and a conscious choice
> by the researcher, e.g. I used Ridge Regression (Tychonov), Bayesian priors,
> reparameterization and variable selection in the past.
> The choice of multicollinearity correction has to be reported in the results.
> If pinv (or svd) is blindly used, because there is no warning, then we will see
> researchers presenting their "nice" parameter estimates, which completely
> hide the fact that the parameters are actually not identified.
>
Nobody is 'blindly' using these methods: they are the basics of linear
algebra and really have nothing to do with multicollinearity.
When the system you have to solve is rank deficient (for example, an
overparameterized design matrix), the normal equations have an infinite
number of solutions and you cannot use the ordinary inverse to solve
them. The most common approach is to rely on a generalized inverse
(http://en.wikipedia.org/wiki/Generalized_inverse - not a great
reference), of which the Moore-Penrose inverse is one specific type.
When these are used, such as in analysis of variance, the results are
not wrong, not hidden, and are totally accepted by the scientific
community. But it does rely on the user to know when things are not as
expected (which is usually trivial to spot because the degrees of
freedom are not as expected).
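As a rough sketch of the analysis-of-variance case (the data and the
three-group layout are invented for illustration), the design matrix
below has an intercept plus one dummy column per group, so it is rank
deficient by construction. Two different generalized inverses of X'X
give different coefficient vectors but identical fitted values, and the
rank shows that the model uses 3 degrees of freedom rather than 4:

```python
import numpy as np

# Invented data: 3 groups, 2 observations each.
y = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 9.0])
group = np.array([0, 0, 1, 1, 2, 2])

# Overparameterized design: intercept + one dummy per group (4 columns).
X = np.column_stack([np.ones(6),
                     group == 0, group == 1, group == 2]).astype(float)

XtX = X.T @ X
Xty = X.T @ y

n_params = X.shape[1]               # 4 columns in the design
rank = np.linalg.matrix_rank(X)     # 3, not 4: the tell-tale sign
df_resid = len(y) - rank            # residual degrees of freedom

# Generalized inverse no. 1: Moore-Penrose.
b_mp = np.linalg.pinv(XtX) @ Xty

# Generalized inverse no. 2: invert the full-rank block belonging to the
# group dummies and leave zeros in the intercept's row and column.
G = np.zeros((4, 4))
G[1:, 1:] = np.linalg.inv(XtX[1:, 1:])
b_g2 = G @ Xty

fitted_mp = X @ b_mp
fitted_g2 = X @ b_g2

print("rank", rank, "of", n_params, "columns; residual df", df_resid)
print("same coefficients: ", np.allclose(b_mp, b_g2))
print("same fitted values:", np.allclose(fitted_mp, fitted_g2))
```

The individual coefficients depend on which generalized inverse was
picked, but the fitted values (here the group means) do not, and
comparing the rank with the number of columns is the check on the
degrees of freedom.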
Bruce