[SciPy-User] linear algebra: quadratic forms without linalg.inv

Bruce Southey bsouthey@gmail....
Mon Nov 2 12:31:28 CST 2009


On 11/02/2009 11:40 AM, josef.pktd@gmail.com wrote:
> On Mon, Nov 2, 2009 at 11:26 AM, Souheil Inati <souheil.inati@nyu.edu> wrote:
>    
>> I have a strong opinion about this, and I am almost certainly in the
>> minority, but my feeling is this: once you have ill-conditioning, all
>> bets are off.
>>
>> Once the problem is ill-conditioned, there is an infinite number
>> of solutions that match your data in a least-squares sense.  You are
>> then required to say something further about how you want to pick a
>> particular solution from among the infinite number of equivalent
>> solutions.
>>      
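A minimal NumPy sketch of the non-uniqueness described above. It uses a hypothetical, exactly singular design in which the second column duplicates the first, because that is the case where the least-squares solution set is genuinely infinite. A merely ill-conditioned but full-rank design instead has a unique, highly unstable solution. The data, the `null_vec` direction, and `rcond=1e-10` are illustrative assumptions.

```python
import numpy as np

# Hypothetical data: column 2 duplicates column 1, so X has rank 1, not 2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([x, x])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# One particular choice: the minimum-norm least-squares solution.
# rcond is set explicitly so rounding noise in the zero singular value is discarded.
b_min = np.linalg.pinv(X, rcond=1e-10) @ y

# Any multiple of a null-space vector of X can be added without changing X @ b.
null_vec = np.array([1.0, -1.0])  # X @ null_vec is exactly zero here
b_alt = b_min + 10.0 * null_vec

print("rank of X:", np.linalg.matrix_rank(X))
print("b_min:", b_min, "b_alt:", b_alt)
# Different coefficient vectors, identical fitted values (and residuals).
print("same fit:", np.allclose(X @ b_min, X @ b_alt))
```

Choosing `b_min` over `b_alt` is exactly the "something further" the data cannot decide for you.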
> I think that's the point. However, the solution in economics is not to
> replace the researcher's decision about which solution to use with a
> numerical procedure that selects one automatically.
>
> In statsmodels, I looked at the estimation results using pinv, which is
> exactly SVD plus throwing away tiny singular values (np.linalg.pinv).
>    
Please do not confuse SVD with pinv; they are not the same function.
pinv returns a Moore-Penrose inverse:
http://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse

np.linalg.pinv is implemented using the SVD, but that is not the only
way to compute a Moore-Penrose inverse.
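A minimal NumPy sketch of that distinction, under stated assumptions. `pinv_svd` mirrors the rule np.linalg.pinv applies: take the SVD and zero out singular values at or below `rcond` times the largest one. `pinv_ridge_limit` reaches approximately the same matrix without any SVD, through the known limit pinv(A) = lim_{delta -> 0+} (A^T A + delta*I)^{-1} A^T. Both helper names, the `rcond` and `delta` values, and the ANOVA-style design matrix are hypothetical choices for illustration. The final checks are the four Penrose conditions, which define the Moore-Penrose inverse uniquely, whatever algorithm produced it.

```python
import numpy as np

def pinv_svd(a, rcond=1e-10):
    """Moore-Penrose inverse via SVD, discarding tiny singular values."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    keep = s > rcond * s.max()        # same relative cutoff rule as np.linalg.pinv
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return (vt.T * s_inv) @ u.T       # V @ diag(s_inv) @ U^T

def pinv_ridge_limit(a, delta=1e-8):
    """Approximate Moore-Penrose inverse (A^T A + delta*I)^{-1} A^T, no SVD."""
    n = a.shape[1]
    return np.linalg.solve(a.T @ a + delta * np.eye(n), a.T)

# Hypothetical rank-deficient design: intercept plus one dummy per group.
X = np.column_stack([
    np.ones(6),
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]).astype(float)

P_np = np.linalg.pinv(X, rcond=1e-10)
P_svd = pinv_svd(X)
P_ridge = pinv_ridge_limit(X)

print(np.allclose(P_svd, P_np))               # SVD route matches np.linalg.pinv
print(np.allclose(P_ridge, P_np, atol=1e-6))  # non-SVD route agrees up to O(delta)

# The four Penrose conditions.
P = P_svd
print(np.allclose(X @ P @ X, X),
      np.allclose(P @ X @ P, P),
      np.allclose((X @ P).T, X @ P),
      np.allclose((P @ X).T, P @ X))
```

With a fixed, non-negligible `delta`, the same formula is ridge (Tikhonov) regression proper. That estimator is deliberately biased and no longer the Moore-Penrose solution.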


> The problem is that this provides a nice solution and doesn't ring
> an alarm bell. I want exceptions, or infinite standard errors, for
> the parameter estimates.
> Handling multicollinearity has to be an explicit task and a conscious choice
> by the researcher; e.g., in the past I have used Ridge Regression (Tikhonov),
> Bayesian priors, reparameterization and variable selection.
> The choice of multicollinearity correction has to be reported in the results.
> If pinv (or SVD) is used blindly because there is no warning, then we will
> see researchers presenting "nice" parameter estimates that completely
> hide the fact that the parameters are not actually identified.
>    
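One way to get the alarm bell asked for above is to check the rank before solving and raise an error instead of silently returning a minimum-norm answer. This is a minimal sketch, not statsmodels' actual behaviour. The `IdentificationError` class, the `ols_or_raise` helper, and its `cond_limit` threshold are hypothetical. The rank check relies on np.linalg.matrix_rank with its default tolerance.

```python
import numpy as np

class IdentificationError(ValueError):
    """Hypothetical error: the design matrix does not identify all coefficients."""

def ols_or_raise(X, y, cond_limit=1e10):
    """Hypothetical OLS helper that refuses rank-deficient or badly conditioned designs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    k = X.shape[1]
    rank = np.linalg.matrix_rank(X)
    if rank < k:
        raise IdentificationError(
            f"design has rank {rank} < {k} columns: "
            f"{k - rank} coefficient(s) not identified")
    cond = np.linalg.cond(X)
    if cond > cond_limit:
        raise IdentificationError(
            f"condition number {cond:.3g} exceeds {cond_limit:.3g}")
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

x = np.arange(5.0)
y = 2.0 + 3.0 * x

# Identified model: intercept and slope are recovered.
ok = ols_or_raise(np.column_stack([np.ones(5), x]), y)
print(ok)

# Third column is an exact multiple of the second, so the helper refuses to fit.
try:
    ols_or_raise(np.column_stack([np.ones(5), x, 2 * x]), y)
except IdentificationError as err:
    print("refused:", err)
```

Whether to raise, warn, or report infinite standard errors is a design choice. Raising forces the multicollinearity decision back onto the researcher.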
Nobody is using these methods 'blindly': they are the basics of linear
algebra and really have nothing to do with multicollinearity. When you
have a rank-deficient (overparameterized) model, the normal equations
have an infinite number of solutions and you cannot use the ordinary
inverse to solve them. The most common approach is to rely on a
generalized inverse (http://en.wikipedia.org/wiki/Generalized_inverse -
not a great reference), of which the Moore-Penrose inverse is one
specific type. When generalized inverses are used, as in analysis of
variance, the results are not wrong, not hidden, and totally accepted
by the scientific community. But it does rely on the user to know when
things are not as expected, which is usually easy to spot because the
degrees of freedom are not what was expected.
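A minimal NumPy sketch of the analysis-of-variance case, using a hypothetical one-way layout with an intercept plus one dummy per group, which is rank-deficient by construction. It compares two solutions of the normal equations. The first is the Moore-Penrose solution from np.linalg.pinv. The second sets the last group effect to zero by dropping its column, a common reparameterization chosen here for illustration. The coefficients differ, but the fitted values and the estimable contrast between two groups agree, and the rank shows the lost degree of freedom. The data and `rcond=1e-10` are assumptions.

```python
import numpy as np

# Hypothetical one-way ANOVA data: 3 groups, 3 observations each.
groups = np.repeat([0, 1, 2], 3)
y = np.array([4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1.0, 2.0, 3.0])

# Overparameterized design: intercept plus one dummy per group (4 columns, rank 3).
dummies = (groups[:, None] == np.arange(3)).astype(float)
X = np.column_stack([np.ones(len(y)), dummies])

# Solution 1: Moore-Penrose (minimum-norm) generalized-inverse solution.
b_mp = np.linalg.pinv(X, rcond=1e-10) @ y

# Solution 2: constrain the last group effect to zero by dropping its column.
b_red, *_ = np.linalg.lstsq(X[:, :-1], y, rcond=None)
b_zero = np.append(b_red, 0.0)

rank = np.linalg.matrix_rank(X)
print("columns:", X.shape[1], "rank:", rank, "residual df:", len(y) - rank)
print("coefficients differ:", not np.allclose(b_mp, b_zero))
print("fitted values agree:", np.allclose(X @ b_mp, X @ b_zero))

# Estimable contrast: group 0 effect minus group 1 effect.
# Under either solution it equals the difference of the two group means.
c = np.array([0.0, 1.0, -1.0, 0.0])
print("contrast:", c @ b_mp, c @ b_zero)
```

The 4 columns versus rank 3 gap is the "degrees of freedom are not as expected" signal. Only estimable functions such as `c` should be reported, because they do not depend on which generalized inverse was used.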

Bruce
