[Numpy-discussion] Identifying Colinear Columns of a Matrix
Fri Aug 26 12:34:33 CDT 2011
I actually use the VIF when the design matrix can be inverted. I do it the quick and dirty way, as opposed to running the step regressions:
1. Compute the correlation matrix of the design matrix (w/o the intercept).
2. Return the diagonal of the inverse of the correlation matrix from step 1.
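The two steps above can be sketched in NumPy roughly as follows. This is only an illustration, not the code I actually run: the function name `vif_from_correlation` and the synthetic data are made up for the example, and it assumes the variables are stored as columns with the intercept already dropped.

```python
import numpy as np

def vif_from_correlation(X):
    """Hypothetical helper: VIFs as the diagonal of the inverse correlation matrix.

    X is an (n_obs, n_vars) design matrix WITHOUT the intercept column.
    Raises numpy.linalg.LinAlgError if the correlation matrix is exactly singular.
    """
    R = np.corrcoef(X, rowvar=False)   # step 1: correlation matrix of the columns
    return np.diag(np.linalg.inv(R))   # step 2: diagonal of its inverse

# Synthetic, nearly (not perfectly) collinear data, for illustration only.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

vif = vif_from_correlation(X)
print(vif)  # large values flag columns that the other columns nearly explain
```

A common rule of thumb treats a VIF above roughly 10 as a warning sign, but that cutoff is a convention, not part of the method.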
Again, the problem lies in relationships that involve more than two columns. I wouldn't be able to run the sub-regressions at all when the columns are perfectly collinear.
From: email@example.com [mailto:firstname.lastname@example.org] On Behalf Of Skipper Seabold
Sent: Friday, August 26, 2011 10:28 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix
On Fri, Aug 26, 2011 at 1:10 PM, Mark Janikas <email@example.com> wrote:
> Hello All,
> I am trying to identify columns of a matrix that are perfectly collinear.
> It is not that difficult to identify when two columns are identical or have
> zero variance, but I do not know how to ID when the culprit is of a higher
> order. i.e. columns 1 + 2 + 3 = column 4. NUM.corrcoef(matrix.T) will
> return NaNs when the matrix is singular, and LA.cond(matrix.T) will provide
> a very large condition number. But they do not tell me which columns are
> causing the problem. For example:
> zt = numpy.array([[ 1.  , 1.  , 1.  , 1.  , 1.  ],
>                   [ 0.25, 0.1 , 0.2 , 0.25, 0.5 ],
>                   [ 0.75, 0.9 , 0.8 , 0.75, 0.5 ],
>                   [ 3.  , 8.  , 0.  , 5.  , 0.  ]])
> How can I identify that columns 0,1,2 are the issue because: column 1 +
> column 2 = column 0?
> Any input would be greatly appreciated. Thanks much,
The way that I know to do this in a regression context for (near
perfect) multicollinearity is VIF. It's long been on my todo list for
Maybe there are other ways with decompositions. I'd be happy to hear about them.
Please post back if you write any code to do this.
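One decomposition-based possibility, offered as a sketch and not as anything proposed in this thread, is to look at the null space of the matrix via the SVD: any right singular vector whose singular value is (numerically) zero gives the coefficients of an exact linear dependency, and its non-zero entries point at the columns involved. The function name `collinear_column_groups` and the tolerances below are assumptions chosen for illustration.

```python
import numpy as np

def collinear_column_groups(A, rtol=1e-10, coef_tol=1e-8):
    """Hypothetical helper: find columns of A taking part in exact linear dependencies.

    Returns (groups, null_vectors). Each row v of null_vectors satisfies A @ v ~ 0;
    each entry of groups lists the column indices with non-negligible weight in v.
    """
    A = np.asarray(A, dtype=float)
    _, s, vt = np.linalg.svd(A, full_matrices=True)
    rank = int(np.sum(s > rtol * s.max()))   # numerical rank
    null_vectors = vt[rank:]                 # basis of the null space of A
    groups = [np.flatnonzero(np.abs(v) > coef_tol) for v in null_vectors]
    return groups, null_vectors

# The example from the question: variables are the ROWS of zt, so pass zt.T.
zt = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
               [0.25, 0.1, 0.2, 0.25, 0.5],
               [0.75, 0.9, 0.8, 0.75, 0.5],
               [3.0, 8.0, 0.0, 5.0, 0.0]])

groups, null_vectors = collinear_column_groups(zt.T)
print([g.tolist() for g in groups])  # [[0, 1, 2]]
```

One caveat: when there is more than one independent dependency, the SVD returns an arbitrary basis of the null space, so the non-zero patterns can mix separate relationships together. In that case the union of the flagged indices is still informative, but untangling the individual relationships needs further work (for example, a pivoted QR factorization).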