[Numpy-discussion] Identifying Colinear Columns of a Matrix

Skipper Seabold jsseabold@gmail....
Fri Aug 26 12:27:38 CDT 2011


On Fri, Aug 26, 2011 at 1:10 PM, Mark Janikas <mjanikas@esri.com> wrote:
> Hello All,
>
>
>
> I am trying to identify columns of a matrix that are perfectly collinear.
> It is not that difficult to identify when two columns are identical or have
> zero variance, but I do not know how to ID when the culprit is of a higher
> order, i.e. columns 1 + 2 + 3 = column 4.  NUM.corrcoef(matrix.T) will
> return NaNs when the matrix is singular, and LA.cond(matrix.T) will provide
> a very large condition number, but they do not tell me which columns are
> causing the problem.   For example:
>
>
>
> zt = numpy.array([[ 1.  ,  1.  ,  1.  ,  1.  ,  1.  ],
>                   [ 0.25,  0.1 ,  0.2 ,  0.25,  0.5 ],
>                   [ 0.75,  0.9 ,  0.8 ,  0.75,  0.5 ],
>                   [ 3.  ,  8.  ,  0.  ,  5.  ,  0.  ]])
>
>
>
> How can I identify that columns 0,1,2 are the issue because: column 1 +
> column 2 = column 0?
>
>
>
> Any input would be greatly appreciated.  Thanks much,
>

The way that I know to do this in a regression context for (near-perfect)
multicollinearity is the variance inflation factor (VIF). It's long been on
my todo list for statsmodels.

http://en.wikipedia.org/wiki/Variance_inflation_factor
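
A rough sketch of the VIF idea in plain numpy, applied to the example
matrix from the question. This is only an illustration, not statsmodels
code: the function name vif and the tol cutoff are made up here. Each
column is regressed on all the others, and VIF_j = 1 / (1 - R^2_j), so a
column that is an exact linear combination of the others gets an infinite
VIF. It assumes the matrix contains a constant column (as the example
does), which makes the centered R^2 meaningful.

```python
import numpy as np

def vif(X, tol=1e-12):
    """Variance inflation factor for each column of X (hypothetical helper).

    Returns nan for zero-variance columns (e.g. the constant) and inf when
    a column is, up to `tol`, an exact linear combination of the others.
    """
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        tss = np.sum((y - y.mean()) ** 2)  # total sum of squares
        if tss == 0:
            out[j] = np.nan  # VIF is undefined for a constant column
            continue
        others = np.delete(X, j, axis=1)
        # Least-squares fit of column j on the remaining columns;
        # lstsq copes with rank-deficient designs.
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        rss = np.sum((y - others @ coef) ** 2)  # residual sum of squares
        ratio = rss / tss  # equals 1 - R^2
        out[j] = np.inf if ratio < tol else 1.0 / ratio
    return out

zt = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
               [0.25, 0.1, 0.2, 0.25, 0.5],
               [0.75, 0.9, 0.8, 0.75, 0.5],
               [3.0, 8.0, 0.0, 5.0, 0.0]])

v = vif(zt.T)  # columns of the matrix are the rows of zt
print(v)
```

Here columns 1 and 2 come out infinite, the constant column is reported
as nan, and column 3 gets a finite value. Note that VIF flags which columns
are explained by the others, but not which specific combination is
responsible.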

Maybe there are other ways with decompositions. I'd be happy to hear about them.
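
One decomposition-based possibility, sketched here as an assumption rather
than a tested recipe, is to look at the null space from an SVD. Right
singular vectors whose singular values are (numerically) zero give the
coefficients of the exact linear dependencies, and their nonzero entries
mark the columns involved. The function name collinear_columns and both
tolerances below are illustrative choices.

```python
import numpy as np

def collinear_columns(X, tol=1e-10, coef_tol=1e-8):
    """Find columns taking part in exact linear dependencies (sketch).

    Returns (involved, null_vectors): the indices of columns with a nonzero
    coefficient in some null-space vector, and the null-space basis itself
    (one vector per row).
    """
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    _, s, vt = np.linalg.svd(X, full_matrices=True)  # vt is k x k
    # Pad with zeros when there are fewer singular values than columns.
    s_full = np.zeros(k)
    s_full[:s.size] = s
    null_mask = s_full <= tol * s_full.max()
    null_vectors = vt[null_mask]
    involved = np.flatnonzero(np.any(np.abs(null_vectors) > coef_tol, axis=0))
    return involved, null_vectors

zt = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
               [0.25, 0.1, 0.2, 0.25, 0.5],
               [0.75, 0.9, 0.8, 0.75, 0.5],
               [3.0, 8.0, 0.0, 5.0, 0.0]])

involved, null_vectors = collinear_columns(zt.T)
print(involved)      # indices of the columns in the dependency
print(null_vectors)  # proportional to [1, -1, -1, 0] up to sign
```

For the example this picks out columns 0, 1 and 2, and the single null
vector encodes column 0 - column 1 - column 2 = 0. When there is more than
one dependency the basis vectors are not unique, so the indices should be
read as "columns that appear in at least one dependency" rather than as
separate groups.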

Please post back if you write any code to do this.

Skipper

