# [Numpy-discussion] Identifying Colinear Columns of a Matrix

Skipper Seabold jsseabold@gmail....
Fri Aug 26 12:27:38 CDT 2011

```On Fri, Aug 26, 2011 at 1:10 PM, Mark Janikas <mjanikas@esri.com> wrote:
> Hello All,
>
>
>
> I am trying to identify columns of a matrix that are perfectly collinear.
> It is not that difficult to identify when two columns are identical are have
> zero variance, but I do not know how to ID when the culprit is of a higher
> order. i.e. columns 1 + 2 + 3 = column 4.  NUM.corrcoef(matrix.T) will
> return NaNs when the matrix is singular, and LA.cond(matrix.T) will provide
> a very large condition number…. But they do not tell me which columns are
> causing the problem.   For example:
>
>
>
> zt = numpy. array([[ 1.  ,  1.  ,  1.  ,  1.  ,  1.  ],
>
>                            [ 0.25,  0.1 ,  0.2 ,  0.25,  0.5 ],
>
>                            [ 0.75,  0.9 ,  0.8 ,  0.75,  0.5 ],
>
>                            [ 3.  ,  8.  ,  0.  ,  5.  ,  0.  ]])
>
>
>
> How can I identify that columns 0,1,2 are the issue because: column 1 +
> column 2 = column 0?
>
>
>
> Any input would be greatly appreciated.  Thanks much,
>

The way that I know to do this in a regression context for (near
perfect) multicollinearity is VIF. It's long been on my todo list for
statsmodels.

http://en.wikipedia.org/wiki/Variance_inflation_factor

Maybe there are other ways with decompositions. I'd be happy to hear about them.

Please post back if you write any code to do this.

Skipper
```

More information about the NumPy-Discussion mailing list