[Numpy-discussion] finding eigenvectors etc
Thu Feb 21 09:47:13 CST 2008
>> How are you using the values? How significant are the differences?
> i am using these eigenvectors to do PCA on a set of images(of faces).I
> sort the eigenvectors in descending order of their eigenvalues and
> this is multiplied with the (orig data of some images viz a matrix)to
> obtain a facespace.
I've dealt with similar issues a lot -- performing PCA on data where
the dimensionality of the data is much greater than the number of
data points. (Like images.)
In this case, the maximum number of non-trivial eigenvectors of the
covariance matrix of the data is min(dimension_of_data,
number_of_data_points), so one always runs into the zero-eigenvalue
problem; the matrix is thus always ill-conditioned, but that's not a
problem in these cases.
Nevertheless, if you've got (say) 100 images that are each 100x100
pixels, to do PCA in the naive way you need to make a 10000x10000
covariance matrix and then decompose it into 100000 eigenvectors and
values just to get out the 100 non-trivial ones. That's a lot of
computation wasted calculating noise! Fortunately, there are better
ways. One is to perform the SVD on the 100x10000 data matrix. Let the
centered (mean-subtracted) data matrix be D, then the SVD provides
matrices U, S, and V'. IIRC, the eigenvalues of D'D (the covariance
matrix of interest) are then packed along the first dimension of V',
and the eigenvalues are the square of the values in S.
But! There's an even faster way (from my testing). The trick is that
instead of calculating the 10000x10000 outer covariance matrix D'D,
or doing the SVD on D, one can calculate the 100x100 "inner"
covariance matrix DD' and perform the eigen-decomposition thereon and
then trivially transform those eigenvalues and vectors to the ones of
the D'D matrix. This computation is often substantially faster than
Here's how it works:
Let D, our re-centered data matrix, be of shape (n, k) -- that is, n
data points in k dimensions.
We know that D has a singular value decomposition D = USV' (no need
to calculate the SVD though; just enough to know it exists).
From this, we can rewrite the covariance matrices:
D'D = VS'SV'
DD' = USS'U'
Now, from the SVD, we know that S'S and SS' are diagonal matrices,
and V and U (and V' and U') form orthogonal bases. One way to write
the eigen-decomposition of a matrix is A = QLQ', where Q is
orthogonal and L is diagonal. Since the eigen-decomposition is unique
(up to a permutation of the columns of Q and L), we know that V must
therefore contain the eigenvectors of D'D in its columns, and U must
contain the eigenvectors of DD' in its columns. This is the origin of
the SVD recipe for that I gave above.
Further, let S_hat, of shape (n, k) be the elementwise reciprocal of
S (i.e. SS_hat = I of shape (m, n) and S_hatS = I of shape (n, m),
where I is the identity matrix).
Then, we can solve for U or V in terms of the other:
V = D'US_hat'
U = DVS_hat
So, to get the eigenvectors and eigenvalues of D'D, we just calculate
DD' and then apply the symmetric eigen-decomposition (symmetric
version is faster, and DD' is symmetric) to get eigenvectors U and
eigenvalues L. We know that L=SS', so S_hat = 1/sqrt(L) (where the
sqrt is taken elementwise, of course). So, the eigenvectors we're
looking for are:
V = D'US_hat
Then, the principal components (eigenvectors) are in the columns of V
(packed along the second dimension of V).
Fortunately, I've packaged this all up into a python module for PCA
that takes care of this all. It's attached.
Postdoctoral Fellow, Lab of Dr. Frank Slack
Molecular, Cellular and Developmental Biology
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 4350 bytes
Desc: not available
Url : http://projects.scipy.org/pipermail/numpy-discussion/attachments/20080221/364296dc/attachment.bin
More information about the Numpy-discussion