[SciPy-User] linear algebra: quadratic forms without linalg.inv
Souheil Inati
souheil.inati@nyu....
Mon Nov 2 10:26:07 CST 2009
On Nov 2, 2009, at 10:38 AM, josef.pktd@gmail.com wrote:
> > snip
>
> It really depends on the application. From the applications I know,
> pca is used for dimension reduction, when there are way too many
> regressors to avoid overfitting. The most popular in econometrics
> might be
>
> Forecasting Using Principal Components from a Large Number of
> Predictors
> # James H. Stock and Mark W. Watson
> # Journal of the American Statistical Association, Vol. 97, No. 460
> (Dec., 2002), pp. 1167-1179
>
> A similar problem exists in chemometrics with more regressors than
> observations (at least from the descriptions I read when reading about
> NIPALS).
>
> I don't think that compared to the big stochastic errors, numerical
> precision plays much of a role. When we have large estimation errors
> in small samples in statistics, we don't have to worry, for example,
> about 10e-15 precision when our sampling errors are 10e-1.
>
> Of course, there are other applications, and I'm working my way slowly
> through the numerical issues.
>
> Josef
Hi Josef,
I have a strong opinion about this, and I am almost certainly in the
minority, but my feeling is this: once you have ill-conditioning all
bets are off.
Once the problem is ill-conditioned, then there are an infinite number
of solutions that match your data in a least-squares sense. You are
then required to say something further about how you want to pick a
particular solution from among the infinite number of equivalent
solutions. SVD/PCA is a procedure to find the minimum-two-norm
solution that fits the data. The minimum two-norm solution is
unique. For the general case, SVD is the only method that has a
proper theory. There is no proper theory for anything else, PERIOD.
The only other useful thing one can say is that if you expect your
solution to be sparse, then you can use the newly developed theory of
compressed sensing (tao and candes). This says that the minimum one-
norm solution is best in a statistical sense and provides an algorithm
to find it. The difference between SVD and compressed sensing is that
the former spreads the power out equally among the coefficients, while
the latter picks the solution that maximizes the magnitude of some of
the cofficients and sets others to zero (i.e. picks a sparse answer).
So if you're problem is ill-conditioned, then you are in trouble.
Your only legitimate options are to you use the SVD to pick the
minimum two-norm answer, or to use compressed sensing and pick the
minimum one-norm answer. Everything else is completely nonsense.
Cheers,
Souheil
More information about the SciPy-User
mailing list