[SciPy-user] Python - Matlab least squares difference

Robert Kern robert.kern at gmail.com
Sat Jun 17 17:40:33 CDT 2006


A. Rachel Grinberg wrote:
> Robert,
> 
> You are absolutely right. If the system is underdetermined there are
> infinitely many solutions. Still, Matlab's solution yields residual = 0,
> and scipy gives you a VERY good approximation, though there are
> infinitely many "true"  solutions. I didn't get a chance to actually
> test out Matlab's accuracy. Meaning I don't really know that the example
> I had was a coincidence, of if Matlab always gives you solution with
> residual = 0 in case of an undwerdetermined system. If later is the
> case, than I would say that Matlab's algorithm is better.

Nonsense. The result you got was an accident of floating point arithmetic in
that all of the values there were integers and thus exactly representible in
floating point. Matlab's algorithm will not give you exact zeros for every
problem. And even if it did it still wouldn't be better.

If you were to do the problem in exact arithmetic, you will find a particular
line in RR^n along which each point is a solution to the LLS problem. For each
point along that line, there is a point in FF^n (the set of n-tuples of floating
point numbers, a strict subset of RR^n) that is closest to the point in RR^n.
Any of those FF^n points are perfectly valid answers when using floating point
arithmetic to solve the problem. If you take into account the accumulation of
floating point error, the set of acceptable points in FF^n is a bit wider,
forming a rough hypercylinder around the exact line.

However, there is the regularization convention. There is precisely one vector
in RR^n that is the correct answer under the "minimum L2-norm of x" convention.
The set of acceptable floating point answers is in a small hypersphere around
that point. Answers outside that hypersphere are not following the convention.
Just because the floating point error happens to be accidentally less in some
other region of the line does not mean that other region is a better answer. It
is not. It is an accident of floating point, not a consequence of the actual
problem you are trying to solve. The latter must always take precedence over the
former.

I recommend reading the first volume of _Numerical Computation_ by Christoph W.
Ueberhuber for a thorough discussion of the semantic issues behind numerical
modelling with floating point.

http://www.amazon.com/gp/product/3540620583/ref=pd_bxgy_img_b/002-3187502-6538449?%5Fencoding=UTF8

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco



More information about the SciPy-user mailing list