[SciPy-User] Generalized least square on large dataset
Charles R Harris
Wed Mar 7 22:00:24 CST 2012
On Wed, Mar 7, 2012 at 8:46 PM, Charles R Harris
> On Wed, Mar 7, 2012 at 7:39 PM, Peter Cimermančič <
> email@example.com> wrote:
>> I'd like to linearly fit data that were NOT sampled independently. I
>> came across the generalized least squares method:
>> X and Y are coordinates of the data points, and V is a "variance matrix".
>> The equation is in Matlab format - I've tried solving the problem there
>> too, but it didn't work - but eventually I'd like to be able to solve
>> problems like that in Python. The problem is that due to its size (1000
>> rows and columns), the V matrix becomes singular, thus non-invertible. Any
>> suggestions for how to get around this problem? Maybe using a way of
>> solving the generalized linear regression problem other than GLS?
> Plain old least squares will probably do a decent job for the fit. Where
> you will run into trouble is if you want to estimate the covariance. The
> idea of using the variance matrix is to transform the data set into
> independent observations of equal variance, but except in extreme cases
> that shouldn't really be necessary if you have sufficient data points.
> Weighting the data is a simple case of this that merely equalizes the
> variance, and it often doesn't make that much difference.
To expand a bit, if it is simply the case that the measurement errors
aren't independent and you know their covariance, then you want to minimize
(y - Ax)^T * cov^-1 * (y - Ax) and if you factor cov^-1 into U^T * U, then
you can solve the ordinary least squares problem U*A*x = U*y. I can't
really tell what your data/problem is like without more details.
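
As a rough sketch of the factoring idea above, assuming cov is symmetric
positive definite (so a Cholesky factorization exists): if cov = L * L^T,
then cov^-1 = L^-T * L^-1 and U = L^-1 is one valid choice. The function
name gls_fit and the synthetic data below are made up for illustration;
they are not from the original problem.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular


def gls_fit(A, y, cov):
    """Minimize (y - A x)^T cov^-1 (y - A x) by whitening with U = L^-1."""
    # cov = L L^T with L lower triangular. This raises LinAlgError if cov
    # is singular or not positive definite, which is the assumption here.
    L = cholesky(cov, lower=True)
    # U*A and U*y are obtained by triangular solves, without forming cov^-1.
    UA = solve_triangular(L, A, lower=True)
    Uy = solve_triangular(L, y, lower=True)
    # Ordinary least squares on the transformed problem U*A*x = U*y.
    x, *_ = np.linalg.lstsq(UA, Uy, rcond=None)
    return x


# Hypothetical example: a straight-line fit with correlated errors.
rng = np.random.default_rng(0)
n = 200
t = np.linspace(0.0, 1.0, n)
A = np.column_stack([np.ones(n), t])  # columns: intercept, slope
# Assumed covariance: exponentially decaying correlation plus a small
# diagonal term to keep it comfortably positive definite.
cov = 0.1 * np.exp(-np.abs(t[:, None] - t[None, :]) / 0.2) + 1e-6 * np.eye(n)
x_true = np.array([1.0, 2.0])
noise = np.linalg.cholesky(cov) @ rng.standard_normal(n)
y = A @ x_true + noise

x_gls = gls_fit(A, y, cov)
print(x_gls)
```

The triangular solves avoid inverting cov explicitly. This does not help
when cov is exactly singular; that case needs a different treatment.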