[SciPy-User] Generalized least square on large dataset
Wed Mar 7 22:26:07 CST 2012
On Wed, Mar 7, 2012 at 11:04 PM, Charles R Harris wrote:
> On Wed, Mar 7, 2012 at 8:58 PM, <email@example.com> wrote:
>> On Wed, Mar 7, 2012 at 10:46 PM, Charles R Harris
>> <firstname.lastname@example.org> wrote:
>> > On Wed, Mar 7, 2012 at 7:39 PM, Peter Cimermančič
>> > <email@example.com> wrote:
>> >> Hi,
>> >> I'd like to linearly fit data that were NOT sampled independently. I
>> >> came across the generalized least squares (GLS) method:
>> >> b = (X'*V^(-1)*X)^(-1) * X'*V^(-1)*Y
>> >> X and Y are the coordinates of the data points, and V is a "variance
>> >> matrix". The equation is in Matlab format - I've tried solving the
>> >> problem there too, but it didn't work - but eventually I'd like to be
>> >> able to solve problems like that in Python. The problem is that due
>> >> to its size (1000 rows and columns), the V matrix becomes singular
>> >> and thus un-invertible. Any suggestions for how to get around this?
>> >> Maybe a way of solving the generalized linear regression problem
>> >> other than GLS?
>> > Plain old least squares will probably do a decent job for the fit;
>> > where you will run into trouble is if you want to estimate the
>> > covariance.
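For what it's worth, the GLS formula from the question can be evaluated without an explicit inverse by substituting the Moore-Penrose pseudoinverse for V^(-1), so a singular V no longer raises an error. A minimal numpy sketch on toy data (the function name and data here are made up for illustration, not a scipy API):

```python
import numpy as np

def gls_fit(X, y, V):
    """GLS estimate b = (X' V^+ X)^+ X' V^+ y.

    Uses the Moore-Penrose pseudoinverse (np.linalg.pinv) instead of a
    plain inverse, so a singular or near-singular V does not blow up.
    """
    Vp = np.linalg.pinv(V)
    XtVp = X.T @ Vp
    return np.linalg.pinv(XtVp @ X) @ XtVp @ y

# Toy check: with V = I, GLS reduces to ordinary least squares.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=n)
b = gls_fit(X, y, np.eye(n))
```

Note that the pseudoinverse silently projects out the null space of V, which may or may not be the statistically appropriate thing to do for a given problem; whitening with a (possibly regularized) Cholesky factor of V is a common alternative.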
>> Side question:
>> Are heteroscedasticity- and (auto)correlation-robust standard errors -
>> the so-called sandwich estimators of the covariance matrix - popular
>> in any field outside of economics/econometrics?
>> (Estimate with OLS, ignoring the non-independent and non-identical
>> noise, but correct the covariance matrix.)
>> I recently expanded this in statsmodels, and would like to start some
>> advertising in favor of sandwiches soon.
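For readers who haven't met them: a sandwich estimator keeps the OLS coefficients but replaces the classical covariance s^2 (X'X)^(-1) with (X'X)^(-1) S (X'X)^(-1), where the "meat" S is built from the residuals. A minimal hand-rolled Newey-West sketch (not the statsmodels implementation, just an illustration of the idea):

```python
import numpy as np

def ols_hac(X, y, maxlags=5):
    """OLS coefficients with Newey-West (HAC) 'sandwich' standard errors.

    Fit by OLS, then correct the covariance matrix for heteroscedasticity
    and autocorrelation up to lag `maxlags` using Bartlett-kernel weights.
    maxlags=0 reduces to White's heteroscedasticity-only sandwich.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    u = y - X @ b                       # OLS residuals
    Xu = X * u[:, None]                 # score contributions x_t * u_t
    S = Xu.T @ Xu                       # lag-0 "meat"
    for lag in range(1, maxlags + 1):
        w = 1.0 - lag / (maxlags + 1)   # Bartlett weight
        G = Xu[lag:].T @ Xu[:-lag]
        S += w * (G + G.T)
    cov = XtX_inv @ S @ XtX_inv         # the sandwich
    return b, np.sqrt(np.diag(cov))

# Demo: strongly autocorrelated AR(1) noise on a simple regression.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
u = np.empty(n)
u[0] = rng.normal()
for t in range(1, n):
    u[t] = 0.9 * u[t - 1] + rng.normal()
y = X @ np.array([1.0, 2.0]) + u
b, se_hac = ols_hac(X, y, maxlags=20)
_, se_white = ols_hac(X, y, maxlags=0)
```

With this kind of noise the HAC standard error for the intercept comes out much larger than the heteroscedasticity-only one, which is exactly the correction being advertised.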
> I'm not familiar with them, but I can't speak for many fields. Indeed,
> there seems to be only the most rudimentary understanding of statistics
> in many fields, basically reducible to root sum of squares for the more
> sophisticated ;) But I think I was contemplating something similar to
> what you mention. Sounds interesting.
Basic idea in an example:
Suppose you have a large sample where the noise is very highly
autocorrelated. OLS assumes you have many independent observations, so
the standard errors it reports will be small. The real standard errors
are much larger, because observations close to each other are almost
the same. Robust standard errors correct for this without assuming much
about the actual correlations.
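That intuition is easy to check with a small Monte Carlo (the AR(1) parameters below are made up for illustration): when both the regressor and the noise are strongly autocorrelated, the empirical spread of the OLS slope is several times the standard error OLS claims.

```python
import numpy as np

def ar1(rng, n, rho):
    """Draw a stationary AR(1) series u_t = rho * u_{t-1} + e_t."""
    e = rng.normal(size=n)
    u = np.empty(n)
    u[0] = e[0] / np.sqrt(1 - rho**2)   # start from the stationary distribution
    for t in range(1, n):
        u[t] = rho * u[t - 1] + e[t]
    return u

rng = np.random.default_rng(1)
n, reps, rho = 500, 200, 0.95
slopes, naive_ses = [], []
for _ in range(reps):
    x = ar1(rng, n, rho)                   # autocorrelated regressor
    X = np.column_stack([np.ones(n), x])
    y = 1.0 + 2.0 * x + ar1(rng, n, rho)   # autocorrelated noise
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b
    s2 = u @ u / (n - 2)
    naive_ses.append(np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1]))
    slopes.append(b[1])

true_spread = np.std(slopes)    # how much the OLS slope actually varies
nominal = np.mean(naive_ses)    # how much OLS claims it varies
```

Here `true_spread` comes out several times `nominal`: the coefficient estimate itself is fine on average, but the conventional standard error badly understates the uncertainty.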
A (somewhat cryptic) example that turned into a discussion about how to
write a blog ;)