[SciPy-dev] code for incremental least squares
Charles R Harris
Wed Feb 17 01:46:25 CST 2010
On Tue, Feb 16, 2010 at 2:56 PM, Nathaniel Smith <email@example.com> wrote:
> On Tue, Feb 16, 2010 at 12:37 PM, <firstname.lastname@example.org> wrote:
> > 2010/2/16 Stéfan van der Walt <email@example.com>:
> >> Hi Nathan
> >> On 16 February 2010 22:18, Nathaniel Smith <firstname.lastname@example.org> wrote:
> >>> I have a need for solving some very large least squares regression
> >>> problems -- but fortunately my model matrix can be generated
> >>> incrementally, so there are some tricks to do the important
> >>> calculations without actually loading the whole thing into memory. The
> >>> resulting code is rather general, and might be of wider interest, so I
> >>> thought I'd send a note here and see what people thought about
> >>> upstreaming it somehow...
> >> This sounds interesting! Could you expand on the incremental
> >> generation of the model matrix, and how it is made general?
> > I have the same thought, Are you increasing by observation or by
> > explanatory variable?
> By observation. (This is good since hopefully you have more
> observations than explanatory variables!) Actually generating the
> model matrix in an incremental way is your problem; but as you
> generate each 'strip' of the model matrix, you just hand it to the
> code's 'update' method and then forget about it.
> The basic idea inside 'update' is pretty elementary... if you have a
> model matrix
> X = np.row_stack([X1, X2, X3, ...])
> then what we need for least squares calculations is
> X'X = X1'X1 + X2'X2 + X3'X3 + ...
> and we can compute this sum incrementally as the X_i matrices arrive.
> Also, it is obviously easy to parallelize the matrix products (and
> potentially the generation of the strips themselves, depending on your
> situation), and those seem to be the bottleneck.
Did you look into Kalman filters? They probably aren't the most efficient
approach but they give you incremental solutions along the way if you want
them. There are also various factored versions available which avoid the
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the SciPy-Dev