[Numpy-discussion] Efficient orthogonalisation with scipy/numpy

Charles R Harris charlesr.harris@gmail....
Tue Jan 19 21:18:46 CST 2010


On Tue, Jan 19, 2010 at 8:13 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:

>
>
> On Tue, Jan 19, 2010 at 8:02 PM, <josef.pktd@gmail.com> wrote:
>
>> On Tue, Jan 19, 2010 at 9:47 PM, Charles R Harris
>> <charlesr.harris@gmail.com> wrote:
>> >
>> >
>> > On Tue, Jan 19, 2010 at 6:08 PM, <josef.pktd@gmail.com> wrote:
>> >>
>> >> On Tue, Jan 19, 2010 at 6:48 PM, Charles R Harris
>> >> <charlesr.harris@gmail.com> wrote:
>> >> >
>> >> >
>> >> > On Tue, Jan 19, 2010 at 2:34 PM, <josef.pktd@gmail.com> wrote:
>> >> >>
>> >> >> On Tue, Jan 19, 2010 at 4:29 PM, Charles R Harris
>> >> >> <charlesr.harris@gmail.com> wrote:
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Jan 19, 2010 at 1:47 PM, Gael Varoquaux
>> >> >> > <gael.varoquaux@normalesup.org> wrote:
>> >> >> >>
>> >> >> >> On Tue, Jan 19, 2010 at 02:22:30PM -0600, Robert Kern wrote:
>> >> >> >> > > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0])
>> >> >> >>
>> >> >> >> > > with np = numpy and linalg = scipy.linalg where scipy calls
>> >> >> >> > > ATLAS.
>> >> >> >>
>> >> >> >> > For clarification, are you trying to find the components of the
>> >> >> >> > y vectors that are perpendicular to the space spanned by the 10
>> >> >> >> > orthonormal vectors in confounds?
>> >> >> >>
>> >> >> >> Yes. Actually, what I am doing is calculating partial correlation
>> >> >> >> between
>> >> >> >> x and y conditionally to confounds, with the following code:
>> >> >> >>
>> >> >> >> def cond_partial_cor(y, x, confounds=[]):
>> >> >> >>    """ Returns the partial correlation of y and x, conditioning
>> >> >> >>        on confounds.
>> >> >> >>    """
>> >> >> >>    # First orthogonalise y and x relative to confounds
>> >> >> >>    if len(confounds):
>> >> >> >>        y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0])
>> >> >> >>        x = x - np.dot(confounds.T, linalg.lstsq(confounds.T, x)[0])
>> >> >> >>    return np.dot(x, y) / np.sqrt(np.dot(y, y) * np.dot(x, x))
>> >> >> >>
>> >> >> >> I am not sure that what I am doing is optimal.
>> >> >> >>
>> >> >> >> > > Most of the time is spent in linalg.lstsq. The length of the
>> >> >> >> > > vectors is 810, and there are about 10 confounds.
>> >> >> >>
>> >> >> >> > Exactly what are the shapes? y.shape = (810, N);
>> >> >> >> > confounds.shape = (810, 10)?
>> >> >> >>
>> >> >> >> Sorry, I should have been more precise:
>> >> >> >>
>> >> >> >> y.shape = (810, )
>> >> >> >> confounds.shape = (10, 810)
>> >> >> >>
>> >> >> >
>> >> >> > Column stack the bunch so that the last column is y, then do a qr
>> >> >> > decomposition. The last column of q is the (normalized) orthogonal
>> >> >> > vector and its amplitude is the last (bottom right) component of r.
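[A minimal sketch of the column-stacking idea, assuming confounds has the
(10, 810) shape given above; the random data is only a stand-in:

import numpy as np

confounds = np.random.randn(10, 810)   # 10 confound vectors of length 810
y = np.random.randn(810)

# Put the confound vectors in as columns and append y as the last column.
A = np.column_stack([confounds.T, y])  # shape (810, 11)
q, r = np.linalg.qr(A)                 # reduced QR: q is (810, 11), r is (11, 11)

# The last column of q spans the part of y orthogonal to the confounds;
# the bottom-right entry of r is its (signed) amplitude.
y_perp = q[:, -1] * r[-1, -1]
]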
>> >> >>
>> >> >> Do you have to do qr twice, once with x and once with y in the last
>> >> >> column, or can this be combined?
>> >> >>
>> >> >> I was trying to do something similar for the partial autocorrelation
>> >> >> of time series, but didn't manage anything better than repeated
>> >> >> leastsq or a variant.
>> >> >>
>> >> >
>> >> > Depends on what you want to do. The QR decomposition is essentially
>> >> > Gram-Schmidt on the columns. So if you just want an orthonormal basis
>> >> > for the subspace spanned by a bunch of columns, the columns of Q
>> >> > provide one. To get the part of y orthogonal to that subspace you can
>> >> > do y - Q*Q.T*y, which is probably what you want if the x's are fixed
>> >> > and the y's vary. If there is just one y, then putting it as the last
>> >> > column lets the QR algorithm do that last bit of projection.
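[A hedged sketch of the fixed-subspace case: orthonormalize the confounds
once with QR, then remove their component from any number of y vectors
(shapes follow the thread; the data is illustrative):

import numpy as np

confounds = np.random.randn(10, 810)
Y = np.random.randn(810, 50)           # 50 different y vectors as columns

# Orthonormal basis for the confound subspace.
Q, _ = np.linalg.qr(confounds.T)       # Q is (810, 10)

# y - Q*Q.T*y for every column of Y at once.
Y_perp = Y - np.dot(Q, np.dot(Q.T, Y))
]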
>> >>
>> >> Gram-Schmidt (looking at it for the first time) looks a lot like
>> >> sequential least squares projection. So, I'm trying to figure out if I
>> >> can use the partial results up to a specific column as partial least
>> >> squares and then work my way to the end by including/looking at more
>> >> columns.
>> >>
>> >
>> > I don't think the QR factorization would work for normal PLS. IIRC, one
>> > of the algorithms does an svd of the cross correlation matrix. The
>> > difference is that in some sense the svd picks out the best linear
>> > combination of columns, while the qr factorization without column
>> > pivoting just takes them in order. The QR factorization used to be the
>> > method of choice for least squares because it is straightforward to
>> > compute, no iterating needed as in svd, but these days that advantage is
>> > pretty much gone. It is still a common first step in the svd, however:
>> > the matrix is factored to Q*R, then the svd of R is computed.
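[A small sketch of that last point, illustrative only: the SVD of a tall
matrix can be assembled from its QR factorization and the SVD of the small
R factor.

import numpy as np

A = np.random.randn(810, 10)
Q, R = np.linalg.qr(A)                 # Q is (810, 10), R is (10, 10)
Ur, s, Vt = np.linalg.svd(R)           # SVD of the small square factor

U = np.dot(Q, Ur)                      # left singular vectors of A
# U * diag(s) * Vt reconstructs A.
assert np.allclose(np.dot(U * s, Vt), A)
]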
>>
>> I (finally) figured out svd and eigenvalue decomposition for this purpose.
>>
>> But from your description of QR, I thought specifically of the case
>> where we have a "natural" ordering of the regressors, similar to the
>> polynomial case you and Anne discussed. In the time series case it
>> would be ordering by increasing lags:
>>
>> yt on y_{t-1}
>> yt on y_{t-1}, y_{t-2}
>> ...
>> ...
>> yt on y_{t-k} for k = 1, ..., K
>>
>> or yt on xt and the lags of xt
>>
>> This is really sequential LS with a predefined sequence, not PLS or
>> PCA/PCR or similar orthogonalization by "importance".
>> The usual procedure for deciding on the appropriate number of lags
>> loops over OLS with an increasing number of regressors.
>> From the discussion, I thought there might be a way to "cheat" at this
>> using QR and Gram-Schmidt.
>>
>>
> Ah, then I think your idea would work. The norms of the residuals at each
> step would be along the diagonal of the R matrix. They won't necessarily
> decrease monotonically, however.
>
>
Or if you are fitting a quantity with the lags, then Q.T*y gives the
component of y along each orthogonalized lag. The running sum of the squares
of these components should approach the variance of y, so I expect the ratio
of that running sum to the variance would give the percentage of the variance
accounted for by the lags up to that point.
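[A rough sketch of that idea; the AR(2) series and the lag-matrix
construction are assumptions for illustration, not from the thread:

import numpy as np

# Simulate a short AR(2) series and build a matrix of lagged regressors.
n, K = 810, 10
e = np.random.randn(n + K)
for t in range(2, n + K):
    e[t] += 0.5 * e[t - 1] - 0.3 * e[t - 2]
y = e[K:]
X = np.column_stack([e[K - k:-k] for k in range(1, K + 1)])  # lags 1..K

# Orthogonalize the lags in their natural order.
Q, R = np.linalg.qr(X)
coef = np.dot(Q.T, y)                  # component of y along each orthogonalized lag

# Cumulative share of the (uncentered) variation in y picked up by lags 1..k.
explained = np.cumsum(coef ** 2) / np.dot(y, y)
print(explained)
]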

Chuck