# [Numpy-discussion] Efficient orthogonalisation with scipy/numpy

josef.pktd@gmai... josef.pktd@gmai...
Tue Jan 19 19:08:46 CST 2010

On Tue, Jan 19, 2010 at 6:48 PM, Charles R Harris
<charlesr.harris@gmail.com> wrote:
>
>
> On Tue, Jan 19, 2010 at 2:34 PM, <josef.pktd@gmail.com> wrote:
>>
>> On Tue, Jan 19, 2010 at 4:29 PM, Charles R Harris
>> <charlesr.harris@gmail.com> wrote:
>> >
>> >
>> > On Tue, Jan 19, 2010 at 1:47 PM, Gael Varoquaux
>> > <gael.varoquaux@normalesup.org> wrote:
>> >>
>> >> On Tue, Jan 19, 2010 at 02:22:30PM -0600, Robert Kern wrote:
>> >> > > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0])
>> >>
>> >> > > with np = numpy and linalg = scipy.linalg where scipy calls ATLAS.
>> >>
>> >> > For clarification, are you trying to find the components of the y
>> >> > vectors that are perpendicular to the space spanned by the 10
>> >> > orthonormal vectors in confounds?
>> >>
>> >> Yes. Actually, what I am doing is calculating partial correlation
>> >> between
>> >> x and y conditionally to confounds, with the following code:
>> >>
>> >> def cond_partial_cor(y, x, confounds=[]):
>> >>    """ Returns the partial correlation of y and x, conditionning on
>> >>        confounds.
>> >>    """
>> >>    # First orthogonalise y and x relative to confounds
>> >>    if len(confounds):
>> >>        y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0])
>> >>        x = x - np.dot(confounds.T, linalg.lstsq(confounds.T, x)[0])
>> >>    return np.dot(x, y)/sqrt(np.dot(y, y)*np.dot(x, x))
>> >>
>> >> I am not sure that what I am doing is optimal.
>> >>
>> >> > > Most of the time is spent in linalg.lstsq. The length of the
>> >> > > vectors
>> >> > > is
>> >> > > 810, and there are about 10 confounds.
>> >>
>> >> > Exactly what are the shapes? y.shape = (810, N); confounds.shape =
>> >> > (810,
>> >> > 10)?
>> >>
>> >> Sorry, I should have been more precise:
>> >>
>> >> y.shape = (810, )
>> >> confounds.shape = (10, 810)
>> >>
>> >
>> > Column stack the bunch so that the last column is y, then do a qr
>> > decomposition. The last column of q is the (normalized) orthogonal
>> > vector
>> > and its amplitude is the last (bottom right) component of r.
>>
>> do you have to do qr twice, once with x and once with y in the last
>> column or can this be combined?
>>
>> I was trying to do something similar for partial autocorrelation for
>> timeseries but didn't manage or try anything better than repeated
>> leastsq or a variant.
>>
>
> Depends on what you want to do. The QR decomposition is essentially
> Gram-Schmidt on the columns. So if you just want an orthonormal basis for
> the subspace spanned by a bunch of columns, the columns of Q are they. To
> get the part of y orthogonal to that subspace you can do y - Q*Q.T*y, which
> is probably what you want if the x's are fixed and the y's vary. If there is
> just one y, then putting it as the last column lets the QR algorithm do that
> last bit of projection.

Gram-Schmidt (looking at it for the first time) looks a lot like
sequential least squares projection. So, I'm trying to figure out if I
can use the partial results up to a specific column as partial least
squares and then work my way to the end by including/looking at more
columns.

But unfortunately I don't have time to play with it long enough to
figure out whether and how it works, but I keep this in mind for the
future.

Thanks,

Josef

>
> Note that if you apply the QR algorithm to a Vandermonde matrix with the
> columns properly ordered you can get a collection of graded orthogonal
> polynomials over a given set of points.
>
> Chuck
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>