[Numpy-discussion] lstsq functionality

Keith Goodman kwgoodman@gmail....
Tue Jul 20 21:12:07 CDT 2010


On Tue, Jul 20, 2010 at 6:35 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Mon, Jul 19, 2010 at 10:08 PM, Charles R Harris
> <charlesr.harris@gmail.com> wrote:
>>
>>
>> On Mon, Jul 19, 2010 at 9:40 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
>>>
>>> On Mon, Jul 19, 2010 at 8:27 PM, Charles R Harris
>>> <charlesr.harris@gmail.com> wrote:
>>> >
>>> >
>>> > On Mon, Jul 19, 2010 at 9:02 PM, Keith Goodman <kwgoodman@gmail.com>
>>> > wrote:
>>> >>
>>> >> On Mon, Jul 19, 2010 at 6:53 PM, Joshua Holbrook
>>> >> <josh.holbrook@gmail.com> wrote:
>>> >> > On Mon, Jul 19, 2010 at 5:50 PM, Charles R Harris
>>> >> > <charlesr.harris@gmail.com> wrote:
>>> >> >> Hi All,
>>> >> >>
>>> >> >> I'm thinking about adding some functionality to lstsq because I find
>>> >> >> myself
>>> >> >> doing the same fixes over and over. List follows.
>>> >> >>
>>> >> >> Add weights so data points can be weighted.
>>> >> >> Use column scaling so condition numbers make more sense.
>>> >> >> Compute covariance approximation?
>>> >> >>
>>> >> >> Unfortunately, the last will require using svd since there no linear
>>> >> >> least
>>> >> >> squares routines in LAPACK that also compute the covariance, at
>>> >> >> least
>>> >> >> that
>>> >> >> google knows about.
>>> >> >>
>>> >> >> Thoughts?
>>> >> >
>>> >> > Maybe make 2 functions--one which implements 1 and 2, and another
>>> >> > which implements 3? I think weights is an excellent idea!
>>> >>
>>> >> I'd like a lstsq that did less, like not calculate the sum of squared
>>> >> residuals. That's useful in tight loops. So I also think having two
>>> >> lstsq makes sense. One as basic as possible; one with bells. How does
>>> >> scipy's lstsq fit into all this?
>>> >
>>> > I think the computation of the residues is cheap in lstsq. The
>>> > algolrithm
>>> > used starts by reducing the design matrix to bidiagonal from and reduces
>>> > the
>>> > rhs at the same time. In other words an mxn problem becomes a (n+1)xn
>>> > problem. That's why the summed square of residuals is available but not
>>> > the
>>> > individual residuals, after the reduction there is only one residual and
>>> > its
>>> > square it the residue.
>>>
>>> That does sound good. But it must take some time. There's indexing,
>>> array creation, if statement, summing:
>>>
>>>        if results['rank'] == n and m > n:
>>>            resids = sum((transpose(bstar)[n:,:])**2,
>>> axis=0).astype(result_t)
>>>
>>> Here are the timings after removing the sum of squared residuals:
>>>
>>> >> x = np.random.rand(1000,10)
>>> >> y = np.random.rand(1000)
>>> >> timeit np.linalg.lstsq(x, y)
>>> 1000 loops, best of 3: 369 us per loop
>>> >> timeit np.linalg.lstsq2(x, y)
>>> 1000 loops, best of 3: 344 us per loop
>>> >>
>>> >> x = np.random.rand(10,2)
>>> >> y = np.random.rand(10)
>>> >> timeit np.linalg.lstsq(x, y)
>>> 10000 loops, best of 3: 102 us per loop
>>> >> timeit np.linalg.lstsq2(x, y)
>>> 10000 loops, best of 3: 77 us per loop
>>> _
>>
>> Now that I've looked through the driver program I see that you are right.
>> Also, all the info needed for the covariance is almost available in the
>> LAPACK driver program. Hmm, it seems that maybe the best thing to do here is
>> to pull out the lapack_lite driver program, modify it, and make it a
>> standalone python module that links to either the lapack_lite programs or
>> the ATLAS versions.  But that is more work than just doing things in python
>> :-\ It does have the added advantage that all the work arrays can be handled
>> internally.
>
> While taking a quick look at lstsq to see what I could cut, this line
> caught my eye:
>
> bstar[:b.shape[0],:n_rhs] = b.copy()

Even if bstar contained a view of b, the next line in lstsq would take
care of it:

a, bstar = _fastCopyAndTranspose(t, a, bstar)

> I thought that array.__setitem__ never gives a view of the right hand
> side. If that's so then the copy is not needed. That would save some
> time since b can be large. All unit test pass when I remove the copy,
> but you know how that goes...
>
> I also noticed that "import math" is done inside the lstsq function.
> Why is that?
>


More information about the NumPy-Discussion mailing list