[Numpy-discussion] Multiple Regression

Robert Kern robert.kern@gmail....
Thu Nov 12 17:44:09 CST 2009


On Thu, Nov 12, 2009 at 17:38, Alexey Tigarev <alexey.tigarev@gmail.com> wrote:
> Hi All!
>
> I have implemented multiple regression in the following way:
>
> def multipleRegression(x, y):
>    """ Perform linear regression using the least squares method.
>
>    x - matrix containing inputs for observations,
>    y - vector containing one output value for every observation """
>    mulregLogger.debug("multipleRegression(x=%s, y=%s)" % (x, y))
>    xt = transpose(x)
>    a = dot(xt, x)     # A = xt * x
>    b = dot(xt, y)     # B = xt * y
>    try:
>        return linalg.solve(a, b)

Never, ever use the normal equations. :-)

Use linalg.lstsq(x, y) instead.
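
The reason, briefly: forming dot(xt, x) squares the condition number of the
problem, so an ill-conditioned "x" loses roughly twice as many digits as it
needs to. lstsq works on "x" directly. A minimal sketch of the replacement,
assuming a recent NumPy (the rcond=None argument is only accepted by newer
releases); the function name and the synthetic data are made up for
illustration:

```python
import numpy as np

def multiple_regression(x, y):
    """Least-squares fit of y against the columns of x.

    Calls np.linalg.lstsq on x directly instead of forming and solving
    the normal equations dot(x.T, x) * coeffs = dot(x.T, y).
    """
    coeffs, residuals, rank, singular_values = np.linalg.lstsq(x, y, rcond=None)
    return coeffs

# Synthetic data with the shapes mentioned in the question (illustrative only).
rng = np.random.default_rng(0)
x = rng.standard_normal((5000, 20))
true_coeffs = np.arange(1.0, 21.0)
y = x @ true_coeffs  # noise-free, so the fit should recover true_coeffs

coeffs = multiple_regression(x, y)
```

lstsq also returns the rank and singular values of "x", which are a more
useful diagnostic for a near-singular problem than the determinant.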

>    except linalg.LinAlgError as lae:
>        mulregLogger.warn("Singular matrix:\n%s" % (a))
>        mulregLogger.warn(lae)
>        mulregLogger.warn("Determinant: %f" % (linalg.det(a)))
>        raise lae
>
> Can you suggest something to optimize it?
>
> I am using it on a large number of observations, so it is common to have
> "x" matrix of about 5000x20 and "y" vector of length 5000, and more.
> I also have to run that multiple times for different "y" vectors and
> same "x" matrix.

Just make a matrix "y" such that each column is a different output
vector (e.g. y.shape == (5000, number_of_different_y_vectors)) and pass
it to linalg.lstsq(x, y) in a single call.
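
A sketch of what that looks like, again assuming a recent NumPy; the number
of output vectors (3) and the random data are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((5000, 20))

# Three hypothetical output vectors stacked as the columns of one matrix.
true_coeffs = rng.standard_normal((20, 3))
y_all = x @ true_coeffs  # shape (5000, 3)

# One call fits every column of y_all against the same x.
coeffs_all, residuals, rank, singular_values = np.linalg.lstsq(x, y_all, rcond=None)

# Column k of coeffs_all is the solution for column k of y_all,
# i.e. the same answer as fitting that column on its own.
coeffs_first = np.linalg.lstsq(x, y_all[:, 0], rcond=None)[0]
```

The result has shape (20, number_of_different_y_vectors), and "x" is only
decomposed once rather than once per output vector.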

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

