[Numpy-discussion] polyfit on multiple data points
Mon Apr 13 21:06:45 CDT 2009
On Mon, Apr 13, 2009 at 5:59 PM, Mathew Yeates <email@example.com> wrote:
> I understand how to fit the points (x1,y1) (x2,y2),(x3,y3) with a line
> using polyfit. But, what if I want to perform this task on every row of
> an array?
> For instance
> and I want the results to be the coefficients [a,b,c] and [d,e,f] where
> [a,b,c] fits the points (x1,y1) (x2,y2),(x3,y3) and
> [d,e,f] fits the points (s1,r1) (s2,r2),(s3,r3)
> I realize I could use "apply_along_axis" but I'm afraid of the
> performance penalty. Is there a way to do this without resorting to a
> function call for each row?
Which order of polynomial are you fitting to each 3 points?
If you want to have independent polynomials, the only other way I can
see is building a block design matrix, essentially stacking all
independent regression into one big least squares problem. The
performance will depend on the number of points you have, but I doubt
it will be faster to build the design matrix and solve the large least
squares problem then the loop.
For a low order polynomial, it might be possible to get an explicit
solution, easy for linear, I never tried for ols with two regressors,
maybe sympy would help.
If you really only have 3 points, then with two regressors and a
constant, you would already have an exact solution with a second order
polynomial, 3 equations in 3 unknowns. In this case, you can write an
explicit solution where it might be possible to use only vectorized
If you need the loop and you have fixed axis, doing the loop yourself
should be faster then "apply_along_axis".
I haven't tried any of this myself, but that's where I would be
looking, if runtime is more expensive than developer time and you
don't need a solution for the general case with more points and higher
Rereading your email, I interpret you want a line, 1st order
polynomial, and the predicted values of the dependent variable. That's
just a linear regression, you can just use the formula for slope and
constant, see for example stats.linregress.
Something like the following: assuming x, and y is mean corrected
array with your 3 points in columns, and rows are different cases,
both are (n,3) arrays, then the slope coefficient should be
b = (x*y).sum(1) / (x,x).sum(1)
constant is something like: a = y.mean(axis=1) - b * x.mean(1)
predicted points ypred = a[:,np.newaxis] + x*a[:,np.newaxis]
(should be (n,3) array)
warning: not tested, written from memory
More information about the Numpy-discussion