[Numpy-discussion] polyfit in NumPy v1.7
Charles R Harris
Wed Feb 27 08:03:27 CST 2013
On Wed, Feb 27, 2013 at 6:46 AM, David Pine <email@example.com> wrote:
> As of NumPy v1.7, numpy.polyfit includes an option for providing weighting
> to data to be fit. It's a welcome addition, but the implementation seems a
> bit non-standard, perhaps even wrong, and I wonder if someone can enlighten
> 1. The documentation does not specify what the weighting array "w" is
> supposed to be. After some fooling around I figured out that it is
> 1/sigma, where sigma is the standard deviation uncertainty "sigma" in the y
> data. A better implementation, which would be consistent with how
> weighting is done in scipy.optimize.curve_fit, would be to specify "sigma"
> directly, rather than "w". This is typically what the user has at hand,
> this syntax is more transparent, and it would make polyfit consistent with
> the nonlinear fitting routine scipy.optimize.curve_fit.
Weights can be used for more things than sigma. Another common use is to
set the weight to zero for bad data points. Zero is a nicer number than
inf, although that would probably work for ieee numbers. Inf and 0/inf == 0
are modern innovations, so multiplication weights are somewhat traditional.
Weights of zero do need to be taken into account in counting the degrees of
> 2. The definition of the covariance matrix in polyfit is peculiar (or
> maybe wrong, I'm not sure). Towards the end of the code for polyfit, the
> standard covariance matrix is calculated and given the variable name
> "Vbase". However, instead of returning Vbase as the covariance matrix,
> polyfit returns the quantity Vbase * fac, where fac = resids / (len(x) -
> order - 2.0). fac scales as N, the number of data points, so the
> covariance matrix is about N times larger than it should be for large
> values of N; the increase is smaller for small N (of order 10). Perhaps
> the authors of polyfit want to use a more conservative error estimate, but
> the references given in the poyfit code make no mention of it. I think it
> would be much better to return the standard definition of the covariance
> matrix (see, for example, the book Numerical Recipes).
The resids scale as N, so fac is approximately constant. The effect of more
data points shows up in (A^T * A)^-1, which is probably what is called
covariance, but really isn't until it is scaled by fac.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the NumPy-Discussion