[Numpy-discussion] polyfit in NumPy v1.7
Wed Feb 27 08:40:40 CST 2013
Thanks for the quick reply.
1. I see your point about zero weights but the code in its current form
doesn't take into account zero weights in counting the degrees of freedom,
as you point out, so it seems to me like a moot point. More importantly,
the documentation doesn't explain what the weights variable w is. In some
implementations of least square fitting it is 1/sigma^2 while in this one
it's 1/sigma. At the very least, the documentation should be more
explicit. How does one go about changing the documentation?
2. I am sorry but I don't understand your response. The matrix Vbase in
the code is already the covariance matrix, _before_ it is scaled by fac.
Scaling it by fac and returning Vbase*fac as the covariance matrix is
wrong, at least according to the references I know, including "Numerical
Recipes", by Press et al, "Data Reduction and Error Analysis for the
Physical Sciences" by Bevington, both standard works.
On Wed, Feb 27, 2013 at 9:03 AM, Charles R Harris <firstname.lastname@example.org
> On Wed, Feb 27, 2013 at 6:46 AM, David Pine <email@example.com> wrote:
>> As of NumPy v1.7, numpy.polyfit includes an option for providing
>> weighting to data to be fit. It's a welcome addition, but the
>> implementation seems a bit non-standard, perhaps even wrong, and I wonder
>> if someone can enlighten me.
>> 1. The documentation does not specify what the weighting array "w" is
>> supposed to be. After some fooling around I figured out that it is
>> 1/sigma, where sigma is the standard deviation uncertainty "sigma" in the y
>> data. A better implementation, which would be consistent with how
>> weighting is done in scipy.optimize.curve_fit, would be to specify "sigma"
>> directly, rather than "w". This is typically what the user has at hand,
>> this syntax is more transparent, and it would make polyfit consistent with
>> the nonlinear fitting routine scipy.optimize.curve_fit.
> Weights can be used for more things than sigma. Another common use is to
> set the weight to zero for bad data points. Zero is a nicer number than
> inf, although that would probably work for ieee numbers. Inf and 0/inf == 0
> are modern innovations, so multiplication weights are somewhat traditional.
> Weights of zero do need to be taken into account in counting the degrees of
> freedom however.
>> 2. The definition of the covariance matrix in polyfit is peculiar (or
>> maybe wrong, I'm not sure). Towards the end of the code for polyfit, the
>> standard covariance matrix is calculated and given the variable name
>> "Vbase". However, instead of returning Vbase as the covariance matrix,
>> polyfit returns the quantity Vbase * fac, where fac = resids / (len(x) -
>> order - 2.0). fac scales as N, the number of data points, so the
>> covariance matrix is about N times larger than it should be for large
>> values of N; the increase is smaller for small N (of order 10). Perhaps
>> the authors of polyfit want to use a more conservative error estimate, but
>> the references given in the poyfit code make no mention of it. I think it
>> would be much better to return the standard definition of the covariance
>> matrix (see, for example, the book Numerical Recipes).
> The resids scale as N, so fac is approximately constant. The effect of
> more data points shows up in (A^T * A)^-1, which is probably what is
> called covariance, but really isn't until it is scaled by fac.
> NumPy-Discussion mailing list
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the NumPy-Discussion