[Numpy-discussion] polyfit in NumPy v1.7
Wed Feb 27 19:27:10 CST 2013
On Wed, Feb 27, 2013 at 3:01 PM, David Pine <email@example.com> wrote:
> Pauli, Josef, Chuck,
> I read over the discussion on curve_fit. I believe I now understand what
> people are trying to do when they write about scaling the weighting and/or
> covariance matrix. And I agree that what polyfit does in its current form
> is estimate the absolute errors in the data from the data and the fit.
> Unfortunately, in the case that you want to supply the absolute uncertainty
> estimates, polyfit doesn't leave the option of not making this "correction".
> This is a mistake,
> In my opinion, polyfit and curve_fit should be organized with the following
I'm fine with the structure (as long as estimating the error scale
remains the default).
I'm not a big fan of the names, but I'm not a developer or main user
of polyfit or curve_fit, so whatever...
> ydata_err : array_like or float or None
I still prefer weights or sigma because then I know what it means.
> absolute_err : bool (True or False)
> If ydata_err is an array, then it has the same length as xdata and ydata and
> gives the uncertainty (absolute or relative) of ydata.
uncertainty is vague, either standard deviation or "weights" multiplying y and x
> If ydata_err is a float (a single value), it gives the uncertainty (absolute
> or relative) of ydata. That is, all ydata points have the same uncertainty
fine, with the single value case, although it won't have any effect if
absolute_err is False
> If ydata_err is None, the ydata_err is set equal to 1 internally.
> If absolute_err is True, then no correction is made to the covariance matrix
> If absolute_err is False, the a correction is made to the covariance matrix
> based on the square of residuals and the value(s) of ydata_err so that it
> gives useful estimates of the uncertainties in the fitting parameters.
> Finally, I have a quibble with the use of the extra factor of 2 subtracted
> in the denominator of fac (line 596). This is highly non-standard. Most
> of the literature does not use this factor, and for good reason. It leads
> to complete nonsense when there are only 3 or 4 data points, for one thing.
> In supplying software for general use, the most common standard should be
I've never heard of the -2 correction
592 # Some literature ignores the extra -2.0 factor in the
593 # it is included here because the covariance of Multivariate Student-T
594 # (which is implied by a Bayesian uncertainty analysis) includes it.
595 # Plus, it gives a slightly more conservative estimate of uncertainty.
596 fac = resids / (len(x) - order - 2.0)
> Finally, the documentation needs to be improved so that it is clear. I
> would be happy to contribute both to improving the documentation and
> On Wed, Feb 27, 2013 at 12:47 PM, Pauli Virtanen <firstname.lastname@example.org> wrote:
>> 27.02.2013 16:40, David Pine kirjoitti:
>> > 2. I am sorry but I don't understand your response. The matrix Vbase
>> > in the code is already the covariance matrix, _before_ it is scaled by
>> > fac. Scaling it by fac and returning Vbase*fac as the covariance
>> > matrix is wrong, at least according to the references I know, including
>> > "Numerical Recipes", by Press et al, "Data Reduction and Error Analysis
>> > for the Physical Sciences" by Bevington, both standard works.
>> The covariance matrix is (A^T A)^-1 only if the data is weighed by its
>> standard errors prior to lstsq. Polyfit estimates the standard errors
>> from the fit itself, which results in the `fac` multiplication.
>> This is apparently what some people expect. The way the weight
>> parameters work is however confusing, as they are
>> w[i]=sigma_estimate/sigma[i], rather than being absolute errors.
>> Anyway, as Josef noted, it's the same problem that curve_fit in Scipy
>> had and probably the same fix needs to be done here.
>> Pauli Virtanen
>> NumPy-Discussion mailing list
> NumPy-Discussion mailing list
More information about the NumPy-Discussion