[SciPy-User] Revisit Unexpected covariance matrix from scipy.optimize.curve_fit
Fri Feb 22 12:44:21 CST 2013
On Fri, Feb 22, 2013 at 1:17 PM, <email@example.com> wrote:
> On Fri, Feb 22, 2013 at 1:03 PM, Pierre Barbier de Reuille
> <firstname.lastname@example.org> wrote:
>> I don't know about this result I must say, do you have a reference?
>> But intuitively, perr shouldn't change when applying the same weight to all
>> the values.
>> Barbier de Reuille Pierre
>> On 22 February 2013 17:12, Moore, Eric (NIH/NIDDK) [F] <email@example.com>
>>> > -----Original Message-----
>>> > From: Tom Aldcroft [mailto:firstname.lastname@example.org]
>>> > Sent: Friday, February 22, 2013 10:42 AM
>>> > To: SciPy Users List
>>> > Subject: [SciPy-User] Revisit Unexpected covariance matrix from
>>> > scipy.optimize.curve_fit
>>> > In Aug 2011 there was a thread [Unexpected covariance matrix from
>>> > scipy.optimize.curve_fit](http://mail.scipy.org/pipermail/scipy-
>>> > user/2011-August/030412.html)
>>> > where Christoph Deil reported that "scipy.optimize.curve_fit returns
>>> > parameter errors that don't scale with sigma, the standard deviation
>>> > of ydata, as I expected." Today I independently came to the same
>>> > conclusion.
>>> > This thread generated some discussion but seemingly no agreement that
>>> > the covariance output of `curve_fit` is not what would be expected. I
>>> > think the discussion wasn't as focused as possible because the example
>>> > was too complicated. With that I provide here about the simplest
>>> > possible example, which is fitting a constant to a constant dataset,
>>> > aka computing the mean and error on the mean. Since we know the
>>> > answers we can compare the output of `curve_fit`.
>>> > To illustrate things more easily I put the examples into an IPython
>>> > notebook which is available at:
>>> > http://nbviewer.ipython.org/5014170/
> If my fast reading is correct, then this is a very good example what I
> DON'T want in curve_fit.
> Your actual standard deviation (in simulation) is 1.
> Then you impose a sigma of 100, and your results are completely
> inconsistent with the data, huge error margins and confidence
> intervals 5 times the range of the actual observations.
> In most cases (maybe not in astronomy) we would like to estimate the
> parameter uncertainty based on the actual data.
It happens all the time in astronomy and physics (and probably any
experimental science) that you either have well-justified priors for
the errors, or need to account for systematic errors that are not
reflected in the scatter of the data. This is exactly why (at least
in physics/astronomy) one has the option to specify the actual
uncertainty of each data point for parameter estimation algorithms,
not just relative weighting.
> There are some cases where we have another estimate for sigma, a
> Bayesian can impose any prior; if we have more information about
> measurement errors, then we can use ODR, but in my opinion curve_fit
> should just be a boring standard weighted least squares.
But curve_fit is *much* simpler to use and I suspect most astronomers
will go straight there. The fundamental algorithm underneath
(Levenburg-Marquardt + chi^2 statistics) has no problem handling
absolute sigma values. I grew up on Numerical Recipes, and "mrqmin",
which gives as its output a covariance matrix which is exactly the
"expected" covariance matrix. So why limit the usefulness of
curve_fit to the case of empirical scatter estimates?
>>> > This was run using scipy 0.11.0 by the way. Any further discussion on
>>> > this topic to come to an understanding of the covariance output from
>>> > `curve_fit` would be appreciated.
>>> > Thanks,
>>> > Tom
>>> > _______________________________________________
>>> chi2 = np.sum(((yn-const(x, *popt))/sigma)**2)
>>> perr = np.sqrt(np.diag(pcov)/(chi2/(x.shape-1)))
>>> Perr is then the actual error in the fit parameter. No?
>>> SciPy-User mailing list
>> SciPy-User mailing list
> SciPy-User mailing list
More information about the SciPy-User