[SciPy-User] Return sigmas from curve_fit

Matt Newville newville@cars.uchicago....
Tue Oct 16 20:55:14 CDT 2012


Hi Gökhan,

On Tue, Oct 16, 2012 at 2:32 PM, Gökhan Sever <gokhansever@gmail.com> wrote:
> Thanks Travis.
>
> That doesn't indeed, I missed the part that part of the curve_fit return was
> variance of the estimate(s).
>
> I am comparing IDL's curvefit and Scipy's curve_fit, and got slightly
> different results for the same data using the same fit function. I guess
> IDL's result is slightly wrong when the default tol value is used (The
> default value is 1.0 x  10-3.) comparing to the SciPy's default ftol of
> 1.49012e-08.

IDL's curvefit procedure is an implementation of the
Levenberg-Marquardt algorithm in IDL, but not using or calling
MINPACK-1 of Garbow, Hillstrom, and More.  I believe curvefit.pro is
based on the implementation from Bevington's book, though it may have
evolved over the years from that.  Scipy's leastsq() (and so
curve_fit()) calls MINPACK-1, which is generally more stable and
robust.  Thus, it is perfectly reasonable for there to be small
differences in results even thought the two naively claim to be using
the same algorithm.  Certainly having tolerances as high as 1.e-3 can
also effect the results.

The IDL mpfit.pro
(http://cow.physics.wisc.edu/~craigm/idl/fitting.html) procedure is a
bit closer to scipy.optimize.leastsq(), being a translation of MINPACK
to IDL, and might be a better comparison.  The mpfit.pro adds a few
bells and whistles (parameter bounds) which MINPACK-1 does not have,
but ignoring this gives (in my experience) very similar results to
MINPACK-1.  In general,  scipy.optimize.leastsq() will be faster, and
also has the good fortune of not being IDL.

Daπid warned that the estimated uncertainties from the covariance
matrix (which is automatically returned from leastsq() and
curve_fit()) assumes that the errors are normally distributed, and
that this assumption is questionable.  This is equally true for all
the implementations in question, so I doubt it would explain any
differences you see.  I would also implore all to recognize that even
if the estimated uncertainties from scipy.optimize.leastsq() are
imperfect, they are far better than having none at all.  It is very
easy for the armchair analyst to claim that errors are not normally
distributed (especially when the problem at hand hasn't even been
identified!), and a bit harder for the practicing analyst to show that
the errors are significantly non-normal in practice.  Even when this
claim is borne out to be true, it does not necessarily imply that the
simple estimate of uncertainties is significantly wrong.  Rather, it
implies that 1 statistic (stddev) is not the whole story.

You may find lmfit-py (http://newville.github.com/lmfit-py/) useful.
This is built on top of scipy.optimize.leastsq(), and add the ability
to apply bounds, fix parameters, and place algebraic constraints
between parameters (IMHO in a manner easier and more robust than IDL's
mpfit.pro, and more python-ic).  It also provides functions to walk
through the parameter space to more explicitly determine confidence
intervals, and to test whether errors are non-normally distributed
(see http://newville.github.com/lmfit-py/confidence.html).  The
example there shows a case with clearly non-normal distribution of
uncertainties, and a very skewed parameter space.  The automatically
estimated uncertainties are off by 20% or less of the more carefully
(and slowly) found values.  I would say that's a pretty good
endorsement of the automatic estimate from the covariance matrix, but
it's good to be able to check this out yourself even if only to show
that the the automatic estimate is close enough and a more careful
analysis doesn't change your conclusions.

Hope that helps,

--Matt Newville


More information about the SciPy-User mailing list