[SciPy-User] scipy.optimize named argument inconsistency
Christoph Deil
deil.christoph@googlemail....
Mon Sep 5 18:03:02 CDT 2011
On Sep 5, 2011, at 8:56 PM, josef.pktd@gmail.com wrote:
> On Mon, Sep 5, 2011 at 2:17 PM, Christoph Deil
> <deil.christoph@googlemail.com> wrote:
>>
>> On Sep 5, 2011, at 4:27 PM, josef.pktd@gmail.com wrote:
>>
>> On Mon, Sep 5, 2011 at 10:55 AM, Christoph Deil
>> <deil.christoph@googlemail.com> wrote:
>>
>> How can I compute the covariance matrix for a likelihood fit using the
>> routines in scipy.optimize?
>>
>> 1300 lines in scikits.statsmodels wrapping some scipy optimizers for
>> maximum likelihood estimation
>> e.g. LikelihoodModel
>> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/base/model.py#L81
>> GenericLikelihoodModel
>> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/base/model.py#L421
>> in the latter I use loglike and loglikeobs, depending on whether I
>> need or want the Jacobian or not.
>>
>> This is a very nice class!
>> But installing and learning statsmodels takes a while, and wanting parameter
>> errors for a likelihood fit
>> is a very common case for using an optimizer, so I think it would be nice to
>> include that functionality in scipy.optimize itself (see code below).
>>
>> As far as I can see, the only methods that return a covariance matrix
>> (or results from which it is easy to compute the covariance matrix) are
>> leastsq and curve_fit. The problem with these is that they take a
>> residual vector chi and internally compute chi2 = (chi**2).sum() and
>> minimize that. But for a likelihood fit the quantity to be minimized is
>> logL.sum(), i.e. no squaring is required, so as far as I can see it is
>> not possible to do a likelihood fit with leastsq or curve_fit.
>>
>> On the other hand, as Matt pointed out, most (all?) other optimization
>> methods (e.g. fmin) don't do this squaring and summing internally; the
>> user function does it when one is doing a chi2 fit, so using these it is
>> easy to do a likelihood fit. But it is not possible to get parameter
>> errors, because these other optimizers don't return a Hessian or
>> covariance matrix.
>>
>> It would be nice if there were a method in scipy.optimize to compute the
>> covariance matrix for all optimizers.
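To make the difference in calling conventions concrete, here is a minimal sketch on toy data (a noiseless straight line; the data and functions here are illustrative, not from the thread):

```python
import numpy as np
from scipy.optimize import leastsq, fmin

# toy data: noiseless straight line y = 2*x + 1
x = np.linspace(0, 4, 50)
y = 2.0 * x + 1.0

def residuals(p):
    # leastsq wants the residual *vector*; it squares and sums internally
    return y - (p[0] * x + p[1])

def chi2(p):
    # fmin wants the already squared-and-summed *scalar*
    return (residuals(p) ** 2).sum()

p_lsq, ier = leastsq(residuals, [1.0, 0.0])
p_fmin = fmin(chi2, [1.0, 0.0], disp=False)
# both should recover (a, b) = (2, 1)
```

leastsq additionally returns cov_x information, while fmin returns only the parameter values; that asymmetry is exactly the point above.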
>>
>> fmin and the other optimizers don't have enough assumptions to figure
>> out whether the Hessian is the covariance matrix of the parameters.
>> This is only true if the objective function is the loglikelihood. But
>> there is no reason fmin and the others should assume this.
>>
>> We had a brief discussion a while ago on the mailing list which led to
>> the notes in the leastsq docstring
>> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.leastsq/#leastsq
>> that cov_x also is not always the (unscaled) covariance matrix.
>>
>> If the objective function comes from a different estimation method for
>> example, then I think it's usually not the case that the (inverse)
>> Hessian is the covariance matrix.
>>
>> Actually all that is missing from scipy to be able to get parameter errors
>> in a likelihood fit (which includes chi2 fit and I believe is by far the
>> most common case) with scipy.optimize (for all optimizers!) is a method to
>> compute the Hessian, as already available e.g. in these packages:
>> http://code.google.com/p/numdifftools/source/browse/trunk/numdifftools/core.py#1134
>> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/sandbox/regression/numdiff.py#L99
>> Would it be possible to move one of the approx_hess methods to
>> scipy.optimize (there is already a scipy.optimize.approx_fprime)?
>> Looking through numpy and scipy the only differentiation method I could find
>> was
>> http://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.derivative.html
>> which is only 1D, I believe.
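For illustration, such a multivariate Hessian only takes a few lines of finite differences. This `approx_hess` is a sketch of the idea, not an existing scipy function:

```python
import numpy as np

def approx_hess(x0, f, eps=1e-4):
    """Central-difference approximation to the Hessian of scalar f at x0 (sketch)."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = eps
            ej[j] = eps
            # symmetric four-point formula for d^2 f / dx_i dx_j
            hess[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                          - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4.0 * eps ** 2)
    return hess

# quick check on a quadratic whose Hessian is known to be [[2, 3], [3, 4]]
h = approx_hess([1.0, 2.0], lambda p: p[0]**2 + 3*p[0]*p[1] + 2*p[1]**2)
```

The packages linked above use better step-size control and extrapolation; this is only meant to show that nothing exotic is needed.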
>> """This is the example from the curve_fit docstring"""
>> import numpy as np
>> from scipy.optimize import curve_fit, fmin
>> def func(x, a, b, c):
>>     return a * np.exp(-b * x) + c
>> p0 = (2.5, 1.3, 0.5)
>> x = np.linspace(0, 4, 50)
>> y = func(x, *p0)
>> np.random.seed(0)
>> yn = y + 0.2 * np.random.normal(size=len(x))
>> popt, pcov = curve_fit(func, x, yn)
>> print 'curve_fit results:'
>> print 'values:', popt
>> print 'errors:', np.sqrt(pcov.diagonal())
>> """And here is how to compute the fit parameter values and errors
>> using one of the other optimizers (exemplified with fmin) and
>> a method to compute the Hesse matrix"""
>> def chi2(pars):
>>     chi = yn - func(x, *pars)
>>     return (chi ** 2).sum()
>> popt = fmin(chi2, p0, disp=False)
>> from numpy.dual import inv
>> from scikits.statsmodels.sandbox.regression.numdiff import approx_hess3 as approx_hess
>> phess = approx_hess(popt, chi2)
>> def approx_covar(hess, red_chi2):
>>     return red_chi2 * inv(hess / 2.)
>> pcov = approx_covar(phess, chi2(popt) / (len(x) - len(p0)))
>
> I don't quite see where the normalizations, e.g. /2., are coming from.
I'm not sure either why a factor of /2. gives a correct covariance matrix here.
I found several references stating that the covariance matrix is simply the inverse Hessian, with no factor of 2!?
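My best guess at where the /2. comes from (my own reasoning, not confirmed in the thread): those references take the Hessian of the *negative log-likelihood*, while the code above takes the Hessian of chi2 = sum(r**2) = -2*logL + const, so the Hessians differ by exactly a factor 2, and H/2 recovers the inv(J^T J) form that leastsq scales by the reduced chi2. For a linear model the relation H = 2 * X^T X is exact, which makes it easy to check numerically:

```python
import numpy as np

# design matrix of the linear model y = a*x + b
x = np.linspace(0, 4, 50)
X = np.column_stack([x, np.ones_like(x)])

# Hessian of chi2(p) = ((y - X.dot(p)) ** 2).sum() with respect to p;
# for a linear model this is exactly 2 * X^T X, independent of y
H = 2.0 * X.T.dot(X)

cov_from_hess = np.linalg.inv(H / 2.0)     # the inv(phess / 2.) recipe above
cov_from_jtj = np.linalg.inv(X.T.dot(X))   # the J^T J form leastsq uses
# the two agree; the common red_chi2 scale factor then cancels in the comparison
```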
> However, I never tried to use the Hessian with leastsq, and I would
> have to look at this.
>
>> print 'fmin and approx_hess results:'
>> print 'values:', popt
>> print 'errors:', np.sqrt(pcov.diagonal())
>> curve_fit results:
>> values: [ 2.80720814 1.24568448 0.44517316]
>> errors: [ 0.12563313 0.12409886 0.05875364]
>> fmin and approx_hess results:
>> values: [ 2.80720337 1.24565936 0.44515526]
>> errors: [ 0.12610377 0.12802944 0.05979982]
>
> The attachment is a quickly written script to use the
> GenericLikelihoodModel. The advantage of using statsmodels is that you
> are (supposed to be) getting all additional results that are available
> for Maximum Likelihood Estimation.
>
> e.g. my version of this with stats.norm.pdf. I took the pattern
> partially from the miscmodel that uses t distributed errors. It's
> supposed to be easy to switch distributions.
>
> -----------
> class MyNonLinearMLEModel(NonLinearMLEModel):
>     '''Maximum Likelihood Estimation of Linear Model with nonlinear function
>
>     This is an example for generic MLE.
>
>     Except for defining the negative log-likelihood method, all
>     methods and results are generic. Gradients and Hessian
>     and all resulting statistics are based on numerical
>     differentiation.
>     '''
>
>     def _predict(self, params, x):
>         a, b, c = params
>         return a * np.exp(-b * x) + c
>
>
> mod = MyNonLinearMLEModel(yn, x)
> res = mod.fit(start_params=[1., 1., 1., 1.])
> print 'true parameters'
> print np.array(list(p0)+[0.2])
> print 'parameter estimates'
> print res.params
> print 'standard errors, hessian, jacobian, sandwich'
> print res.bse
> print res.bsejac
> print res.bsejhj
> -------
> curve_fit results:
> values: [ 2.80720815 1.24568449 0.44517316]
> errors: [ 0.12563313 0.12409886 0.05875364]
> Optimization terminated successfully.
> Current function value: -7.401006
> Iterations: 141
> Function evaluations: 247
> true parameters
> [ 2.5 1.3 0.5 0.2]
> parameter estimates
> [ 2.80725714 1.24569287 0.44515168 0.20867979]
> standard errors, hessian, jacobian, sandwich
> [ 0.12226468 0.12414298 0.05798039 0.02086842]
> [ 0.15756494 0.15865264 0.06242619 0.02261329]
> [ 0.09852726 0.10175901 0.05740336 0.01994902]
There are large differences between the three methods!?
In particular the Jacobian-based errors are too high in this case (and don't match the ones from curve_fit, which also uses the Jacobian method, right?).
I did one more check with the hesse() method from MINUIT, which gives results consistent (well, to two significant digits) with curve_fit in this case:
curve_fit results:
values: [ 2.80720814 1.24568448 0.44517316]
errors: [ 0.12563313 0.12409886 0.05875364]
minuit results
values: [ 2.80652024 1.24525732 0.44525123]
errors: [ 0.1260859 0.12784433 0.05975276]
from minuit import Minuit
def chi2(a, b, c):
    chi = yn - func(x, a, b, c)
    return (chi ** 2).sum()
m = Minuit(chi2, a=2.5, b=1.3, c=0.5)
m.migrad()
m.hesse()
popt = np.array(m.args)
red_chi2 = chi2(*popt) / (len(x) - len(p0))
pcov = red_chi2 * np.array(m.matrix())
print 'minuit results'
print 'values:', popt
print 'errors:', np.sqrt(pcov.diagonal())
>>>> res.aic
> -6.8020120409185836
>>>> res.bic
> 0.84607998079400026
>>>> res.t_test([0,1,0,0],1)
> Traceback (most recent call last):
> File "<pyshell#7>", line 1, in <module>
> res.t_test([0,1,0,0],1)
> File "E:\Josef\eclipsegworkspace\statsmodels-git\statsmodels-josef\scikits\statsmodels\base\model.py",
> line 1020, in t_test
> df_denom=self.model.df_resid)
> AttributeError: 'MyNonLinearMLEModel' object has no attribute 'df_resid'
>>>> res.model.df_resid = len(yn) - len(p0)
>>>> res.t_test([0,1,0,0],1)
> <scikits.statsmodels.stats.contrast.ContrastResults object at 0x04C60FB0>
>>>> print res.t_test([0,1,0,0],1)
> <T test: effect=array([ 1.24569287]), sd=array([[ 0.12414298]]),
> t=array([[ 1.97911215]]), p=array([[ 0.02683928]]), df_denom=47>
>>>>
>
> GenericLikelihoodModel is still not polished, df_resid is not defined
> generically. bootstrap raised an exception.
>
> I agree that scipy should have numerical differentiation (Jacobian,
> Hessian) like numdifftools.
I've made an enhancement ticket and informed the numdifftools author about it:
http://projects.scipy.org/scipy/ticket/1510
>
> Josef
>
>>
>>
>> And it would be nice if the super-fast Levenberg-Marquardt optimizer
>> called by leastsq were also available for likelihood fits.
>>
>> Maybe it would be possible to add one method to scipy.optimize to
>> compute the covariance matrix after any of the optimizers has run,
>> i.e. after the best-fit pars have been determined?
>>
>> cov = scipy.optimize.cov(func, pars)
>>
>> I think this can be done by computing the Hessian via finite differences
>> and then inverting it.
>>
>> leastsq seems to compute the Hessian from a "Jacobian". This is the part
>> I don't understand, but without going into the details, let me ask this:
>> Is there something special about the Levenberg-Marquardt optimizer that
>> it requires the individual observations?
>>
>> It uses the outer (?) product of the Jacobian (all observations) to
>> find the improving directions, and the outer product of the Jacobian is
>> a numerical approximation to the Hessian in the
>> err = y - f(x, params) or loglikelihood case.
>>
>> The Hessian calculation can break down quite easily (not positive
>> definite because of numerical problems), the product of the Jacobian
>> is more robust in my experience and cheaper. (statsmodels
>> GenericLikelihood has covariance based on Hessian, Jacobian and a
>> sandwich of the two).
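For reference, the three variants Josef mentions can be sketched like this (an illustrative helper, not the statsmodels API; H is the Hessian of the negative log-likelihood, J the matrix of per-observation scores):

```python
import numpy as np

def mle_covariances(H, J):
    """Three common MLE covariance estimates (sketch):
    Hessian-based  inv(H),
    Jacobian-based inv(J^T J),
    sandwich       inv(H) J^T J inv(H).
    """
    H = np.atleast_2d(np.asarray(H, dtype=float))
    J = np.atleast_2d(np.asarray(J, dtype=float))
    Hinv = np.linalg.inv(H)
    jtj = J.T.dot(J)
    return Hinv, np.linalg.inv(jtj), Hinv.dot(jtj).dot(Hinv)

# toy check: normal location model, y = [-1, 1], mu = 0, sigma = 1;
# then H = n = 2 and the per-observation scores are (y_i - mu) = [-1, 1],
# so J^T J = 2 and all three estimates coincide at 1/n = 0.5
cov_h, cov_j, cov_s = mle_covariances([[2.0]], [[-1.0], [1.0]])
```

When the model is correctly specified the three agree asymptotically; the large differences seen above are a finite-sample (or misspecification) effect.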
>>
>> It's true that there are often numerical problems with the Hessian.
>> If the Jacobian method is more robust, then maybe it could be included
>> in scipy.optimize instead of, or in addition to, the Hesse method?
>>
>> But I never looked at the internals of minpack.
>>
>> Josef
>>
>>
>>
>> Or is it just that the current implementation of _minpack._lmdif (which
>> is called by leastsq) was written such that it works this way (i.e.
>> includes a computation of cov_x in addition to x), and that it could
>> also be written to take a scalar func like all the other optimizers?
>>
>> Christoph
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>>
> <try_lonlin.py>