[SciPy-User] scipy.optimize named argument inconsistency

Christoph Deil deil.christoph@googlemail....
Mon Sep 5 18:03:02 CDT 2011


On Sep 5, 2011, at 8:56 PM, josef.pktd@gmail.com wrote:

> On Mon, Sep 5, 2011 at 2:17 PM, Christoph Deil
> <deil.christoph@googlemail.com> wrote:
>> 
>> On Sep 5, 2011, at 4:27 PM, josef.pktd@gmail.com wrote:
>> 
>> On Mon, Sep 5, 2011 at 10:55 AM, Christoph Deil
>> <deil.christoph@googlemail.com> wrote:
>> 
>> How can I compute the covariance matrix for a likelihood fit using the
>> routines in scipy.optimize?
>> 
>> There are 1300 lines in scikits.statsmodels wrapping some scipy optimizers for
>> maximum likelihood estimation, e.g. LikelihoodModel
>> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/base/model.py#L81
>> and GenericLikelihoodModel
>> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/base/model.py#L421
>> In the latter I use loglike and loglikeobs, depending on whether I
>> need or want the Jacobian or not.
>> 
>> This is a very nice class!
>> But installing and learning statsmodels takes a while, and wanting parameter
>> errors for a likelihood fit
>> is a very common case for using an optimizer, so I think it would be nice to
>> include that functionality in scipy.optimize itself (see code below).
>> 
>> As far as I see, the only methods that return a covariance matrix (or results
>> from which it is easy to compute the covariance matrix) are leastsq and
>> curve_fit. The problem with these is that they take a residual vector chi
>> and internally compute chi2 = (chi**2).sum() and minimize that. But for a
>> likelihood fit the quantity to be minimized is logL.sum(), i.e. no squaring
>> is required, so as far as I can see it is not possible to do a likelihood
>> fit with leastsq and curve_fit.
>> 
>> On the other hand, as Matt pointed out, most (all) other optimization methods
>> (e.g. fmin) don't do this squaring and summing internally; the user
>> function does it if one is doing a chi2 fit, so using these it is easy to do
>> a likelihood fit. But it is not possible to get parameter errors, because
>> these other optimizers don't return a Hessian or covariance matrix.
>> 
>> It would be nice if there were a method in scipy.optimize to compute the
>> covariance matrix for all optimizers.
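To make the difference in calling conventions concrete, here is a minimal toy sketch (illustration only, the model and data are made up): leastsq takes the residual vector and squares and sums it internally, while fmin takes the already-reduced scalar, which is why a negative log-likelihood fits naturally into fmin but not into leastsq.

import numpy as np
from scipy.optimize import leastsq, fmin

x = np.linspace(0, 4, 50)
yn = 2.5 * np.exp(-1.3 * x) + 0.5 + 0.2 * np.random.normal(size=x.size)

def model(pars, x):
    a, b, c = pars
    return a * np.exp(-b * x) + c

def residuals(pars):
    # vector of residuals -> leastsq squares and sums this internally
    return yn - model(pars, x)

def neg_loglike(pars, sigma=0.2):
    # scalar objective -> what fmin and the other optimizers expect
    chi = (yn - model(pars, x)) / sigma
    return 0.5 * (chi ** 2).sum()

p_lsq, ier = leastsq(residuals, (2.0, 1.0, 0.0))
p_mle = fmin(neg_loglike, (2.0, 1.0, 0.0), disp=False)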
>> 
>> fmin and the other optimizers don't have enough assumptions to figure
>> out whether the (inverse) Hessian is the covariance matrix of the parameters.
>> This is only true if the objective function is the loglikelihood, but
>> there is no reason fmin and the others should assume this.
>> 
>> We had a brief discussion a while ago on the mailing list which led to
>> the notes in the leastsq docstring
>> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.leastsq/#leastsq
>> that cov_x also is not always the (unscaled) covariance matrix.
>> 
>> If the objective function comes from a different estimation method for
>> example, then I think it's usually not the case that the (inverse)
>> Hessian is the covariance matrix.
>> 
>> Actually, all that is missing from scipy to be able to get parameter errors
>> in a likelihood fit (which includes the chi2 fit and, I believe, is by far the
>> most common case) with scipy.optimize (for all optimizers!) is a method to
>> compute the Hessian, as already available e.g. in these packages:
>> http://code.google.com/p/numdifftools/source/browse/trunk/numdifftools/core.py#1134
>> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/sandbox/regression/numdiff.py#L99
>> Would it be possible to move one of the approx_hess methods to
>> scipy.optimize (there is already a scipy.optimize.approx_fprime)?
>> Looking through numpy and scipy, the only differentiation method I could find was
>> http://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.derivative.html
>> which is only 1D, I believe.
>> """This is the example from the curve_fit docstring"""
>> import numpy as np
>> from scipy.optimize import curve_fit, fmin
>> def func(x, a, b, c):
>>     return a * np.exp(-b * x) + c
>> p0 = (2.5, 1.3, 0.5)
>> x = np.linspace(0, 4, 50)
>> y = func(x, *p0)
>> np.random.seed(0)
>> yn = y + 0.2 * np.random.normal(size=len(x))
>> popt, pcov = curve_fit(func, x, yn)
>> print 'curve_fit results:'
>> print 'values:', popt
>> print 'errors:', np.sqrt(pcov.diagonal())
>> """And here is how to compute the fit parameter values and errors
>> using one of the other optimizers (exemplified with fmin) and
>> a method to compute the Hesse matrix"""
>> def chi2(pars):
>>     chi = yn - func(x, *pars)
>>     return (chi ** 2).sum()
>> popt = fmin(chi2, p0, disp=False)
>> from numpy.dual import inv
>> from scikits.statsmodels.sandbox.regression.numdiff import approx_hess3 as approx_hess
>> phess = approx_hess(popt, chi2)
>> def approx_covar(hess, red_chi2):
>>     return red_chi2 * inv(hess / 2.)
>> pcov = approx_covar(phess, chi2(popt) / (len(x) - len(p0)))
> 
> I don't quite see where the normalizations, e.g. /2., are coming from.


I'm not sure either why the factor /2. gives a correct covariance matrix here.
I found several references stating that the covariance matrix is simply the inverse Hessian, with no factor 2 !?
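One way to make sense of the factor, assuming Gaussian errors: the "inverse Hessian" statements presumably refer to the Hessian of -logL, not of chi2. Since -logL = chi2/2 + const, the Hessian of -logL is half the Hessian of chi2, so cov = inv(H_chi2 / 2.) = 2 * inv(H_chi2); the red_chi2 factor then just estimates the unknown sigma**2 for unweighted residuals. A rough numerical check, as a sketch of my own reusing chi2, popt, phess, approx_hess and inv from the code above:

def neg_loglike(pars):
    # Gaussian -logL up to a constant, written for unit-variance errors;
    # the red_chi2 factor below rescales for the estimated sigma**2
    return 0.5 * chi2(pars)

hess_nll = approx_hess(popt, neg_loglike)      # ~= phess / 2.
red_chi2 = chi2(popt) / (len(x) - len(p0))     # estimate of sigma**2
pcov_check = red_chi2 * inv(hess_nll)          # same as red_chi2 * inv(phess / 2.)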

> However, I never tried to use the Hessian with leastsq, and I would
> have to look at this.
> 
>> print 'fmin and approx_hess results:'
>> print 'values:', popt
>> print 'errors:', np.sqrt(pcov.diagonal())
>> curve_fit results:
>> values: [ 2.80720814  1.24568448  0.44517316]
>> errors: [ 0.12563313  0.12409886  0.05875364]
>> fmin and approx_hess results:
>> values: [ 2.80720337  1.24565936  0.44515526]
>> errors: [ 0.12610377  0.12802944  0.05979982]
> 
> The attachment is a quickly written script to use the
> GenericLikelihoodModel. The advantage of using statsmodels is that you
> are (supposed to be) getting all additional results that are available
> for Maximum Likelihood Estimation.
> 
> e.g. here is my version of this with stats.norm.pdf. I took the pattern
> partially from the miscmodel that uses t-distributed errors. It's
> supposed to be easy to switch distributions.
> 
> -----------
> class MyNonLinearMLEModel(NonLinearMLEModel):
>    '''Maximum Likelihood Estimation of Linear Model with nonlinear function
> 
>    This is an example for generic MLE.
> 
>    Except for defining the negative log-likelihood method, all
>    methods and results are generic. Gradients and Hessian
>    and all resulting statistics are based on numerical
>    differentiation.
> 
>    '''
> 
>    def _predict(self, params, x):
>        a, b, c = params
>        return a * np.exp(-b * x) + c
> 
> 
> mod = MyNonLinearMLEModel(yn, x)
> res = mod.fit(start_params=[1., 1., 1., 1.])
> print 'true parameters'
> print np.array(list(p0)+[0.2])
> print 'parameter estimates'
> print res.params
> print 'standard errors, hessian, jacobian, sandwich'
> print res.bse
> print res.bsejac
> print res.bsejhj
> -------
> curve_fit results:
> values: [ 2.80720815  1.24568449  0.44517316]
> errors: [ 0.12563313  0.12409886  0.05875364]
> Optimization terminated successfully.
>         Current function value: -7.401006
>         Iterations: 141
>         Function evaluations: 247
> true parameters
> [ 2.5  1.3  0.5  0.2]
> parameter estimates
> [ 2.80725714  1.24569287  0.44515168  0.20867979]
> standard errors, hessian, jacobian, sandwich
> [ 0.12226468  0.12414298  0.05798039  0.02086842]
> [ 0.15756494  0.15865264  0.06242619  0.02261329]
> [ 0.09852726  0.10175901  0.05740336  0.01994902]

There are large differences between the three methods!?
In particular, the Jacobian-based errors are too large in this case (and don't match the ones from curve_fit, which also uses the Jacobian method, right?).

I did one more check with the hesse() method from MINUIT, which gives results consistent with curve_fit (well, to two significant digits) in this case:

curve_fit results:
values: [ 2.80720814  1.24568448  0.44517316]
errors: [ 0.12563313  0.12409886  0.05875364]
minuit results
values: [ 2.80652024  1.24525732  0.44525123]
errors: [ 0.1260859   0.12784433  0.05975276]

from minuit import Minuit
def chi2(a, b, c):
    chi = yn - func(x, a, b, c)
    return (chi ** 2).sum()
m = Minuit(chi2, a=2.5, b=1.3, c=0.5)
m.migrad()
m.hesse()
# reduced chi2, same definition as in the fmin/approx_hess example above
red_chi2 = chi2(*m.args) / (len(x) - len(p0))
pcov = red_chi2 * np.array(m.matrix())
popt = np.array(m.args)
print 'minuit results'
print 'values:', popt
print 'errors:', np.sqrt(pcov.diagonal())


>>>> res.aic
> -6.8020120409185836
>>>> res.bic
> 0.84607998079400026
>>>> res.t_test([0,1,0,0],1)
> Traceback (most recent call last):
>  File "<pyshell#7>", line 1, in <module>
>    res.t_test([0,1,0,0],1)
>  File "E:\Josef\eclipsegworkspace\statsmodels-git\statsmodels-josef\scikits\statsmodels\base\model.py",
> line 1020, in t_test
>    df_denom=self.model.df_resid)
> AttributeError: 'MyNonLinearMLEModel' object has no attribute 'df_resid'
>>>> res.model.df_resid = len(yn) - len(p0)
>>>> res.t_test([0,1,0,0],1)
> <scikits.statsmodels.stats.contrast.ContrastResults object at 0x04C60FB0>
>>>> print res.t_test([0,1,0,0],1)
> <T test: effect=array([ 1.24569287]), sd=array([[ 0.12414298]]),
> t=array([[ 1.97911215]]), p=array([[ 0.02683928]]), df_denom=47>
>>>> 
> 
> GenericLikelihoodModel is still not polished, df_resid is not defined
> generically. bootstrap raised an exception.
> 
> I agree that scipy should have numerical differentiation (Jacobian and
> Hessian) like numdifftools.

I've made an enhancement ticket and informed the numdifftools author about it:
http://projects.scipy.org/scipy/ticket/1510
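In the meantime, a bare-bones central-difference Hessian is only a few lines. Here is a rough sketch of my own (fixed step size, no error control, so nowhere near numdifftools quality):

import numpy as np

def approx_hessian(f, x0, eps=1e-5):
    """Central-difference approximation of the Hessian of a scalar function f at x0."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    step = eps * np.maximum(np.abs(x0), 1.0)   # crude relative step size
    hess = np.empty((n, n))
    for i in range(n):
        ei = np.zeros(n)
        ei[i] = step[i]
        for j in range(n):
            ej = np.zeros(n)
            ej[j] = step[j]
            hess[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                          - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4.0 * step[i] * step[j])
    return 0.5 * (hess + hess.T)   # enforce symmetry

With chi2 and popt from the example above, approx_hessian(chi2, popt) should give essentially the same matrix as approx_hess3, and red_chi2 * inv(approx_hessian(chi2, popt) / 2.) essentially the same covariance.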

> 
> Josef
> 
>> 
>> 
>> And it would be nice if the super-fast Levenberg-Marquardt optimizer called
>> by leastsq were also available for likelihood fits.
>> 
>> Maybe it would be possible to add one method to scipy.optimize to compute
>> the covariance matrix after any of the optimizers has run,
>> i.e. the best-fit pars have been determined?
>> 
>> cov = scipy.optimize.cov(func, pars)
>> 
>> I think this can be done by computing the Hessian via finite differences and
>> then inverting it.
>> 
>> leastsq seems to compute the Hessian from a "Jacobian". This is the part I
>> don't understand, but without going into the details, let me ask this:
>> Is there something special about the Levenberg-Marquardt optimizer that it
>> requires the individual observations?
>> 
>> It uses the outer (?) product of the Jacobian (all observations) to
>> find the improving directions, and the outer product of the Jacobian is
>> a numerical approximation to the Hessian in the
>> err = y - f(x, params) or loglikelihood case.
>> 
>> The Hessian calculation can break down quite easily (not positive
>> definite because of numerical problems), the product of the Jacobian
>> is more robust in my experience and cheaper. (statsmodels
>> GenericLikelihood has covariance based on Hessian, Jacobian and a
>> sandwich of the two).
>> 
>> It's true that there are often numerical problems with the Hessian.
>> If the Jacobian method is more robust, then maybe that could be included in
>> scipy.optimize instead of or in addition to the Hessian method?
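For what it's worth, here is roughly what such a Jacobian-based covariance looks like in the least-squares case, as a sketch of my own: a forward-difference Jacobian of the residual vector, then the Gauss-Newton approximation H_chi2 ~ 2 * J^T J, so cov ~ red_chi2 * inv(J^T J). It reuses func, x, yn, popt, p0 and chi2 from the example above:

import numpy as np
from numpy.dual import inv

def approx_jacobian(pars, eps=1e-6):
    # forward-difference Jacobian of the residual vector r(pars) = yn - func(x, *pars)
    pars = np.asarray(pars, dtype=float)
    r0 = yn - func(x, *pars)
    jac = np.empty((r0.size, pars.size))
    for k in range(pars.size):
        dp = np.zeros(pars.size)
        dp[k] = eps
        jac[:, k] = ((yn - func(x, *(pars + dp))) - r0) / eps
    return jac

J = approx_jacobian(popt)
red_chi2 = chi2(popt) / (len(x) - len(p0))
pcov_jac = red_chi2 * inv(np.dot(J.T, J))   # "Jacobian method" covariance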
>> 
>> But I never looked at the internals of minpack.
>> 
>> Josef
>> 
>> 
>> 
>> Or is it just that the current implementation of _minpack._lmdif (which is
>> called by leastsq) was written such that it works this way (i.e. includes a
>> computation of cov_x in addition to x), and it could also be written to take
>> a scalar func like all the other optimizers?
>> 
>> Christoph
> <try_lonlin.py>


