[SciPy-User] max likelihood
eneide.odissea
eneide.odissea@gmail....
Tue Jun 22 02:46:28 CDT 2010
Hi All
I need to use a maximum likelihood algorithm to fit the parameters of a
GARCH(1,1) model.
Is the distribution to be assumed normal?
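
For concreteness, here is a minimal sketch of Gaussian maximum likelihood for GARCH(1,1). It assumes zero-mean returns and normal innovations, which is the usual default; a Student-t likelihood is a common alternative when the tails are heavy. The function names, the simulated data, the starting values and the choice to initialise the variance recursion at the sample variance are all illustrative assumptions, not something scipy provides.

```python
import numpy as np
from scipy.optimize import minimize


def garch11_variance(params, r):
    """Conditional variances h[t] = omega + alpha*r[t-1]**2 + beta*h[t-1]."""
    omega, alpha, beta = params
    h = np.empty_like(r)
    h[0] = r.var()  # assumption: initialise at the sample variance
    for t in range(1, len(r)):
        h[t] = omega + alpha * r[t - 1] ** 2 + beta * h[t - 1]
    return h


def neg_loglike(params, r):
    """Negative Gaussian log-likelihood of zero-mean returns r."""
    omega, alpha, beta = params
    # Reject parameters outside the positivity/stationarity region.
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return np.inf
    h = garch11_variance(params, r)
    return 0.5 * np.sum(np.log(2 * np.pi * h) + r ** 2 / h)


# Simulated data standing in for real returns (hypothetical parameter values).
rng = np.random.default_rng(0)
omega0, alpha0, beta0 = 0.1, 0.1, 0.8
r = np.empty(2000)
h_t = omega0 / (1 - alpha0 - beta0)  # unconditional variance
for t in range(r.size):
    r[t] = np.sqrt(h_t) * rng.standard_normal()
    h_t = omega0 + alpha0 * r[t] ** 2 + beta0 * h_t

# Nelder-Mead tolerates the infinite penalty returned for infeasible points.
start = np.array([0.05, 0.05, 0.9])
res = minimize(neg_loglike, start, args=(r,), method="Nelder-Mead")
omega_hat, alpha_hat, beta_hat = res.x
print(omega_hat, alpha_hat, beta_hat)
```

Swapping the normal density in neg_loglike for another one is the only change needed to try a different distributional assumption.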
On Tue, Jun 22, 2010 at 3:43 AM, Skipper Seabold <jsseabold@gmail.com> wrote:
> On Mon, Jun 21, 2010 at 8:41 PM, David Goldsmith
> <d.l.goldsmith@gmail.com> wrote:
> > On Mon, Jun 21, 2010 at 5:19 PM, <josef.pktd@gmail.com> wrote:
> >>
> >> On Mon, Jun 21, 2010 at 8:03 PM, David Goldsmith
> >> <d.l.goldsmith@gmail.com> wrote:
> >> > On Mon, Jun 21, 2010 at 4:10 PM, <josef.pktd@gmail.com> wrote:
> >> >>
> >> >> On Mon, Jun 21, 2010 at 7:03 PM, David Goldsmith
> >> >> <d.l.goldsmith@gmail.com> wrote:
> >> >> > On Mon, Jun 21, 2010 at 3:17 PM, Skipper Seabold
> >> >> > <jsseabold@gmail.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> On Mon, Jun 21, 2010 at 5:55 PM, David Goldsmith
> >> >> >> <d.l.goldsmith@gmail.com> wrote:
> >> >> >> > On Mon, Jun 21, 2010 at 2:43 PM, Skipper Seabold
> >> >> >> > <jsseabold@gmail.com>
> >> >> >> > wrote:
> >> >> >> >>
> >> >> >> >> On Mon, Jun 21, 2010 at 5:34 PM, David Goldsmith
> >> >> >> >> <d.l.goldsmith@gmail.com> wrote:
> >> >> >> >> > On Mon, Jun 21, 2010 at 2:17 PM, eneide.odissea
> >> >> >> >> > <eneide.odissea@gmail.com>
> >> >> >> >> > wrote:
> >> >> >> >> >>
> >> >> >> >> >> Hi All
> >> >> >> >> >> I had a look at the scipy.stats documentation and I was not
> >> >> >> >> >> able to find a function for maximum likelihood parameter
> >> >> >> >> >> estimation.
> >> >> >> >> >> Do you know whether one is available in some other
> >> >> >> >> >> namespace/library of scipy?
> >> >> >> >> >> I found a few libraries on the web that have it (this one is
> >> >> >> >> >> an example: http://bmnh.org/~pf/p4.html),
> >> >> >> >> >> but I would prefer to start playing with what scipy already
> >> >> >> >> >> offers by default (if any).
> >> >> >> >> >> Kind Regards
> >> >> >> >> >> eo
> >> >> >> >> >
> >> >> >> >> > scipy.stats.distributions.rv_continuous.fit (I was just
> >> >> >> >> > working on the docstring for that; I don't believe my changes
> >> >> >> >> > have been merged; I believe Travis recently updated its
> >> >> >> >> > code...)
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >> This is for fitting the parameters of a distribution via
> >> >> >> >> maximum likelihood given that the DGP is the underlying
> >> >> >> >> distribution. I don't think it is intended for more
> >> >> >> >> complicated likelihood functions (where Nelder-Mead might
> >> >> >> >> fail). And in any event it will only find the parameters of
> >> >> >> >> the distribution rather than the parameters of some
> >> >> >> >> underlying model, if this is what you're after.
> >> >> >> >>
> >> >> >> >> Skipper
> >> >> >> >
> >> >> >> > OK, but just for clarity in my own mind: are you saying that
> >> >> >> > rv_continuous.fit is _definitely_ inappropriate/inadequate for
> >> >> >> > OP's needs (i.e., am I _completely_ misunderstanding the
> >> >> >> > relationship between the function and OP's stated needs), or
> >> >> >> > are you saying that the function _may_ not be general/robust
> >> >> >> > enough for OP's stated needs?
> >> >> >>
> >> >> >> Well, I guess it depends on exactly what kind of likelihood
> >> >> >> function is being optimized. That's why I asked.
> >> >> >>
> >> >> >> My experience with stats.distributions is all of about a week, so
> >> >> >> I could be wrong. But here goes... rv_continuous is not intended
> >> >> >> to be used on its own but rather as the base class for any
> >> >> >> distribution.
> >> >> >> So if you believe that your data came from, say, a Gaussian
> >> >> >> distribution, then you could use norm.fit(data) (with other
> >> >> >> options as needed) to get back estimates of location and scale. So
> >> >> >>
> >> >> >> In [31]: from scipy.stats import norm
> >> >> >>
> >> >> >> In [32]: import numpy as np
> >> >> >>
> >> >> >> In [33]: x = np.random.normal(loc=0,scale=1,size=1000)
> >> >> >>
> >> >> >> In [34]: norm.fit(x)
> >> >> >> Out[34]: (-0.043364692830314848, 1.0205901804210851)
> >> >> >>
> >> >> >> Which is close to our given location and scale.
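
As a check on what norm.fit returns: for the normal distribution the maximum likelihood estimates have a closed form, namely the sample mean and the standard deviation computed with ddof=0. The short sketch below compares the two on simulated data; the seed and sample size are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=1000)

# Maximum likelihood estimates as returned by scipy: (loc, scale).
loc_hat, scale_hat = norm.fit(x)

# Closed-form normal MLEs: sample mean and std with ddof=0 (numpy's default).
loc_mle, scale_mle = x.mean(), x.std()

print(loc_hat, loc_mle)
print(scale_hat, scale_mle)
```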
> >> >> >>
> >> >> >> But if you had in mind some kind of data generating process for
> >> >> >> your model based on some other observed data and you were
> >> >> >> interested in the marginal effects of changes in the observed data
> >> >> >> on the outcome, then it would be cumbersome I think to use the fit
> >> >> >> in distributions. It may not be possible. Also, as mentioned, fit
> >> >> >> only uses Nelder-Mead (optimize.fmin with the default parameters,
> >> >> >> which I've found to be inadequate for even fairly basic likelihood
> >> >> >> based models), so it may not be robust enough. At the moment, I
> >> >> >> can't think of a way to fit a parameterized model as fit is
> >> >> >> written now. Come to think of it though I don't think it would be
> >> >> >> much work to extend the fit method to work for something like a
> >> >> >> linear regression model.
> >> >> >>
> >> >> >> Skipper
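
To illustrate the linear regression case mentioned above, one option is to skip distribution.fit, write the negative log-likelihood by hand and pass it to a general-purpose optimizer. The sketch below assumes a recent SciPy where scipy.optimize.minimize is available, uses BFGS rather than Nelder-Mead, and runs on simulated data; the function name, the log-sigma parameterisation and the starting values are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize


def neg_loglike(theta, X, y):
    """Negative log-likelihood of y = X @ beta + e, with e ~ iid N(0, sigma**2).

    theta holds beta followed by log(sigma); optimising over log(sigma)
    keeps sigma positive without explicit bounds.
    """
    beta, log_sigma = theta[:-1], theta[-1]
    resid = y - X @ beta
    n = y.size
    return (n * log_sigma + 0.5 * n * np.log(2 * np.pi)
            + 0.5 * np.sum(resid ** 2) / np.exp(2 * log_sigma))


# Simulated data (hypothetical coefficients: intercept 1, slope 2, sigma 0.5).
rng = np.random.default_rng(1)
n_obs = 500
X = np.column_stack([np.ones(n_obs), rng.normal(size=n_obs)])
y = X @ np.array([1.0, 2.0]) + 0.5 * rng.normal(size=n_obs)

# Start beta at zero and sigma at the sample standard deviation of y.
start = np.r_[np.zeros(X.shape[1]), np.log(y.std())]
res = minimize(neg_loglike, start, args=(X, y), method="BFGS")
beta_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])

# With normal errors the MLE of beta coincides with ordinary least squares.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_hat, beta_ols, sigma_hat)
```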
> >> >> >
> >> >> >
> >> >> > OK, this is all as I thought (e.g., fit only "works" to get the
> >> >> > MLEs from data for a *presumed* distribution, but it is
> >> >> > all-but-useless if the distribution isn't (believed to be) "known"
> >> >> > a priori); just wanted to be sure I was reading you correctly. :-)
> >> >> > Thanks!
> >> >>
> >> >> MLE always assumes that the distribution is known, since you need the
> >> >> likelihood function.
> >> >
> >> > I'm not sure what I'm missing here (is it the definition of DGP? the
> >> > meaning of Nelder-Mead? I want to learn, off-list if this is
> >> > considered "noise"): according to my reference - Bain & Engelhardt,
> >> > Intro. to Prob. and Math. Stat., 2nd Ed., Duxbury, 1992 - if the
> >> > underlying population distribution is known, then the likelihood
> >> > function is well-determined (although the likelihood equation(s) it
> >> > gives rise to may not be soluble analytically, of course). So why
> >> > doesn't the OP knowing the underlying distribution (as your comment
> >> > above implies they should if they seek MLEs) imply that s/he would
> >> > also "know" what the likelihood function "looks like," (and thus the
> >> > question isn't so much what the likelihood function "looks like," but
> >> > what the underlying distribution is, and thence, do we have that
> >> > distribution implemented yet in scipy.stats)?
> >>
> >> DGP: data generating process
> >>
> >> In many cases the assumed distribution of the error or noise variable
> >> is just the normal distribution. But what's the overall model that
> >> explains the endogenous variable?
> >> distribution.fit would just assume that each observation is a random
> >> draw from the same population distribution.
> >>
> >> But you can do MLE on standard linear regression, system of equations,
> >> ARIMA or GARCH in time series analysis. For any of these we need to
> >> specify what the relationship between the endogenous variable and its
> >> own past and other explanatory variables is.
> >> e.g. simplest ARMA
> >>
> >> A(L) y_t = B(L) e_t
> >> with e_t independently and identically distributed (iid.) normal
> >> random variable
> >> A(L), B(L) lag-polynomials
> >> and for the full MLE we would also need to specify initial conditions.
> >>
> >> simple linear regression with non-iid errors
> >> y_t = x_t * beta + e_t
> >> with e = {e_t}_{for all t} distributed N(0, Sigma), plus assumptions
> >> on the structure of Sigma
> >>
> >> in these cases the likelihood function defines a lot more than just
> >> the distribution of the error term.
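
The point about initial conditions can be made concrete with the simplest special case of the ARMA model above, an AR(1): y_t = phi * y_{t-1} + e_t with e_t iid N(0, sigma^2). The sketch below compares the conditional likelihood (which treats y_0 as fixed) with the full likelihood (which also includes the stationary density of y_0). The function name, the simulated data and the use of Nelder-Mead are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def neg_loglike_ar1(theta, y, exact):
    """Negative Gaussian log-likelihood of a zero-mean AR(1) process.

    theta = (phi, log_sigma). With exact=False the first observation is
    treated as fixed; with exact=True its stationary N(0, sigma2/(1-phi**2))
    density is included as the initial condition.
    """
    phi, log_sigma = theta
    if abs(phi) >= 1:
        return np.inf  # outside the stationary region
    sigma2 = np.exp(2 * log_sigma)
    resid = y[1:] - phi * y[:-1]
    nll = 0.5 * np.sum(np.log(2 * np.pi * sigma2) + resid ** 2 / sigma2)
    if exact:
        var0 = sigma2 / (1 - phi ** 2)
        nll += 0.5 * (np.log(2 * np.pi * var0) + y[0] ** 2 / var0)
    return nll


# Simulated AR(1) data (hypothetical phi = 0.6, sigma = 1).
rng = np.random.default_rng(2)
phi_true, n = 0.6, 1000
y = np.empty(n)
y[0] = rng.normal(scale=1 / np.sqrt(1 - phi_true ** 2))
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + rng.normal()

start = np.array([0.1, 0.1])
cond = minimize(neg_loglike_ar1, start, args=(y, False), method="Nelder-Mead")
full = minimize(neg_loglike_ar1, start, args=(y, True), method="Nelder-Mead")

# The conditional MLE of phi is the least-squares slope of y_t on y_{t-1}.
phi_ols = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
print(cond.x[0], full.x[0], phi_ols)
```

With a long series the two estimates are nearly identical; the treatment of the initial condition matters more for short samples or when phi is close to one.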
> >
> > Ah, now I understand (I think). So in the general case, the procedure
> > needs to "apportion" the information in the data among the parameters of
> > the "mechanistic" part of the model and the parameters of the "random
> > noise" part of the model, and the Maximum Likelihood Equations give you
> > the values of all these parameters (the mechanistic ones and noise ones)
> > that maximize the likelihood of observing the data one observed, correct?
> >
>
> Yes, I think you've got it for the more general case that Josef describes.
>
> Skipper