[SciPy-user] optimize.leastsq

David Trethewey dlrt2@ast.cam.ac...
Wed Jan 21 02:55:38 CST 2009


Robert Kern wrote:
> On Tue, Jan 20, 2009 at 09:34,  <josef.pktd@gmail.com> wrote:
>   
>> On Tue, Jan 20, 2009 at 6:30 AM, David Trethewey <dlrt2@ast.cam.ac.uk> wrote:
>>     
>>> I'm using the following code to fit a gaussian to a histogram of some data.
>>>
>>> #fit gaussian
>>>    fitfunc = lambda p, x: (p[0]**2)*exp(-(x-p[1])**2/(2*p[2]**2))  #
>>> Target function
>>>    errfunc = lambda p, x, y: fitfunc(p,x) -y          # Distance to the
>>> target function
>>>    doublegauss = lambda q,x: (q[0]**2)*exp(-(x-q[1])**2/(2*q[2]**2)) +
>>> (q[3]**2)*exp(-(x-q[4])**2/(2*q[5]**2))
>>>    doublegausserr = lambda q,x,y: doublegauss(q,x) - y
>>>    # initial guess
>>>    p0 = [10.0,-2,0.5]
>>>    # find parameters of single gaussian
>>>    p1,success  = optimize.leastsq(errfunc, p0[:], args =
>>> (hista[1],hista[0]))
>>>    errors_sq = errfunc(p1,hista[1],hista[0])**2
>>>
>>>
>>> I have the error message
>>>
>>> Traceback (most recent call last):
>>>  File "M31FeHfit_totalw108.py", line 116, in <module>
>>>   p1,success  = optimize.leastsq(errfunc, p0[:], args =
>>> (hista[1],hista[0]))
>>>  File "C:\Python25\lib\site-packages\scipy\optimize\minpack.py", line
>>> 264, in leastsq
>>>   m = check_func(func,x0,args,n)[0]
>>>  File "C:\Python25\lib\site-packages\scipy\optimize\minpack.py", line
>>> 11, in check_func
>>>   res = atleast_1d(thefunc(*((x0[:numinputs],)+args)))
>>>  File "M31FeHfit_totalw108.py", line 110, in <lambda>
>>>   errfunc = lambda p, x, y: fitfunc(p,x) -y          # Distance to the
>>> target function
>>> ValueError: shape mismatch: objects cannot be broadcast to a single shape
>>>
>>> Anyone know why this happens? Curiously I have had it work before but
>>> not with my current versions of scipy and python etc.
>>>
>>> David
>>>       
>> Check the dimensions hista[1],hista[0]. I can run your part of the
>> code without problems.
>>
>> If you want to estimate the parameters of a (parametric) distribution,
>> then using maximum likelihood estimation would be more appropriate
>> than using least squares on the histogram.
>>     
>
> Right. You can't just take the value of the PDF and compare it to the
> (density-normalized) value of the histogram. You have to integrate the
> PDF over each bin and compare that value to the mass-normalized value
> of the histogram. Least-squares still isn't quite appropriate for this
> task, not least because the amount of weight that you should apply to
> each data point is non-uniform.
>
> If you are doing the histogramming yourself from the raw data, you
> might be better off doing a maximum likelihood fit on the raw data
> like the .fit() method of the rv_continuous distribution objects in
> scipy.stats.
>
> If the data you have is already pre-histogrammed or discretized,
> though, you need a different formulation of ML. For given parameters,
> integrate the PDF over the bins of your histogram. This will give you
> the probability of a single sample falling into each bin. If you have
> N samples from this distribution (N being the number of data points
> that went into the real histogram), this defines a multinomial
> distribution over the bins. You can evaluate the log-likelihood of
> getting your real histogram given those PDF parameters using the
> multinomial distribution.
>
> I've actually had a fair bit of success with the latter when estimating
> Weibull distributions when the typical techniques failed to be robust.
>
>   
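
On josef's suggestion to check the dimensions: if hista comes from
numpy.histogram, I suspect the shapes no longer match because recent
numpy returns the full array of bin edges (one element longer than the
counts), whereas older versions returned only the left edges. Taking
bin centres first should make the two arrays the same length. A rough,
untested sketch ('data' here is just a stand-in name for the raw
array of values):

    import numpy as np
    from scipy import optimize

    counts, edges = np.histogram(data, bins=50)   # 'data' = raw values (made-up name)
    centres = 0.5*(edges[:-1] + edges[1:])        # bin centres, same length as counts

    fitfunc = lambda p, x: (p[0]**2)*np.exp(-(x - p[1])**2/(2*p[2]**2))
    errfunc = lambda p, x, y: fitfunc(p, x) - y

    p0 = [10.0, -2, 0.5]
    p1, success = optimize.leastsq(errfunc, p0[:], args=(centres, counts))

(Though, as you both say, the maximum likelihood route is probably the
better way to go than least squares on the histogram.)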
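
For reference, my reading of the binned multinomial likelihood Robert
describes, again as a rough untested sketch for a single gaussian (the
factorial terms of the multinomial are dropped since they do not depend
on the parameters):

    import numpy as np
    from scipy import stats, optimize

    counts, edges = np.histogram(data, bins=50)   # observed histogram

    def neg_log_likelihood(params):
        mu, sigma = params
        sigma = abs(sigma)                        # keep the scale positive
        # probability of a single sample landing in each bin
        p = np.diff(stats.norm.cdf(edges, loc=mu, scale=sigma))
        p = np.clip(p, 1e-300, 1.0)               # avoid log(0)
        # multinomial log-likelihood, up to a parameter-free constant
        return -np.sum(counts*np.log(p))

    mu_hat, sigma_hat = optimize.fmin(neg_log_likelihood, [-2.0, 0.5])
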
I am doing the histogramming from the raw data, so it sounds like a
maximum likelihood fit would be better. What I have is a set of
velocity and Fe/H measurements for a series of stars in the Andromeda
galaxy. The idea is to fit both a single gaussian and a double
gaussian and then check whether the double gaussian is significantly
better, which would suggest there are two distinct populations among
the stars.
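
Something like the following is roughly what I have in mind for the
raw-data fits (untested; 'feh' is just a stand-in name for the raw
Fe/H array):

    import numpy as np
    from scipy import stats, optimize

    # single gaussian: the .fit() method gives the ML estimates directly
    mu1, sigma1 = stats.norm.fit(feh)
    loglike_single = np.sum(np.log(stats.norm.pdf(feh, mu1, sigma1)))

    # double gaussian (two-component mixture), fitted by minimising the
    # negative log-likelihood of the raw data
    def neg_log_likelihood(q, x):
        w, mu_a, sig_a, mu_b, sig_b = q
        w = min(max(w, 1e-6), 1.0 - 1e-6)      # keep the mixing weight in (0, 1)
        sig_a, sig_b = abs(sig_a), abs(sig_b)  # keep the widths positive
        pdf = (w*stats.norm.pdf(x, mu_a, sig_a)
               + (1.0 - w)*stats.norm.pdf(x, mu_b, sig_b))
        return -np.sum(np.log(pdf + 1e-300))

    q0 = [0.5, -2.0, 0.5, -1.0, 0.5]           # rough starting guesses
    qhat = optimize.fmin(neg_log_likelihood, q0, args=(feh,))
    loglike_double = -neg_log_likelihood(qhat, feh)

    # crude comparison: the double gaussian has three extra parameters,
    # so penalise it, e.g. AIC = 2*k - 2*loglike (smaller is better)
    aic_single = 2*2 - 2*loglike_single
    aic_double = 2*5 - 2*loglike_double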

David

