[SciPy-user] optimize.leastsq

Robert Kern robert.kern@gmail....
Tue Jan 20 15:54:54 CST 2009


On Tue, Jan 20, 2009 at 09:34,  <josef.pktd@gmail.com> wrote:
> On Tue, Jan 20, 2009 at 6:30 AM, David Trethewey <dlrt2@ast.cam.ac.uk> wrote:
>> I'm using the following code to fit a gaussian to a histogram of some data.
>>
>> #fit gaussian
>>    fitfunc = lambda p, x: (p[0]**2)*exp(-(x-p[1])**2/(2*p[2]**2))  #
>> Target function
>>    errfunc = lambda p, x, y: fitfunc(p,x) -y          # Distance to the
>> target function
>>    doublegauss = lambda q,x: (q[0]**2)*exp(-(x-q[1])**2/(2*q[2]**2)) +
>> (q[3]**2)*exp(-(x-q[4])**2/(2*q[5]**2))
>>    doublegausserr = lambda q,x,y: doublegauss(q,x) - y
>>    # initial guess
>>    p0 = [10.0,-2,0.5]
>>    # find parameters of single gaussian
>>    p1,success  = optimize.leastsq(errfunc, p0[:], args =
>> (hista[1],hista[0]))
>>    errors_sq = errfunc(p1,hista[1],hista[0])**2
>>
>>
>> I have the error message
>>
>> Traceback (most recent call last):
>>  File "M31FeHfit_totalw108.py", line 116, in <module>
>>   p1,success  = optimize.leastsq(errfunc, p0[:], args =
>> (hista[1],hista[0]))
>>  File "C:\Python25\lib\site-packages\scipy\optimize\minpack.py", line
>> 264, in leastsq
>>   m = check_func(func,x0,args,n)[0]
>>  File "C:\Python25\lib\site-packages\scipy\optimize\minpack.py", line
>> 11, in check_func
>>   res = atleast_1d(thefunc(*((x0[:numinputs],)+args)))
>>  File "M31FeHfit_totalw108.py", line 110, in <lambda>
>>   errfunc = lambda p, x, y: fitfunc(p,x) -y          # Distance to the
>> target function
>> ValueError: shape mismatch: objects cannot be broadcast to a single shape
>>
>> Anyone know why this happens? Curiously I have had it work before but
>> not with my current versions of scipy and python etc.
>>
>> David
>
> Check the dimensions hista[1],hista[0]. I can run your part of the
> code without problems.
>
> If you want to estimate the parameters of a (parametric) distribution,
> then using maximum likelihood estimation would be more appropriate
> than using least squares on the histogram.

Right. You can't just take the value of the PDF and compare it to the
(density-normalized) value of the histogram. You have to integrate the
PDF over each bin and compare that value to the mass-normalized value
of the histogram. Least-squares still isn't quite appropriate for this
task, not least because the amount of weight that you should apply to
each data point is non-uniform.

If you are doing the histogramming yourself from the raw data, you
might be better off doing a maximum likelihood fit on the raw data
like the .fit() method of the rv_continuous distribution objects in
scipy.tats.

If the data you have is already pre-histogrammed or discretized,
though, you need a different formulation of ML. For given parameters,
integrate the PDF over the bins of your histogram. This will give you
the probability of a single sample falling into each bin. If you have
N samples from this distribution (N being the number of data points
that went into the real histogram), this defines a multinomial
distribution over the bins. You can evaluate the log-likelihood of
getting your real histogram given those PDF parameters using the
multinomial distribution.

I've actually had a far bit of success with the latter when estimating
Weibull distributions when the typical techniques failed to be robust.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


More information about the SciPy-user mailing list