[SciPy-user] optimize.leastsq

Robert Kern robert.kern@gmail....
Tue Jan 20 15:54:54 CST 2009

On Tue, Jan 20, 2009 at 09:34,  <josef.pktd@gmail.com> wrote:
> On Tue, Jan 20, 2009 at 6:30 AM, David Trethewey <dlrt2@ast.cam.ac.uk> wrote:
>> I'm using the following code to fit a gaussian to a histogram of some data.
>> #fit gaussian
>>    fitfunc = lambda p, x: (p[0]**2)*exp(-(x-p[1])**2/(2*p[2]**2))  #
>> Target function
>>    errfunc = lambda p, x, y: fitfunc(p,x) -y          # Distance to the
>> target function
>>    doublegauss = lambda q,x: (q[0]**2)*exp(-(x-q[1])**2/(2*q[2]**2)) +
>> (q[3]**2)*exp(-(x-q[4])**2/(2*q[5]**2))
>>    doublegausserr = lambda q,x,y: doublegauss(q,x) - y
>>    # initial guess
>>    p0 = [10.0,-2,0.5]
>>    # find parameters of single gaussian
>>    p1,success  = optimize.leastsq(errfunc, p0[:], args =
>> (hista[1],hista[0]))
>>    errors_sq = errfunc(p1,hista[1],hista[0])**2
>> I have the error message
>> Traceback (most recent call last):
>>  File "M31FeHfit_totalw108.py", line 116, in <module>
>>   p1,success  = optimize.leastsq(errfunc, p0[:], args =
>> (hista[1],hista[0]))
>>  File "C:\Python25\lib\site-packages\scipy\optimize\minpack.py", line
>> 264, in leastsq
>>   m = check_func(func,x0,args,n)[0]
>>  File "C:\Python25\lib\site-packages\scipy\optimize\minpack.py", line
>> 11, in check_func
>>   res = atleast_1d(thefunc(*((x0[:numinputs],)+args)))
>>  File "M31FeHfit_totalw108.py", line 110, in <lambda>
>>   errfunc = lambda p, x, y: fitfunc(p,x) -y          # Distance to the
>> target function
>> ValueError: shape mismatch: objects cannot be broadcast to a single shape
>> Anyone know why this happens? Curiously I have had it work before but
>> not with my current versions of scipy and python etc.
>> David
> Check the dimensions hista[1],hista[0]. I can run your part of the
> code without problems.
> If you want to estimate the parameters of a (parametric) distribution,
> then using maximum likelihood estimation would be more appropriate
> than using least squares on the histogram.

Right. You can't just take the value of the PDF and compare it to the
(density-normalized) value of the histogram. You have to integrate the
PDF over each bin and compare that value to the mass-normalized value
of the histogram. Least-squares still isn't quite appropriate for this
task, not least because the amount of weight that you should apply to
each data point is non-uniform.

If you are doing the histogramming yourself from the raw data, you
might be better off doing a maximum likelihood fit on the raw data
like the .fit() method of the rv_continuous distribution objects in

If the data you have is already pre-histogrammed or discretized,
though, you need a different formulation of ML. For given parameters,
integrate the PDF over the bins of your histogram. This will give you
the probability of a single sample falling into each bin. If you have
N samples from this distribution (N being the number of data points
that went into the real histogram), this defines a multinomial
distribution over the bins. You can evaluate the log-likelihood of
getting your real histogram given those PDF parameters using the
multinomial distribution.

I've actually had a far bit of success with the latter when estimating
Weibull distributions when the typical techniques failed to be robust.

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

More information about the SciPy-user mailing list