# [SciPy-user] optimize.leastsq

David Trethewey dlrt2@ast.cam.ac...
Wed Jan 21 04:02:14 CST 2009

```So what I'm trying to work out now is how to use the .fit() method of
rv_continuous for a single gaussian and a double gaussian.

David

David Trethewey wrote:
> Robert Kern wrote:
>
>> On Tue, Jan 20, 2009 at 09:34,  <josef.pktd@gmail.com> wrote:
>>
>>
>>> On Tue, Jan 20, 2009 at 6:30 AM, David Trethewey <dlrt2@ast.cam.ac.uk> wrote:
>>>
>>>
>>>> I'm using the following code to fit a gaussian to a histogram of some data.
>>>>
>>>> #fit gaussian
>>>>    fitfunc = lambda p, x: (p[0]**2)*exp(-(x-p[1])**2/(2*p[2]**2))  #
>>>> Target function
>>>>    errfunc = lambda p, x, y: fitfunc(p,x) -y          # Distance to the
>>>> target function
>>>>    doublegauss = lambda q,x: (q[0]**2)*exp(-(x-q[1])**2/(2*q[2]**2)) +
>>>> (q[3]**2)*exp(-(x-q[4])**2/(2*q[5]**2))
>>>>    doublegausserr = lambda q,x,y: doublegauss(q,x) - y
>>>>    # initial guess
>>>>    p0 = [10.0,-2,0.5]
>>>>    # find parameters of single gaussian
>>>>    p1,success  = optimize.leastsq(errfunc, p0[:], args =
>>>> (hista[1],hista[0]))
>>>>    errors_sq = errfunc(p1,hista[1],hista[0])**2
>>>>
>>>>
>>>> I have the error message
>>>>
>>>> Traceback (most recent call last):
>>>>  File "M31FeHfit_totalw108.py", line 116, in <module>
>>>>   p1,success  = optimize.leastsq(errfunc, p0[:], args =
>>>> (hista[1],hista[0]))
>>>>  File "C:\Python25\lib\site-packages\scipy\optimize\minpack.py", line
>>>> 264, in leastsq
>>>>   m = check_func(func,x0,args,n)[0]
>>>>  File "C:\Python25\lib\site-packages\scipy\optimize\minpack.py", line
>>>> 11, in check_func
>>>>   res = atleast_1d(thefunc(*((x0[:numinputs],)+args)))
>>>>  File "M31FeHfit_totalw108.py", line 110, in <lambda>
>>>>   errfunc = lambda p, x, y: fitfunc(p,x) -y          # Distance to the
>>>> target function
>>>> ValueError: shape mismatch: objects cannot be broadcast to a single shape
>>>>
>>>> Anyone know why this happens? Curiously I have had it work before but
>>>> not with my current versions of scipy and python etc.
>>>>
>>>> David
>>>>
>>>>
>>> Check the dimensions hista[1],hista[0]. I can run your part of the
>>> code without problems.
>>>
>>> If you want to estimate the parameters of a (parametric) distribution,
>>> then using maximum likelihood estimation would be more appropriate
>>> than using least squares on the histogram.
>>>
>>>
>> Right. You can't just take the value of the PDF and compare it to the
>> (density-normalized) value of the histogram. You have to integrate the
>> PDF over each bin and compare that value to the mass-normalized value
>> of the histogram. Least-squares still isn't quite appropriate for this
>> task, not least because the amount of weight that you should apply to
>> each data point is non-uniform.
>>
>> If you are doing the histogramming yourself from the raw data, you
>> might be better off doing a maximum likelihood fit on the raw data
>> like the .fit() method of the rv_continuous distribution objects in
>> scipy.tats.
>>
>> If the data you have is already pre-histogrammed or discretized,
>> though, you need a different formulation of ML. For given parameters,
>> integrate the PDF over the bins of your histogram. This will give you
>> the probability of a single sample falling into each bin. If you have
>> N samples from this distribution (N being the number of data points
>> that went into the real histogram), this defines a multinomial
>> distribution over the bins. You can evaluate the log-likelihood of
>> getting your real histogram given those PDF parameters using the
>> multinomial distribution.
>>
>> I've actually had a far bit of success with the latter when estimating
>> Weibull distributions when the typical techniques failed to be robust.
>>
>>
>>
> I am doing the histogramming from the raw data, so sounds like a maximum
> likelihood fit would be better. What I have is a series of velocity and
> Fe/H measurements for a series of stars in the Andromeda galaxy, and the
> idea is to find a gaussian and double gaussian fit, and have a look to
> see whether the double gaussian is significantly better, to detect
> whether there are two distinct populations within the stars.
>
> David
>
> David
> _______________________________________________
> SciPy-user mailing list
> SciPy-user@scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-user
>

```