[SciPy-user] optimize.leastsq

Bruce Southey bsouthey@gmail....
Wed Jan 21 11:27:39 CST 2009

David Trethewey wrote:
> Robert Kern wrote:
>> On Tue, Jan 20, 2009 at 09:34,  <josef.pktd@gmail.com> wrote:
>>> On Tue, Jan 20, 2009 at 6:30 AM, David Trethewey <dlrt2@ast.cam.ac.uk> wrote:
>>>> I'm using the following code to fit a gaussian to a histogram of some data.
>>>> #fit gaussian
>>>>    fitfunc = lambda p, x: (p[0]**2)*exp(-(x-p[1])**2/(2*p[2]**2))  #
>>>> Target function
>>>>    errfunc = lambda p, x, y: fitfunc(p,x) -y          # Distance to the
>>>> target function
>>>>    doublegauss = lambda q,x: (q[0]**2)*exp(-(x-q[1])**2/(2*q[2]**2)) +
>>>> (q[3]**2)*exp(-(x-q[4])**2/(2*q[5]**2))
>>>>    doublegausserr = lambda q,x,y: doublegauss(q,x) - y
>>>>    # initial guess
>>>>    p0 = [10.0,-2,0.5]
>>>>    # find parameters of single gaussian
>>>>    p1,success  = optimize.leastsq(errfunc, p0[:], args =
>>>> (hista[1],hista[0]))
>>>>    errors_sq = errfunc(p1,hista[1],hista[0])**2
>>>> I have the error message
>>>> Traceback (most recent call last):
>>>>  File "M31FeHfit_totalw108.py", line 116, in <module>
>>>>   p1,success  = optimize.leastsq(errfunc, p0[:], args =
>>>> (hista[1],hista[0]))
>>>>  File "C:\Python25\lib\site-packages\scipy\optimize\minpack.py", line
>>>> 264, in leastsq
>>>>   m = check_func(func,x0,args,n)[0]
>>>>  File "C:\Python25\lib\site-packages\scipy\optimize\minpack.py", line
>>>> 11, in check_func
>>>>   res = atleast_1d(thefunc(*((x0[:numinputs],)+args)))
>>>>  File "M31FeHfit_totalw108.py", line 110, in <lambda>
>>>>   errfunc = lambda p, x, y: fitfunc(p,x) -y          # Distance to the
>>>> target function
>>>> ValueError: shape mismatch: objects cannot be broadcast to a single shape
>>>> Anyone know why this happens? Curiously I have had it work before but
>>>> not with my current versions of scipy and python etc.
>>>> David
>>> Check the dimensions hista[1],hista[0]. I can run your part of the
>>> code without problems.
>>> If you want to estimate the parameters of a (parametric) distribution,
>>> then using maximum likelihood estimation would be more appropriate
>>> than using least squares on the histogram.
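[Editor's note: the dimension check suggested above almost certainly explains the traceback. `numpy.histogram` returns `(counts, bin_edges)`, and `bin_edges` has one more element than `counts`, which produces exactly this broadcast error when the two are passed straight to `errfunc`. A minimal sketch of the fix, using bin centers and stand-in data (variable names follow the original post):]

```python
import numpy as np
from scipy import optimize

np.random.seed(0)
data = np.random.normal(loc=-2.0, scale=0.5, size=1000)  # stand-in data

hista = np.histogram(data, bins=30)           # (counts, bin_edges)
counts, edges = hista[0], hista[1]
centers = 0.5 * (edges[:-1] + edges[1:])      # same length as counts

fitfunc = lambda p, x: (p[0]**2) * np.exp(-(x - p[1])**2 / (2 * p[2]**2))
errfunc = lambda p, x, y: fitfunc(p, x) - y

p0 = [10.0, -2.0, 0.5]
# passing (centers, counts) instead of (edges, counts) avoids the
# "shape mismatch" ValueError from the original post
p1, success = optimize.leastsq(errfunc, p0, args=(centers, counts))
```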
>> Right. You can't just take the value of the PDF and compare it to the
>> (density-normalized) value of the histogram. You have to integrate the
>> PDF over each bin and compare that value to the mass-normalized value
>> of the histogram. Least-squares still isn't quite appropriate for this
>> task, not least because the amount of weight that you should apply to
>> each data point is non-uniform.
>> If you are doing the histogramming yourself from the raw data, you
>> might be better off doing a maximum likelihood fit on the raw data
>> like the .fit() method of the rv_continuous distribution objects in
>> scipy.stats.
>> If the data you have is already pre-histogrammed or discretized,
>> though, you need a different formulation of ML. For given parameters,
>> integrate the PDF over the bins of your histogram. This will give you
>> the probability of a single sample falling into each bin. If you have
>> N samples from this distribution (N being the number of data points
>> that went into the real histogram), this defines a multinomial
>> distribution over the bins. You can evaluate the log-likelihood of
>> getting your real histogram given those PDF parameters using the
>> multinomial distribution.
>> I've actually had a fair bit of success with the latter when estimating
>> Weibull distributions when the typical techniques failed to be robust.
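[Editor's note: a sketch of the binned maximum-likelihood approach described above, for the single-gaussian case. The PDF is integrated over each bin via the CDF, and the multinomial log-likelihood of the observed counts is maximized. The names (`neg_loglike`, `counts`, `edges`) and the use of `optimize.fmin` are illustrative, not from the post:]

```python
import numpy as np
from scipy import optimize, stats

np.random.seed(0)
data = np.random.normal(loc=-2.0, scale=0.5, size=2000)  # stand-in data
counts, edges = np.histogram(data, bins=30)

def neg_loglike(params, counts, edges):
    mu, sigma = params
    # probability mass of each bin under N(mu, sigma):
    # integrate the PDF over the bin by differencing the CDF at the edges
    cdf = stats.norm.cdf(edges, loc=mu, scale=abs(sigma))
    p = np.diff(cdf)
    p = np.clip(p, 1e-300, None)              # guard against log(0)
    # multinomial log-likelihood, up to an additive constant
    return -np.sum(counts * np.log(p))

mu_hat, sigma_hat = optimize.fmin(neg_loglike, [0.0, 1.0],
                                  args=(counts, edges), disp=False)

# with the raw data available, the .fit() method mentioned above
# gives the maximum-likelihood estimate directly
mu_raw, sigma_raw = stats.norm.fit(data)
```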
> I am doing the histogramming from the raw data, so sounds like a maximum 
> likelihood fit would be better. What I have is a series of velocity and 
> Fe/H measurements for a series of stars in the Andromeda galaxy, and the 
> idea is to find a gaussian and double gaussian fit, and have a look to 
> see whether the double gaussian is significantly better, to detect 
> whether there are two distinct populations within the stars.
> David
This is outside my area, but my question is: what is the reasoning for 
needing a double gaussian fit?

As Josef said, you can fit a mixture model 
(http://en.wikipedia.org/wiki/Mixture_model), in which case you can 
construct a test by treating the single gaussian as the special case 
with one component. You can use something like BIC 
(http://en.wikipedia.org/wiki/Bayesian_information_criterion) to compare 
the two while allowing for the difference in the number of parameters. 
Note that the assumptions of the likelihood ratio test may not apply.
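[Editor's note: a sketch of the BIC comparison suggested above, fitting a single gaussian and a two-component gaussian mixture to raw data by maximum likelihood and comparing BIC = k*ln(n) - 2*ln(L). The data, initial values, and names here are all illustrative:]

```python
import numpy as np
from scipy import optimize, stats

np.random.seed(0)
# stand-in data: two populations, as in the two-stellar-population question
data = np.concatenate([np.random.normal(-2.0, 0.5, 700),
                       np.random.normal(0.5, 0.8, 300)])
n = len(data)

# single gaussian: closed-form maximum-likelihood fit, k = 2 parameters
mu, sigma = stats.norm.fit(data)
ll1 = np.sum(stats.norm.logpdf(data, mu, sigma))
bic1 = 2 * np.log(n) - 2 * ll1

# two-component mixture, k = 5 parameters: (weight, mu1, s1, mu2, s2)
def neg_ll(q):
    w = 1.0 / (1.0 + np.exp(-q[0]))           # logistic keeps weight in (0, 1)
    pdf = (w * stats.norm.pdf(data, q[1], abs(q[2])) +
           (1 - w) * stats.norm.pdf(data, q[3], abs(q[4])))
    return -np.sum(np.log(np.clip(pdf, 1e-300, None)))

q_hat = optimize.fmin(neg_ll, [0.0, -2.0, 0.5, 0.5, 0.8], disp=False)
ll2 = -neg_ll(q_hat)
bic2 = 5 * np.log(n) - 2 * ll2

# the model with the lower BIC is preferred; for clearly bimodal data
# the mixture should win despite its extra parameters
```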

Alternatively, you can model heterogeneous variance with a mixed-model 
(http://en.wikipedia.org/wiki/Mixed_model) approach, which is very 
flexible; for example, you could model different types of stars as 
having different variances.

You can also allow for non-gaussian models within the above frameworks.
