[SciPy-user] optimize.leastsq
Bruce Southey
bsouthey@gmail....
Wed Jan 21 11:27:39 CST 2009
David Trethewey wrote:
> Robert Kern wrote:
>
>> On Tue, Jan 20, 2009 at 09:34, <josef.pktd@gmail.com> wrote:
>>
>>
>>> On Tue, Jan 20, 2009 at 6:30 AM, David Trethewey <dlrt2@ast.cam.ac.uk> wrote:
>>>
>>>
>>>> I'm using the following code to fit a gaussian to a histogram of some data.
>>>>
>>>> #fit gaussian
>>>> fitfunc = lambda p, x: (p[0]**2)*exp(-(x-p[1])**2/(2*p[2]**2)) # Target function
>>>> errfunc = lambda p, x, y: fitfunc(p,x) - y # Distance to the target function
>>>> doublegauss = lambda q,x: (q[0]**2)*exp(-(x-q[1])**2/(2*q[2]**2)) + (q[3]**2)*exp(-(x-q[4])**2/(2*q[5]**2))
>>>> doublegausserr = lambda q,x,y: doublegauss(q,x) - y
>>>> # initial guess
>>>> p0 = [10.0,-2,0.5]
>>>> # find parameters of single gaussian
>>>> p1,success = optimize.leastsq(errfunc, p0[:], args = (hista[1],hista[0]))
>>>> errors_sq = errfunc(p1,hista[1],hista[0])**2
>>>>
>>>>
>>>> I have the error message
>>>>
>>>> Traceback (most recent call last):
>>>> File "M31FeHfit_totalw108.py", line 116, in <module>
>>>> p1,success = optimize.leastsq(errfunc, p0[:], args =
>>>> (hista[1],hista[0]))
>>>> File "C:\Python25\lib\site-packages\scipy\optimize\minpack.py", line
>>>> 264, in leastsq
>>>> m = check_func(func,x0,args,n)[0]
>>>> File "C:\Python25\lib\site-packages\scipy\optimize\minpack.py", line
>>>> 11, in check_func
>>>> res = atleast_1d(thefunc(*((x0[:numinputs],)+args)))
>>>> File "M31FeHfit_totalw108.py", line 110, in <lambda>
>>>> errfunc = lambda p, x, y: fitfunc(p,x) -y # Distance to the
>>>> target function
>>>> ValueError: shape mismatch: objects cannot be broadcast to a single shape
>>>>
>>>> Anyone know why this happens? Curiously I have had it work before but
>>>> not with my current versions of scipy and python etc.
>>>>
>>>> David
>>>>
>>>>
>>> Check the dimensions hista[1],hista[0]. I can run your part of the
>>> code without problems.
>>>
>>> If you want to estimate the parameters of a (parametric) distribution,
>>> then using maximum likelihood estimation would be more appropriate
>>> than using least squares on the histogram.
>>>
>>>
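[Editor's note: Josef's dimension check is very likely the key here. `numpy.histogram` returns `(counts, bin_edges)`, and the edges array has one more element than the counts, so passing `hista[1]` and `hista[0]` straight into `errfunc` produces exactly this broadcast error. A sketch of the fix, using synthetic data in place of the poster's real `hista`:]

```python
import numpy as np
from scipy import optimize

# Synthetic data standing in for the poster's measurements (assumption).
rng = np.random.default_rng(0)
data = rng.normal(loc=-2.0, scale=0.5, size=1000)

# np.histogram returns (counts, bin_edges); edges has one MORE element
# than counts, which is the likely source of the "shape mismatch" error.
counts, edges = np.histogram(data, bins=30)
centers = 0.5 * (edges[:-1] + edges[1:])  # same length as counts

fitfunc = lambda p, x: (p[0] ** 2) * np.exp(-(x - p[1]) ** 2 / (2 * p[2] ** 2))
errfunc = lambda p, x, y: fitfunc(p, x) - y

p0 = [10.0, -2.0, 0.5]
# Fit against bin centers, not bin edges.
p1, success = optimize.leastsq(errfunc, p0[:], args=(centers, counts))
```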
>> Right. You can't just take the value of the PDF and compare it to the
>> (density-normalized) value of the histogram. You have to integrate the
>> PDF over each bin and compare that value to the mass-normalized value
>> of the histogram. Least-squares still isn't quite appropriate for this
>> task, not least because the amount of weight that you should apply to
>> each data point is non-uniform.
>>
>> If you are doing the histogramming yourself from the raw data, you
>> might be better off doing a maximum likelihood fit on the raw data
>> like the .fit() method of the rv_continuous distribution objects in
>> scipy.stats.
>>
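[Editor's note: the `.fit()` approach Robert mentions is a one-liner; a sketch with synthetic data standing in for the raw measurements:]

```python
import numpy as np
from scipy import stats

# Synthetic raw data in place of the actual stellar measurements (assumption).
rng = np.random.default_rng(1)
raw = rng.normal(loc=-2.0, scale=0.5, size=2000)

# Maximum-likelihood fit directly on the raw data -- no histogram needed.
mu, sigma = stats.norm.fit(raw)
```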
>> If the data you have is already pre-histogrammed or discretized,
>> though, you need a different formulation of ML. For given parameters,
>> integrate the PDF over the bins of your histogram. This will give you
>> the probability of a single sample falling into each bin. If you have
>> N samples from this distribution (N being the number of data points
>> that went into the real histogram), this defines a multinomial
>> distribution over the bins. You can evaluate the log-likelihood of
>> getting your real histogram given those PDF parameters using the
>> multinomial distribution.
>>
>> I've actually had a fair bit of success with the latter when estimating
>> Weibull distributions when the typical techniques failed to be robust.
>>
>>
>>
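[Editor's note: Robert's binned-ML recipe can be sketched as follows. Synthetic data and the Nelder-Mead optimizer are editorial assumptions; tail mass outside the histogram range is ignored for brevity, and the constant multinomial coefficient is dropped since it does not affect the maximization.]

```python
import numpy as np
from scipy import stats, optimize

# Synthetic example: N samples binned into a fixed histogram (assumption).
rng = np.random.default_rng(2)
data = rng.normal(loc=-2.0, scale=0.5, size=1000)
counts, edges = np.histogram(data, bins=20)

def neg_loglike(params):
    """Multinomial negative log-likelihood of the observed counts.

    Integrating the Gaussian PDF over each bin (via CDF differences)
    gives the probability of one sample landing in that bin; the
    observed counts are then multinomially distributed.
    """
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    cdf = stats.norm.cdf(edges, loc=mu, scale=sigma)
    p = np.diff(cdf)
    p = np.clip(p, 1e-300, None)  # guard against log(0) in empty bins
    return -np.sum(counts * np.log(p))

res = optimize.minimize(neg_loglike, x0=[0.0, 1.0], method="Nelder-Mead")
mu_hat, sigma_hat = res.x
```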
> I am doing the histogramming from the raw data, so sounds like a maximum
> likelihood fit would be better. What I have is a series of velocity and
> Fe/H measurements for a series of stars in the Andromeda galaxy, and the
> idea is to find a gaussian and double gaussian fit, and have a look to
> see whether the double gaussian is significantly better, to detect
> whether there are two distinct populations within the stars.
>
> David
>
>
>
This is outside my area, but what is the reasoning for needing a double
gaussian fit?
As Josef said, you can fit a mixture model
(http://en.wikipedia.org/wiki/Mixture_model), in which case you can
construct a test by treating the single gaussian as a special case
with one component. You can use something like BIC
(http://en.wikipedia.org/wiki/Bayesian_information_criterion) to compare
the two models while accounting for the difference in the number of
parameters. Note that the assumptions of the likelihood ratio test may
not apply here.
Alternatively, you can model heterogeneous variance with a mixed model
(http://en.wikipedia.org/wiki/Mixed_model); that approach is very
flexible and can, for example, model different types of stars as having
different variances.
The above approaches also allow for non-gaussian models.
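[Editor's note: the single-vs-double gaussian comparison via BIC can be sketched with plain scipy, fitting both models by maximum likelihood on synthetic two-population data (an editorial assumption standing in for the stellar sample):]

```python
import numpy as np
from scipy import stats, optimize

# Synthetic two-population sample in place of the stellar data (assumption).
rng = np.random.default_rng(3)
data = np.concatenate([rng.normal(-2.0, 0.4, 700),
                       rng.normal(-0.5, 0.3, 300)])
n = data.size

def nll_single(params):
    # Negative log-likelihood of a single gaussian.
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    return -np.sum(stats.norm.logpdf(data, mu, sigma))

def nll_mixture(params):
    # Negative log-likelihood of a two-component gaussian mixture.
    w, mu1, s1, mu2, s2 = params
    if not (0 < w < 1) or s1 <= 0 or s2 <= 0:
        return np.inf
    pdf = (w * stats.norm.pdf(data, mu1, s1)
           + (1 - w) * stats.norm.pdf(data, mu2, s2))
    return -np.sum(np.log(pdf))

res1 = optimize.minimize(nll_single, [0.0, 1.0], method="Nelder-Mead")
res2 = optimize.minimize(nll_mixture, [0.5, -2.0, 0.5, -0.5, 0.5],
                         method="Nelder-Mead")

# BIC = k * ln(n) + 2 * NLL; the model with the LOWER BIC is preferred.
bic_single = 2 * np.log(n) + 2 * res1.fun
bic_mixture = 5 * np.log(n) + 2 * res2.fun
```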
Bruce