[SciPy-user] optimize.leastsq
David Trethewey
dlrt2@ast.cam.ac...
Wed Jan 21 04:02:14 CST 2009
So what I'm trying to work out now is how to use the .fit() method of
rv_continuous for a single gaussian and a double gaussian.
David
David Trethewey wrote:
> Robert Kern wrote:
>
>> On Tue, Jan 20, 2009 at 09:34, <josef.pktd@gmail.com> wrote:
>>
>>
>>> On Tue, Jan 20, 2009 at 6:30 AM, David Trethewey <dlrt2@ast.cam.ac.uk> wrote:
>>>
>>>
>>>> I'm using the following code to fit a gaussian to a histogram of some data.
>>>>
>>>> #fit gaussian
>>>> fitfunc = lambda p, x: (p[0]**2)*exp(-(x-p[1])**2/(2*p[2]**2)) #
>>>> Target function
>>>> errfunc = lambda p, x, y: fitfunc(p,x) -y # Distance to the
>>>> target function
>>>> doublegauss = lambda q,x: (q[0]**2)*exp(-(x-q[1])**2/(2*q[2]**2)) +
>>>> (q[3]**2)*exp(-(x-q[4])**2/(2*q[5]**2))
>>>> doublegausserr = lambda q,x,y: doublegauss(q,x) - y
>>>> # initial guess
>>>> p0 = [10.0,-2,0.5]
>>>> # find parameters of single gaussian
>>>> p1,success = optimize.leastsq(errfunc, p0[:], args =
>>>> (hista[1],hista[0]))
>>>> errors_sq = errfunc(p1,hista[1],hista[0])**2
>>>>
>>>>
>>>> I have the error message
>>>>
>>>> Traceback (most recent call last):
>>>> File "M31FeHfit_totalw108.py", line 116, in <module>
>>>> p1,success = optimize.leastsq(errfunc, p0[:], args =
>>>> (hista[1],hista[0]))
>>>> File "C:\Python25\lib\site-packages\scipy\optimize\minpack.py", line
>>>> 264, in leastsq
>>>> m = check_func(func,x0,args,n)[0]
>>>> File "C:\Python25\lib\site-packages\scipy\optimize\minpack.py", line
>>>> 11, in check_func
>>>> res = atleast_1d(thefunc(*((x0[:numinputs],)+args)))
>>>> File "M31FeHfit_totalw108.py", line 110, in <lambda>
>>>> errfunc = lambda p, x, y: fitfunc(p,x) -y # Distance to the
>>>> target function
>>>> ValueError: shape mismatch: objects cannot be broadcast to a single shape
>>>>
>>>> Anyone know why this happens? Curiously I have had it work before but
>>>> not with my current versions of scipy and python etc.
>>>>
>>>> David
>>>>
>>>>
>>> Check the dimensions hista[1],hista[0]. I can run your part of the
>>> code without problems.
>>>
>>> If you want to estimate the parameters of a (parametric) distribution,
>>> then using maximum likelihood estimation would be more appropriate
>>> than using least squares on the histogram.
>>>
>>>
>> Right. You can't just take the value of the PDF and compare it to the
>> (density-normalized) value of the histogram. You have to integrate the
>> PDF over each bin and compare that value to the mass-normalized value
>> of the histogram. Least-squares still isn't quite appropriate for this
>> task, not least because the amount of weight that you should apply to
>> each data point is non-uniform.
>>
>> If you are doing the histogramming yourself from the raw data, you
>> might be better off doing a maximum likelihood fit on the raw data
>> like the .fit() method of the rv_continuous distribution objects in
>> scipy.tats.
>>
>> If the data you have is already pre-histogrammed or discretized,
>> though, you need a different formulation of ML. For given parameters,
>> integrate the PDF over the bins of your histogram. This will give you
>> the probability of a single sample falling into each bin. If you have
>> N samples from this distribution (N being the number of data points
>> that went into the real histogram), this defines a multinomial
>> distribution over the bins. You can evaluate the log-likelihood of
>> getting your real histogram given those PDF parameters using the
>> multinomial distribution.
>>
>> I've actually had a far bit of success with the latter when estimating
>> Weibull distributions when the typical techniques failed to be robust.
>>
>>
>>
> I am doing the histogramming from the raw data, so sounds like a maximum
> likelihood fit would be better. What I have is a series of velocity and
> Fe/H measurements for a series of stars in the Andromeda galaxy, and the
> idea is to find a gaussian and double gaussian fit, and have a look to
> see whether the double gaussian is significantly better, to detect
> whether there are two distinct populations within the stars.
>
> David
>
> David
> _______________________________________________
> SciPy-user mailing list
> SciPy-user@scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-user
>
More information about the SciPy-user
mailing list