[SciPy-user] curve_fit failure when supplying uncertainties

Anne Archibald peridot.faceted@gmail....
Sat Mar 7 19:52:05 CST 2009


2009/3/7 ElMickerino <elmickerino@hotmail.com>:
>
> Great, thanks so much for your help.  I seem to recall that empty bins
> require special handling when fitting histograms, so this makes sense.  I
> was using variance of the bin content, which are assumed to poisson
> distributed (ie. if bin 'i' has content N_i, then var(bin_i) = sqrt(N_i) ).
> This works fine for large N, but I believe that asymmetric errors must be
> used for small bin content.
>
> Is there a convenient way of supplying asymmetric errors to a least square
> fit in scipy?

If you get into small-number statistics like this, least-squares
fitting will no longer give you the right answer. If the issue is just
a few bins with low values you might be able to get away with
discarding them from the fit, but that's very iffy statistically. You
can do a maximum-likelihood fit directly, by constructing the
likelihood (-log probability of your data) using scipy.stats.poisson.
Moreover, if you're doing this to a histogram, you may want to think
about avoiding the binning entirely and fitting the unbinned data
directly, possibly going for a maximum probability on the K-S test
(also in scipy, though you may want to look at how it works and write
your own for fitting purposes).

However you do it I strongly advise doing a round-trip: take the
best-fit model, generate some fake data sets, and fit them using the
same code. Not only does this check that your fitting code does what
you think it does, it also gives you an estimate of the errors on the
fitted parameters.

Anne

> Thanks again,
> Michael
>
>
> josef.pktd wrote:
>>
>> On Sat, Mar 7, 2009 at 7:18 PM, ElMickerino <elmickerino@hotmail.com>
>> wrote:
>>>
>>> I think I've identified a problem with curve_fit which occurs when one
>>> attempts to fit normally distributed data with a gaussian.  From the
>>> documentation of curve_fit, it appears that 'sigma' should be the
>>> uncertainties on the y-values of the data; however, the following example
>>> (see attached code) should make it clear that there's a problem with
>>> this.
>>>
>>> My best guess is that the sigma are actually weights (=1.0/sigma).  Can
>>> anyone confirm or deny this?  Seems like from the name it should be
>>> uncertainties but from the behavior of the code it appears otherwise.
>>>
>>> Also, I was wondering if there's a way to supply asymmetric errors to
>>> curve_fit (or for that matter, to leastsqr or any wrapper thereof).
>>>
>>> Thanks very much,
>>> Michael
>>>
>>> http://www.nabble.com/file/p22380378/testFit.py testFit.py
>>> --
>>> View this message in context:
>>> http://www.nabble.com/curve_fit-failure-when-supplying-uncertainties-tp22380378p22380378.html
>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> SciPy-user mailing list
>>> SciPy-user@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>
>> sigma is the vector of standard deviations for the observations. The
>> weights in the least squares objective function are 1/sigma.
>> Essentially it is a heteroscedastic non-linear generalized least
>> squares estimation.
>>
>> In your case err contains zeros, which means the variance is zero and
>> the weight 1/sigma for these observations would be infinite. curve_fit
>> cannot handle zero standard deviation. Infinite standard deviation
>> works because then the weight becomes zero.
>>
>> If you force your err to be strictly positive, then this works, e.g.
>>
>>>>> curve_fit(gaus, centers, data, sigma=1e-6+err)
>> (array([  1.91520894e+03,   1.68957830e-01,   6.19128687e-01]),
>> array([[  3.03590409e-09,   5.11571694e-18,  -6.69243949e-18],
>>        [  5.11571694e-18,   4.43479563e-16,   6.18126633e-17],
>>        [ -6.69243949e-18,   6.18126633e-17,   3.02641036e-17]]))
>>
>>
>> I haven't looked closely at the rest of your file, so I'm not sure
>> about the interpretation of your err
>>
>> Josef
>> _______________________________________________
>> SciPy-user mailing list
>> SciPy-user@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>>
>
> --
> View this message in context: http://www.nabble.com/curve_fit-failure-when-supplying-uncertainties-tp22380378p22394056.html
> Sent from the Scipy-User mailing list archive at Nabble.com.
>
> _______________________________________________
> SciPy-user mailing list
> SciPy-user@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>


More information about the SciPy-user mailing list