[SciPy-User] Can I "fix" scale=1 when fitting distributions?
josef.pktd@gmai...
Thu Apr 15 10:34:28 CDT 2010
On Wed, Apr 14, 2010 at 12:56 AM, <josef.pktd@gmail.com> wrote:
> On Wed, Apr 14, 2010 at 12:52 AM, <josef.pktd@gmail.com> wrote:
>> On Wed, Apr 14, 2010 at 12:09 AM, David Ho <itsdho@ucla.edu> wrote:
>>> Actually, being able to "fix" scale=1 would be very useful for me.
>>>
>>> The reason I'm trying to fit a von mises distribution to my data is to find
>>> "kappa", a measure of the concentration of the data.
>>>
>>> On wikipedia, I see that the von mises distribution only really has 2
>>> parameters: mu (the mean), and kappa (1/kappa being analogous to the
>>> variance).
>>> When I use vonmises.fit(), though, I get 3 parameters: kappa, mu, and a
>>> "scale parameter", respectively.
>>> However, I don't think the scale parameter for the von mises distribution is
>>> really independent of kappa, is it? (Am I understanding this correctly?)
>>> (Using the normal distribution as a comparison, I think the "scale
>>> parameter" for a normal distribution certainly isn't independent of sigma,
>>> and norm.fit() only returns mu and sigma. This makes a lot more sense to
>>> me.)
>>
>> for the normal distribution, loc = mean and scale = sqrt(variance), and it
>> has no shape parameter.
>>
>> essentially, all distributions are transformed as y = (x - loc)/scale,
>> where x is the original data and y is the standardized data; the actual
>> _pdf, _cdf, ... are for the standard version of the distribution.
>> it's a generic framework, but it doesn't make sense in all cases or
>> applications, mainly when we want to fix the support of the
>> distribution
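The loc/scale convention can be checked directly (a small sketch; it uses the normal distribution because its standardized form is well known): evaluating the pdf with loc and scale should match evaluating the standardized pdf on transformed data and dividing by scale.

```python
import numpy as np
from scipy import stats

# pdf(x, loc, scale) == pdf((x - loc)/scale, loc=0, scale=1) / scale
x = np.linspace(-2.0, 2.0, 9)
loc, scale = 0.5, 2.0

shifted = stats.norm.pdf(x, loc=loc, scale=scale)
standardized = stats.norm.pdf((x - loc) / scale) / scale

ok = np.allclose(shifted, standardized)
```

The same identity holds for every rv_continuous distribution in scipy.stats, which is why fit() always estimates loc and scale alongside the shape parameters.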
>>
>>>
>>> I'm basically trying to use kappa as a measure of the "width" of the
>>> distribution, so the extra degree of freedom introduced by the scale
>>> parameter is problematic for me. For two distributions that are
>>> superficially very similar in "width", I might get wildly varying values of
>>> kappa.
>>>
>>> For example, I once got fitted values of kappa=1, and kappa=400 for two
>>> distributions that looked very similar in width.
>>> I thought I must have been doing something wrong... until I saw the scale
>>> parameters were around 1 and 19, respectively.
>>> Plotting the fitted distributions:
>>>>>> xs = numpy.linspace(-numpy.pi, numpy.pi, 361)
>>>>>> plot(xs, scipy.stats.distributions.vonmises.pdf(xs, 1, loc=0, scale=1))
>>>>>> plot(xs, scipy.stats.distributions.vonmises.pdf(xs, 400, loc=0, scale=19))
>>
>> one idea is to check the implied moments, e.g. vonmises.stats(1, loc=0,
>> scale=1, moments='mvsk')
>> to see whether the implied mean, variance, skew, and kurtosis are similar
>> under the two different parameterizations.
>> Warning: skew and kurtosis are incorrect for a few distributions; I don't
>> know about vonmises.
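That implied-moments check might look like the following (a sketch using the two parameterizations from the plotting example above; only mean and variance are compared here, given the warning about skew and kurtosis):

```python
from scipy import stats

# implied mean and variance of the two fitted parameterizations
m1, v1 = stats.vonmises.stats(1, loc=0, scale=1, moments='mv')
m2, v2 = stats.vonmises.stats(400, loc=0, scale=19, moments='mv')

# both densities are symmetric around loc=0, so the means agree;
# comparing the variances shows how far apart the two fits really are
```

If the implied variances come out similar, the two (kappa, scale) pairs describe distributions of comparable width even though kappa alone differs wildly.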
>>
>>>
>>> What I'd really like to do is "force" scale=1 when I perform the fitting.
>>> (In fact, it would be nice to even force the mean of the fitted distribution
>>> to a precalculated value, as well. I really only want one degree of freedom
>>> for the fitting algorithm -- I just want it to explore different values of
>>> kappa.)
>>>
>>> Is there any way to do this?
>>
>> It's not yet possible in scipy, but it's easy to write: essentially, copy
>> the existing fit and nnlf methods, fix loc and scale, and call the
>> optimization with only one argument.
>>
>> I have a more than one year old ticket:
>> http://projects.scipy.org/scipy/ticket/832
>>
>> It might be possible that it works immediately if you monkey patch
>> rv_continuous by adding
>> nnlf_fr and fit_fr
>> in
>> http://mail.scipy.org/pipermail/scipy-user/2009-February/019968.html
>>
>> scipy.stats.rv_continuous.nnlf_fr = nnlf_fr
>> scipy.stats.rv_continuous.fit_fr = fit_fr
>
> correction
> this should instead refer to
> scipy.stats.distributions.rv_continuous
It looks like it works, checked only on a few examples, see attachment
Josef
>
> Josef
>>
>> and call
>> vonmises.fit_fr(data, frozen=[np.nan, 0.0, 1.0])
>>
>> It's worth a try, but if you only need vonmises it might also be easy
>> to use scipy.optimize.fmin on a function that wraps vonmises._nnlf and
>> fixes loc, scale.
>> To give you an idea, I didn't check what the correct function arguments are ::
>>
>> def mynnlf(kappa):
>>     loc = 0
>>     scale = 1
>>     return vonmises.nnlf(kappa, loc, scale)
>>
>> scipy.optimize.fmin(mynnlf, startingvalueforkappa)
>>
>> I could check this for your case at the end of the week.
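Filling in that sketch (hedged: it assumes rv_continuous.nnlf takes the parameter vector theta = shape parameters + (loc, scale) followed by the data, which matches the scipy source but wasn't verified in the email):

```python
import numpy as np
from scipy import stats, optimize

# simulated data with a known kappa, just to exercise the fit
data = stats.vonmises.rvs(2.0, loc=0, scale=1, size=5000, random_state=0)

def mynnlf(params):
    # negative log-likelihood with loc and scale frozen at 0 and 1
    kappa = params[0]
    return stats.vonmises.nnlf([kappa, 0.0, 1.0], data)

# optimize over kappa only, starting from a rough guess
kappa_hat = optimize.fmin(mynnlf, x0=[1.0], disp=0)[0]
```

With loc and scale pinned, fmin explores only kappa, which is the single degree of freedom the original poster asked for.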
>>
>> If you have an opinion for a good generic API for fit_semifrozen, let me know.
>> I hope this helps, and please keep me informed. I would like to make
>> at least a cookbook recipe out of it.
>>
>> Josef
>>
>>
>>> Thanks for your help,
>>>
>>> --David Ho
>>>
>>>
>>>> On Tue, Apr 13, 2010 at 3:13 PM, <josef.pktd@gmail.com> wrote:
>>>> > In the current fit version, loc (and with it the support of the
>>>> > distribution) and scale are always estimated. In some cases this is
>>>> > not desired.
>>>> > You are transforming the original data to fit into the standard
>>>> > distribution with loc=0, scale=1 . Do you get reasonable estimates for
>>>> > loc and scale in this case?
>>>> > If not, then there is another patch or enhanced fit function that
>>>> > could take loc and scale as fixed.
>>>> >
>>>> > I will look at some details in your function later, especially I'm
>>>> > curious how the circular statistics works.
>>>> >
>>>> > Thanks for the example, maybe I can stop ignoring vonmises.
>>>> >
>>>> > Josef
>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>>
>>
>
-------------- next part --------------
'''patching scipy to fit distributions with some fixed/frozen parameters
'''
import numpy as np
from scipy import stats, optimize

########## patching scipy
stats.distributions.vonmises.a = -np.pi
stats.distributions.vonmises.b = np.pi
def _fitstart(self, x):
    # method of moments estimator as starting values, not verified
    # against the literature
    loc = np.min([x.min(), 0])
    a = 4/stats.skew(x)**2
    scale = np.std(x) / np.sqrt(a)
    return (a, loc, scale)
def nnlf_fr(self, thetash, x, frmask):
    # new frozen version
    # -sum(log pdf(x, theta), axis=0)
    # where theta are the parameters (including loc and scale)
    try:
        if frmask is not None:
            theta = frmask.copy()
            theta[np.isnan(frmask)] = thetash
        else:
            theta = thetash
        loc = theta[-2]
        scale = theta[-1]
        args = tuple(theta[:-2])
    except IndexError:
        raise ValueError, "Not enough input arguments."
    if not self._argcheck(*args) or scale <= 0:
        return np.inf
    x = np.array((x - loc) / scale)
    cond0 = (x <= self.a) | (x >= self.b)
    if np.any(cond0):
        return np.inf
    else:
        N = len(x)
        return self._nnlf(x, *args) + N*np.log(scale)
def fit_fr(self, data, *args, **kwds):
    loc0, scale0 = map(kwds.get, ['loc', 'scale'], [0.0, 1.0])
    Narg = len(args)
    if Narg == 0 and hasattr(self, '_fitstart'):
        x0 = self._fitstart(data)
    elif Narg > self.numargs:
        raise ValueError, "Too many input arguments."
    else:
        args += (1.0,)*(self.numargs - Narg)
        # location and scale are at the end
        x0 = args + (loc0, scale0)
    if 'frozen' in kwds:
        frmask = np.array(kwds['frozen'])
        if len(frmask) != self.numargs + 2:
            raise ValueError, "Incorrect number of frozen arguments."
        else:
            # keep starting values only for parameters that are not frozen
            x0 = np.array(x0)[np.isnan(frmask)]
    else:
        frmask = None
    return optimize.fmin(self.nnlf_fr, x0,
                         args=(np.ravel(data), frmask), disp=0)

stats.distributions.rv_continuous.fit_fr = fit_fr
stats.distributions.rv_continuous.nnlf_fr = nnlf_fr
########## end patching scipy
for nobs in [20000, 1000, 100]:
    x = stats.vonmises.rvs(1.23, loc=0, scale=1, size=nobs)
    print '\nnobs:', nobs
    print 'true parameter'
    print '1.23, loc=0, scale=1'
    print 'unconstrained'
    print stats.vonmises.fit(x)
    print stats.vonmises.fit_fr(x, frozen=[np.nan, np.nan, np.nan])
    print 'with fixed loc and scale'
    print stats.vonmises.fit_fr(x, frozen=[np.nan, 0.0, 1.0])
distr = stats.gamma
arg, loc, scale = 2.5, 0., 20.

for nobs in [20000, 1000, 100]:
    x = distr.rvs(arg, loc=loc, scale=scale, size=nobs)
    print '\nnobs:', nobs
    print 'true parameter'
    print '%f, loc=%f, scale=%f' % (arg, loc, scale)
    print 'unconstrained'
    print distr.fit(x)
    print distr.fit_fr(x, frozen=[np.nan, np.nan, np.nan])
    print 'with fixed loc and scale'
    print distr.fit_fr(x, frozen=[np.nan, 0.0, 1.0])
    print 'with fixed loc'
    print distr.fit_fr(x, frozen=[np.nan, 0.0, np.nan])