[SciPy-User] Can I "fix" scale=1 when fitting distributions?

josef.pktd@gmai... josef.pktd@gmai...
Thu Apr 15 10:34:28 CDT 2010


On Wed, Apr 14, 2010 at 12:56 AM,  <josef.pktd@gmail.com> wrote:
> On Wed, Apr 14, 2010 at 12:52 AM,  <josef.pktd@gmail.com> wrote:
>> On Wed, Apr 14, 2010 at 12:09 AM, David Ho <itsdho@ucla.edu> wrote:
>>> Actually, being able to "fix" scale=1 would be very useful for me.
>>>
>>> The reason I'm trying to fit a von mises distribution to my data is to find
>>> "kappa", a measure of the concentration of the data.
>>>
>>> On wikipedia, I see that the von mises distribution only really has 2
>>> parameters: mu (the mean), and kappa (1/kappa being analogous to the
>>> variance).
>>> When I use vonmises.fit(), though, I get 3 parameters: kappa, mu, and a
>>> "scale parameter", respectively.
>>> However, I don't think the scale parameter for the von mises distribution is
>>> really independent of kappa, is it? (Am I understanding this correctly?)
>>> (Using the normal distribution as a comparison, I think the "scale
>>> parameter" for a normal distribution certainly isn't independent of sigma,
>>> and norm.fit() only returns mu and sigma. This makes a lot more sense to
>>> me.)
>>
>> For the normal distribution, loc = mean and scale = sqrt(variance), and it has
>> no shape parameter.
>>
>> Essentially, all distributions are transformed as y = (x - loc)/scale,
>> where x is the original data and y is the standardized data; the
>> actual _pdf, _cdf, ... are for the standard version of the
>> distribution.
>> It's a generic framework, but it doesn't make sense in all cases or
>> applications, mainly when we want to fix the support of the
>> distribution.
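>>
>> As a small illustration of what that standardization means (an untested
>> sketch, using vonmises just as an example):
>>
>> import numpy as np
>> from scipy import stats
>>
>> x = np.linspace(-1.0, 1.0, 5)
>> kappa, loc, scale = 2.0, 0.3, 1.5
>> # the generic pdf evaluates the standard _pdf on the standardized data
>> # and divides by scale
>> p1 = stats.vonmises.pdf(x, kappa, loc=loc, scale=scale)
>> p2 = stats.vonmises.pdf((x - loc) / scale, kappa) / scale
>> print np.allclose(p1, p2)   # should print True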
>>
>>>
>>> I'm basically trying to use kappa as a measure of the "width" of the
>>> distribution, so the extra degree of freedom introduced by the scale
>>> parameter is problematic for me. For two distributions that are
>>> superficially very similar in "width", I might get wildly varying values of
>>> kappa.
>>>
>>> For example, I once got fitted values of kappa=1, and kappa=400 for two
>>> distributions that looked very similar in width.
>>> I thought I must have been doing something wrong... until I saw the scale
>>> parameters were around 1 and 19, respectively.
>>> Plotting the fitted distributions:
>>>>>> xs = numpy.linspace(-numpy.pi, numpy.pi, 361)
>>>>>> plot(xs, scipy.stats.distributions.vonmises.pdf(xs, 1, loc=0, scale=1))
>>>>>> plot(xs, scipy.stats.distributions.vonmises.pdf(xs, 400, loc=0, scale=19))
>>
>> One idea is to check the implied moments, e.g.
>> vonmises.stats(1, loc=0, scale=1, moments='mvsk'),
>> to see whether the implied mean, variance, skew and kurtosis are similar
>> under the two different parameterizations.
>> Warning: skew and kurtosis are incorrect for a few distributions; I don't
>> know about vonmises.
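>>
>> For example, something along these lines (an untested sketch; the support
>> is restricted to one period first, as in the attached script, so that the
>> generic moment integration stays on [-pi, pi]):
>>
>> import numpy as np
>> from scipy import stats
>>
>> stats.distributions.vonmises.a = -np.pi
>> stats.distributions.vonmises.b = np.pi
>>
>> # implied mean, variance, skew, kurtosis under the two parameterizations
>> print stats.vonmises.stats(1, loc=0, scale=1, moments='mvsk')
>> print stats.vonmises.stats(400, loc=0, scale=19, moments='mvsk')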
>>
>>>
>>> What I'd really like to do is "force" scale=1 when I perform the fitting.
>>> (In fact, it would be nice to even force the mean of the fitted distribution
>>> to a precalculated value, as well. I really only want one degree of freedom
>>> for the fitting algorithm -- I just want it to explore different values of
>>> kappa.)
>>>
>>> Is there any way to do this?
>>
>> It's not yet possible in scipy, but it is easy to write: essentially, copy
>> the existing fit and nnlf methods, fix loc and scale, and call the
>> optimization with only one free argument.
>>
>> I have a ticket for this that is more than a year old:
>> http://projects.scipy.org/scipy/ticket/832
>>
>> It might work immediately if you monkey patch rv_continuous by adding
>> the nnlf_fr and fit_fr functions from
>> http://mail.scipy.org/pipermail/scipy-user/2009-February/019968.html
>>
>> scipy.stats.rv_continuous.nnlf_fr = nnlf_fr
>> scipy.stats.rv_continuous.fit_fr = fit_fr
>
> Correction: the two assignments above should instead refer to
> scipy.stats.distributions.rv_continuous

It looks like it works; checked only on a few examples. See the attachment.

Josef

>
> Josef
>>
>> and call
>> vonmises.fit_fr(data, frozen=[np.nan, 0.0, 1.0])
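>>
>> In the frozen list, np.nan marks a parameter that is left free to be
>> estimated; the entries are ordered as the shape parameter(s), then loc,
>> then scale. Put together, usage would look roughly like this (an untested
>> sketch, assuming nnlf_fr and fit_fr from the post above have been defined):
>>
>> import numpy as np
>> from scipy import stats
>>
>> stats.distributions.rv_continuous.nnlf_fr = nnlf_fr
>> stats.distributions.rv_continuous.fit_fr = fit_fr
>>
>> data = stats.vonmises.rvs(2.0, loc=0, scale=1, size=1000)
>> # estimate only kappa; loc and scale are held fixed at 0 and 1
>> print stats.vonmises.fit_fr(data, frozen=[np.nan, 0.0, 1.0])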
>>
>> It's worth a try, but if you only need vonmises, it might also be easy
>> to use scipy.optimize.fmin on a function that wraps vonmises.nnlf and
>> fixes loc and scale.
>> To give you an idea (I haven't double-checked the exact function signature) ::
>>
>> def mynnlf(kappa):
>>     loc = 0
>>     scale = 1
>>     # nnlf expects the full parameter vector (shapes, loc, scale) plus the data
>>     return vonmises.nnlf([kappa, loc, scale], data)
>>
>> scipy.optimize.fmin(mynnlf, startingvalueforkappa)
>>
>> I could check this for your case at the end of the week.
>>
>> If you have an opinion on a good generic API for fit_semifrozen, let me know.
>> I hope this helps, and please keep me informed. I would like to make
>> at least a cookbook recipe out of it.
>>
>> Josef
>>
>>
>>> Thanks for your help,
>>>
>>> --David Ho
>>>
>>>
>>>> On Tue, Apr 13, 2010 at 3:13 PM,  <josef.pktd@gmail.com> wrote:
>>>> > In the current fit version, loc (and with it the support of the
>>>> > distribution) and scale are always estimated. In some cases this is
>>>> > not desired.
>>>> > You are transforming the original data to fit into the standard
>>>> > distribution with loc=0, scale=1. Do you get reasonable estimates for
>>>> > loc and scale in this case?
>>>> > If not, then there is another patch or enhanced fit function that
>>>> > could take loc and scale as fixed.
>>>> >
>>>> > I will look at some details in your function later; in particular, I'm
>>>> > curious how the circular statistics work.
>>>> >
>>>> > Thanks for the example, maybe I can stop ignoring vonmises.
>>>> >
>>>> > Josef
>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>>
>>
>
-------------- next part --------------
'''patching scipy to fit distributions with some fixed/frozen parameters

'''

#import numpy
import numpy as np
#import matplotlib.pyplot as mp
#import scipy.stats
from scipy import stats, optimize


########## patching scipy

# restrict the support of vonmises to one period, [-pi, pi]
stats.distributions.vonmises.a = -np.pi
stats.distributions.vonmises.b = np.pi



def _fitstart(self, x):
    # method of moments estimator as starting values, not verified
    # against the literature; note: not attached to any distribution below
    loc = np.min([x.min(),0])
    a = 4/stats.skew(x)**2
    scale = np.std(x) / np.sqrt(a)
    return (a, loc, scale)

def nnlf_fr(self, thetash, x, frmask):
    # new frozen version
    # - sum (log pdf(x, theta),axis=0)
    #   where theta are the parameters (including loc and scale)
    #
    try:
        if frmask is not None:
            theta = frmask.copy()
            theta[np.isnan(frmask)] = thetash
        else:
            theta = thetash
        loc = theta[-2]
        scale = theta[-1]
        args = tuple(theta[:-2])
    except IndexError:
        raise ValueError, "Not enough input arguments."
    if not self._argcheck(*args) or scale <= 0:
        return np.inf
    x = np.array((x-loc) / scale)
    cond0 = (x <= self.a) | (x >= self.b)
    if (np.any(cond0)):
        return np.inf
    else:
        N = len(x)
        # x is already standardized here:
        # -sum(log pdf) = _nnlf(x, *args) + N*log(scale)
        return self._nnlf(x, *args) + N*np.log(scale)

def fit_fr(self, data, *args, **kwds):
    loc0, scale0 = map(kwds.get, ['loc', 'scale'],[0.0, 1.0])
    Narg = len(args)
    
    if Narg == 0 and hasattr(self, '_fitstart'):
        x0 = self._fitstart(data)
    elif Narg > self.numargs:
        raise ValueError, "Too many input arguments."
    else:
        args += (1.0,)*(self.numargs-Narg)
        # location and scale are at the end
        x0 = args + (loc0, scale0)
    
    if 'frozen' in kwds:
        frmask = np.array(kwds['frozen'])
        if len(frmask) != self.numargs+2:
            raise ValueError, "Incorrect number of frozen arguments."
        else:
            # keep starting values for not frozen parameters
            x0  = np.array(x0)[np.isnan(frmask)]
    else:
        frmask = None
        
    #print x0
    #print frmask
    return optimize.fmin(self.nnlf_fr, x0,
                args=(np.ravel(data), frmask), disp=0)

stats.distributions.rv_continuous.fit_fr = fit_fr
stats.distributions.rv_continuous.nnlf_fr = nnlf_fr

########## end patching scipy

for nobs in [20000, 1000, 100]:
    x = stats.vonmises.rvs(1.23, loc=0, scale=1, size=nobs)
    print '\nnobs:', nobs
    print 'true parameter'
    print '1.23, loc=0, scale=1'
    print 'unconstrained'
    print stats.vonmises.fit(x)
    print stats.vonmises.fit_fr(x, frozen=[np.nan, np.nan, np.nan])
    print 'with fixed loc and scale'
    print stats.vonmises.fit_fr(x, frozen=[np.nan, 0.0, 1.0])


distr = stats.gamma
arg, loc, scale = 2.5, 0., 20.

for nobs in [20000, 1000, 100]:
    x = distr.rvs(arg, loc=loc, scale=scale, size=nobs)
    print '\nnobs:', nobs
    print 'true parameter'
    print '%f, loc=%f, scale=%f' % (arg, loc, scale)
    print 'unconstrained'
    print distr.fit(x)
    print distr.fit_fr(x, frozen=[np.nan, np.nan, np.nan])
    print 'with fixed loc and scale'
    print distr.fit_fr(x, frozen=[np.nan, 0.0, 1.0])
    print 'with fixed loc'
    print distr.fit_fr(x, frozen=[np.nan, 0.0, np.nan])

