[SciPy-User] MLE with stats.lognorm

josef.pktd@gmai... josef.pktd@gmai...
Sun Oct 9 14:18:40 CDT 2011


On Sun, Oct 9, 2011 at 11:46 AM, Christian K. <ckkart@hoc.net> wrote:
> On 09.10.11 14:14, josef.pktd@gmail.com wrote:
>> On Sun, Oct 9, 2011 at 8:06 AM,  <josef.pktd@gmail.com> wrote:
>>> On Sun, Oct 9, 2011 at 7:51 AM, Christian K. <ckkart@hoc.net> wrote:
>>>> Hi,
>>>>
>>>> I wonder whether I am doing something wrong or if the following is to be
>>>> expected (using scipy 0.9):
>>>>
>>>> In [38]: from scipy import stats
>>>>
>>>> In [39]: dist = stats.lognorm(0.25,scale=200.0)
>>>>
>>>> In [40]: samples = dist.rvs(size=100)
>>>>
>>>> In [41]: print stats.lognorm.fit(samples)
>>>> C:\Python26\lib\site-packages\scipy\optimize\optimize.py:280: RuntimeWarning:
>>>> invalid value encountered in subtract
>>>>  and max(abs(fsim[0]-fsim[1:])) <= ftol):
>>>> (1.0, 158.90310231282845, 21.013288720647015)
>>>>
>>>> In [42]: print stats.lognorm.fit(samples, floc=0)
>>>> [2.2059200167655884, 0, 21.013288720647015]
>>>>
>>>> Even when fixing loc=0.0, the results from the MLE for s and scale are very
>>>> different from the input parameters. Is lognorm
>>>>
>>>> Any hints are highly appreciated.
>>>
>>> I just looked at similar cases, for the changes in scipy 0.9 and
>>> starting values, see
>>> http://projects.scipy.org/scipy/ticket/1530
>>>
>>> Essentially, you need to find better starting values and give it to fit.
>>>
>>> Can you add it to the ticket? It's not quite the same, but I guess it
>>> is also that fix_loc_scale doesn't make sense.
>
> Ok. I'll do it.
>
>>> Note, I also get many of these warnings,
>>>
>>>> invalid value encountered in subtract
>>>>  and max(abs(fsim[0]-fsim[1:])) <= ftol):
>>>
>>> they are caused when np.inf is returned for invalid arguments. In many
>>> cases optimize.fmin evaluates parameters that are not valid, but most
>>> of the time that doesn't seem to cause any problems, except that it's
>>> annoying.
>>
>> for example with starting value for loc
>>>>> print stats.lognorm.fit(x, loc=0)
>> (0.23800805074491538, 0.034900026034516723, 196.31113801786194)
>
> I see. Is there any workaround/patch to force loc=0.0? What is the
> meaning of loc anyway?

The loc keyword passed to fit is the starting value for fmin. I don't
remember how to specify starting values for the shape parameters; I have
never used that.
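
For reference, a minimal sketch of how starting values and fixed parameters
can be passed to fit. It assumes a current SciPy release rather than the 0.9
discussed in this thread: positional arguments after the data are starting
guesses for the shape parameters, loc= and scale= are starting guesses, and
floc= holds loc fixed. The seed, sample size, and starting values are
illustrative choices, not taken from the thread.

```python
import numpy as np
from scipy import stats

# Illustrative sample: shape s=0.25 and scale=200 as in the original post, loc=0.
rng = np.random.default_rng(0)
samples = rng.lognormal(mean=np.log(200.0), sigma=0.25, size=1000)

# The shape starting guess is passed positionally; loc= and scale= are
# starting guesses for the optimizer, not fixed values.
s_free, loc_free, scale_free = stats.lognorm.fit(samples, 0.5, loc=0.0, scale=150.0)

# floc=0 holds loc fixed at zero, so only s and scale are estimated.
s_fixed, loc_fixed, scale_fixed = stats.lognorm.fit(samples, 0.5, floc=0.0, scale=150.0)

print(s_free, loc_free, scale_free)
print(s_fixed, loc_fixed, scale_fixed)
```

With loc free, the three-parameter fit can still be poorly determined, so
the fixed-loc result is usually the easier one to compare with the true
parameters.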

As in the ticket, you could monkey-patch the _fitstart function:

>>> stats.cauchy._fitstart = lambda x:(0,1)
>>> stats.cauchy.fit(x)

or what I do to experiment with starting values is

stats.distributions.lognorm_gen._fitstart = fitstart_lognormal

where fitstart_lognormal is my own function that takes the sample as its
argument and returns three starting values (shape, loc, scale).
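
A hypothetical sketch of what such a starting-value function could look
like. This is not the function used in the message above; it assumes the
data are positive and that loc is close to 0, so that the mean and standard
deviation of log(sample) give reasonable guesses for scale and shape.

```python
import numpy as np

def fitstart_lognormal(sample):
    """Return starting values (shape, loc, scale) for a lognormal fit.

    Assumption: the data are positive and loc is near 0, so the moments
    of log(sample) are sensible guesses for shape and scale.
    """
    logs = np.log(np.asarray(sample, dtype=float))
    shape0 = logs.std()            # std of the log-data as a guess for s
    scale0 = np.exp(logs.mean())   # exp(mean of the log-data) as a guess for scale
    return shape0, 0.0, scale0

# Illustrative sample with s=0.25 and scale=200, as in the original post.
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=np.log(200.0), sigma=0.25, size=1000)
start = fitstart_lognormal(sample)
print(start)
```

Instead of patching the private _fitstart hook, the returned values could
also be handed to fit directly, e.g.
stats.lognorm.fit(sample, start[0], loc=start[1], scale=start[2]).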


> I have some more observations: in case the fmin warning is shown, the
> result equals the initial guess:
>
> In [17]: stats.lognorm.fit(samples)
> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280:
> RuntimeWarning: invalid value encountered in subtract
>  and max(abs(fsim[0]-fsim[1:])) <= ftol):
> Out[17]: (1.0, 172.83866358041575, 24.677880663838486)
>
> In [18]: stats.lognorm._fitstart(samples)
> Out[18]: (1.0, 172.83866358041575, 24.677880663838486)

OK, that needs a closer look. I tried for a while with different
starting values for cauchy and my impression was that most of the time
fmin converged in spite of the warning.
My Monte Carlo experiments with some distributions look pretty good
but I didn't check yet how many of the replications have parameters
that didn't move away from the starting values.
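
One way to count such replications is to compare each estimate with its
starting values. A small sketch follows; the helper name stuck_at_start and
the tolerance are assumptions made for illustration. The first pair of
tuples is copied from Out[17] and Out[18] quoted above, and the second
estimate is a made-up example of a fit that did move.

```python
import numpy as np

def stuck_at_start(start, estimate, rtol=1e-8):
    """Return True if the estimate is numerically identical to the start values."""
    return bool(np.allclose(start, estimate, rtol=rtol, atol=0.0))

# Values copied from Out[17] (fit result) and Out[18] (_fitstart) in this thread.
fit_result = (1.0, 172.83866358041575, 24.677880663838486)
start_vals = (1.0, 172.83866358041575, 24.677880663838486)
stuck = stuck_at_start(start_vals, fit_result)
print(stuck)   # True: the optimizer returned its starting point

# Hypothetical estimate that moved away from the same starting values.
moved_estimate = (0.25, 0.0, 200.0)
moved = stuck_at_start(start_vals, moved_estimate)
print(moved)   # False
```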

Maybe imposing the constraints in some other way than just returning
np.inf for out-of-bounds parameters would be more robust.
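
As one possible alternative (a sketch, not what scipy.stats does
internally), the positive parameters can be optimized on the log scale so
that the optimizer never sees an invalid value. The example below assumes
loc is fixed at 0 and uses scipy.optimize.minimize with Nelder-Mead; the
function name fit_lognorm_logparams and the crude starting point are
illustrative choices.

```python
import numpy as np
from scipy import optimize, stats

def fit_lognorm_logparams(data):
    """MLE for lognorm with loc fixed at 0, optimizing log(s) and log(scale)."""
    data = np.asarray(data, dtype=float)

    def nll(theta):
        # exp() maps any real theta to positive s and scale,
        # so there is no out-of-bounds region to penalize.
        s, scale = np.exp(theta)
        return -np.sum(stats.lognorm.logpdf(data, s, loc=0.0, scale=scale))

    # Crude start: s = 1, scale = sample median.
    theta0 = np.array([0.0, np.log(np.median(data))])
    res = optimize.minimize(nll, theta0, method="Nelder-Mead")
    s_hat, scale_hat = np.exp(res.x)
    return s_hat, scale_hat, res.success

# Illustrative sample with s=0.25 and scale=200, as in the original post.
rng = np.random.default_rng(0)
samples = rng.lognormal(mean=np.log(200.0), sigma=0.25, size=1000)
s_hat, scale_hat, converged = fit_lognorm_logparams(samples)
print(s_hat, scale_hat, converged)
```

The same idea would need a further transformation if loc were also
estimated, since loc must stay below the smallest observation.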

Josef
>
> Christian
>