# [SciPy-Dev] Subversion scipy.stats irregular problem with source code example

josef.pktd@gmai... josef.pktd@gmai...
Mon Oct 11 15:10:06 CDT 2010

```On Mon, Oct 11, 2010 at 3:45 PM, James Phillips <zunzun@zunzun.com> wrote:
> The genetic algorithm approach is not working as a general solution to
> the problem of finding starting parameters for fmin() for statistical
> distributions, presumably due to extreme parameter sensitivity.  I do
> not see a general solution to the problem given these results.  See
> the attached Python file, also copied below.
>
> My results:
>
> Digits of precision test for the beta distribution
> nnlf native    = inf
> nnlf 16 digits = 10.14091764
> nnlf 15 digits = 10.3222074111
> nnlf 14 digits = 10.977829575
> nnlf 13 digits = inf
> nnlf 12 digits = 13.198954184
>
>
>     James
>
> import scipy, scipy.stats
>
> data = scipy.array([
> 3.017,2.822,2.632,2.287,2.207,2.048,
> 1.963,1.784,1.712,2.972,2.719,2.495,
> 2.070,1.969,1.768,1.677,1.479,1.387,
> 2.843,2.485,2.163,1.687,1.408,1.279,
> 1.016,0.742,0.607])
>
> # parameters
> p1 = 7.69589403034175001E-01
> p2 = 5.52884409849620395E-01
> p3 = 6.06094740472452820E-01
> p4 = 2.41090525952754753E+00
>
> print "Digits of precision test for the beta distribution"
> print "nnlf native    =", scipy.stats.beta.nnlf([p1, p2, p3, p3], data)

typo should be p4

> print "nnlf 16 digits =", scipy.stats.beta.nnlf([float("%.16E" % p1),
> float("%.16E" % p2), float("%.16E" % p3), float("%.16E" % p4)], data)
> print "nnlf 15 digits =", scipy.stats.beta.nnlf([float("%.15E" % p1),
> float("%.15E" % p2), float("%.15E" % p3), float("%.15E" % p4)], data)
> print "nnlf 14 digits =", scipy.stats.beta.nnlf([float("%.14E" % p1),
> float("%.14E" % p2), float("%.14E" % p3), float("%.14E" % p4)], data)
> print "nnlf 13 digits =", scipy.stats.beta.nnlf([float("%.13E" % p1),
> float("%.13E" % p2), float("%.13E" % p3), float("%.13E" % p4)], data)
> print "nnlf 12 digits =", scipy.stats.beta.nnlf([float("%.12E" % p1),
> float("%.12E" % p2), float("%.12E" % p3), float("%.12E" % p4)], data)

If I remember correctly, you have observations that are too close to
the upper boundary.

>>> data.max() -(p3+p4)
-4.4408920985006262e-016

If you have an observation at the boundary, the loglikelihood is inf

>>> scipy.stats.beta.nnlf([p1, p2, p3, p4], data)
10.14091764000147

reduce variance a bit, shrinks the support

>>> scipy.stats.beta.nnlf([p1, p2, p3, p4-1e-15], data)
1.#INF

I think in these cases you have to keep the boundary of the support
away from the max and min of the data. Similar in other distributions,
as I mentioned before.

If MLE doesn't work for a distribution then a global optimizer
wouldn't help either. In these cases, usually another estimation
method is recommended in the literature. For example matching
quantiles similar to your initial version.

Are all distributions that have support in the entire real line working ?

Josef

>
>
>
>
> On Fri, Oct 1, 2010 at 7:58 AM, James Phillips <zunzun@zunzun.com> wrote:
>> On Fri, Oct 1, 2010 at 7:21 AM, James Phillips <zunzun@zunzun.com> wrote:
>>>
>>> Here are current timing results fitting both loc and scale...
>>
>> I'm iterating over all continuous distributions now, and the genetic
>> algorithm results are showing which distributions can be run with loc
>> = min(data) and scale = max(data) - min(data).  With that information
>> in hand I can then speed up the overall fitting considerably by not
>> fitting those parameters.
>>
>>     James
>>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
>
```