[Numpy-discussion] Fitting Log normal or truncated log normal distibution to three points

Thu Jun 30 04:25:49 CDT 2011

```On Jun 30, 2011, at 10:03 AM, Sachin Kumar Sharma wrote:

> Hi,
>
> I have three points 10800, 81100, 582000.
>
> What is the easiest way of fitting a log normal and truncated log normal distribution to these three points using numpy.

The lognormal and maximum likelihood fit is available in scipy.stats. It's easier to use this than to implement your own fit in numpy, although if you can't use scipy for some reason you can have a look there on how to implement it in numpy.

The rest of this reply uses scipy.stats, so if you are only interested in numpy, please stop reading.

>>> data = [10800, 81100, 582000]
>>> import scipy.stats

Lognormal distribution

lognorm.pdf(x,s) = 1/(s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2)
for x > 0, s > 0.

If log x is normally distributed with mean mu and variance sigma**2,
then x is log-normally distributed with shape paramter sigma and scale
parameter exp(mu).

>>> scipy.stats.lognorm.fit(data)
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280: RuntimeWarning: invalid value encountered in subtract
and max(abs(fsim[0]-fsim[1:])) <= ftol):
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280: RuntimeWarning: invalid value encountered in absolute
and max(abs(fsim[0]-fsim[1:])) <= ftol):
(1.0, 30618.493505379971, 117675.94879488947)
>>> scipy.stats.lognorm.fit(data, floc=0, fscale=1)
[11.405078125000022, 0, 1]

See http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html for further info on how the location and scale parameter are handled, basically the x in the lognormal.pdf formula above is x = (x_actual - loc) / scale.

Now how to fit a truncated lognorm?
First note that because log(x) can only be computed for x>0, scipy.stats.lognorm is already truncated to x > 0.

>>> scipy.stats.lognorm.cdf(0, 1) # arguments: (x, s)
0.0
>>> scipy.stats.lognorm.cdf(1, 1) # arguments: (x, s)
0.5
>>> scipy.stats.lognorm.cdf(2, 1) # arguments: (x, s)
0.75589140421441725

As documented here, you can use parameters a and b to set the support of the distribution (i.e. the lower and upper truncation points):
>>> help(scipy.stats.rv_continuous)
|  a : float, optional
|      Lower bound of the support of the distribution, default is minus
|      infinity.
|  b : float, optional
|      Upper bound of the support of the distribution, default is plus
|      infinity.

However when I try to use the a, b parameters to call pdf() (as a simpler method than fit() to check if it works)  I run into the following problem:

>>> scipy.stats.lognorm.pdf(1, 1)
0.3989422804014327
>>> scipy.stats.lognorm(a=1).pdf(1, 1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: pdf() takes exactly 2 arguments (3 given)
>>> scipy.stats.lognorm(a=1).pdf(1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py", line 335, in pdf
return self.dist.pdf(x, *self.args, **self.kwds)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py", line 1113, in pdf
place(output,cond,self._pdf(*goodargs) / scale)
TypeError: _pdf() takes exactly 3 arguments (2 given)

For a distribution without parameters besides (loc, scale), setting a works:
>>> scipy.stats.norm(a=-2).pdf(3)
0.0044318484119380075

Is this a bug or am I simply using it wrong?
It would be nice if http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html contained an example or two of how to use the a, b, xa, xb, xtol parameters of scipy.stats.rv_continuous .

Christoph
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20110630/74641d70/attachment.html
```