# [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data

josef.pktd@gmai... josef.pktd@gmai...
Tue Nov 17 12:38:01 CST 2009

```
On Tue, Nov 17, 2009 at 12:29 PM, Gökhan Sever <gokhansever@gmail.com> wrote:
>
>
> On Tue, Nov 17, 2009 at 12:13 AM, Ian Mallett <geometrian@gmail.com> wrote:
>>
>> Theory wise:
>> -Do a linear regression on your data.
>> -Apply a logarithmic transform to your data's dependent variable, and do
>> another linear regression.
>> -Apply a logarithmic transform to your data's independent variable, and do
>> another linear regression.
>> -Take the best regression (highest r^2 value) and execute a back
>> transform.
>>
>> Then, to get your desired extrapolation, simply substitute in the size for
>> the independent variable to get the expected value.
>>
>> If, however, you're looking for how to implement this in NumPy or SciPy, I
>> can't really help :-P
>> Ian
>>
>
> OK, before applying your suggestions, I have a few more questions. Here is one
> real data sample that I will use as part of the log-normal fitting. There
> are 15 elements in this array, each being the concentration for a size bin
> within the 0.1 - 3.0 um range.
>
> I[74]: conc
> O[74]:
> array([ 119.7681,  118.546 ,  146.6548,   96.5478,  109.9911,   32.9974,
>          20.7762,    6.1107,   12.2212,    3.6664,    3.6664,    1.2221,
>           2.4443,    2.4443,    3.6664])
>
> The size range is not calibrated yet, so for now I just assume a linearly
> spaced array:
>
> I[78]: sizes = linspace(0.1, 3.0, 15)
>
> I[79]: sizes
> O[79]:
> array([ 0.1       ,  0.30714286,  0.51428571,  0.72142857,  0.92857143,
>         1.13571429,  1.34285714,  1.55      ,  1.75714286,  1.96428571,
>         2.17142857,  2.37857143,  2.58571429,  2.79285714,  3.        ])
>
>
> Not a very ideal looking log-normal, but so far I don't know what else
> besides a log-normal fit would give me a better estimate:
> I[80]: figure(); plot(sizes, conc)
> http://img406.imageshack.us/img406/156/sizeconc.png
>
> scipy.stats has the lognorm.fit
>
>     lognorm.fit(data,s,loc=0,scale=1)
>         - Parameter estimates for lognorm data
>
> and I applied this to my data. However, I am not sure of the right way of
> calling it, or whether it can be applied to my case at all.
>
> I[81]: stats.lognorm.fit(conc)
> O[81]: array([ 2.31386066,  1.19126064,  9.5748391 ])
>
> Lastly, what is the way to create an ideal log-normal sample using
> stats.lognorm.rvs?

I don't think I understand the connection to the log-normal distribution.
You seem to have a non-linear relationship
conc = f(size), and you want to find the function f.

If conc were just log-normally distributed, then you would not get any
relationship between conc and size.

If you have many observations with conc, size pairs then you could
estimate a noisy model
conc = f(size) + u  where the noise u is for example log-normal
distributed but you would still need to get an expression for the
non-linear function f.
Extending a non-linear function outside of the observed range
is essentially always a guess or an assumption.

If you want to fit a curve f that has the same shape as the pdf of
the log-normal, then you cannot do it with lognorm.fit, because that
just assumes you have a random sample independent of size.

So, it's not clear to me what you really want, or what your sample data
looks like (do you have only one 15-element sample, or lots of them?).

Josef

>
> Thanks
>
>
> --
> Gökhan
```
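
The distinction Josef draws can be sketched in code. This is a minimal illustration, not something posted in the thread: it assumes the conc values are measurements of a curve over size, and fits a curve with the shape of the log-normal pdf using `scipy.optimize.curve_fit`. The model function `lognormal_curve`, its free `amplitude` parameter, the starting guess `p0`, and the bounds are all assumptions made for the example. The second part shows the other case, where `stats.lognorm.rvs` draws a random sample and `stats.lognorm.fit` recovers its parameters; the values `s=0.5`, `scale=1.0`, the sample size, and fixing `loc` at 0 are arbitrary choices.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

# Data as posted in the thread; the size grid is the uncalibrated
# linear array the poster assumed.
conc = np.array([119.7681, 118.546, 146.6548, 96.5478, 109.9911, 32.9974,
                 20.7762, 6.1107, 12.2212, 3.6664, 3.6664, 1.2221,
                 2.4443, 2.4443, 3.6664])
sizes = np.linspace(0.1, 3.0, 15)


def lognormal_curve(x, amplitude, s, scale):
    """Hypothetical model: log-normal pdf shape times a free amplitude."""
    return amplitude * stats.lognorm.pdf(x, s, loc=0, scale=scale)


# Case 1: conc is a function of size, so fit a curve to (sizes, conc).
# Starting guess (an assumption): amplitude ~ area under the data,
# shape s = 1, scale (the median of the pdf shape) = 0.5.
p0 = [conc.sum() * (sizes[1] - sizes[0]), 1.0, 0.5]
bounds = ([0.0, 1e-3, 1e-3], [np.inf, 10.0, 10.0])
params, cov = curve_fit(lognormal_curve, sizes, conc, p0=p0, bounds=bounds)
fitted = lognormal_curve(sizes, *params)
print("curve fit (amplitude, s, scale):", params)

# Case 2: a random sample drawn from a log-normal distribution.
# Here lognorm.fit is the appropriate tool, because the data are
# independent draws, not values of a function of size.
sample = stats.lognorm.rvs(0.5, loc=0, scale=1.0, size=5000, random_state=0)
s_hat, loc_hat, scale_hat = stats.lognorm.fit(sample, floc=0)
print("sample fit (s, loc, scale):", s_hat, loc_hat, scale_hat)
```

Evaluating `lognormal_curve` at sizes outside 0.1 to 3.0 would give an extrapolation, but as noted in the reply, that rests entirely on the assumption that the log-normal shape holds beyond the observed range.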