[SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data

Robert Kern robert.kern@gmail....
Tue Nov 17 16:58:43 CST 2009

On Tue, Nov 17, 2009 at 16:42, Gökhan Sever <gokhansever@gmail.com> wrote:
> On Tue, Nov 17, 2009 at 4:27 PM, Robert Kern <robert.kern@gmail.com> wrote:
>> On Tue, Nov 17, 2009 at 16:21, Gökhan Sever <gokhansever@gmail.com> wrote:
>> > Besides, what is wrong with using the spline interpolation technique? It
>> > fits nicely on my sample data. See the resulting image here:
>> > http://img197.imageshack.us/img197/9638/sizeconcsplinefit.png
>> > (Green line represents the fit spline)
>> What spline interpolation technique?
> From here
> http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
> Spline interpolation in 1-d (interpolate.splXXX)
>> That certainly doesn't look like
>> a good spline fit.
> True, because I used only 30 points. It looks much smoother with a lot more
> points, as you might expect.

Don't judge it based on its smoothness at many points. The smooth
appearance is simply a function of the number of points you choose to
sample it with, not how well it fits the data.
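To make that concrete, here is a minimal sketch on made-up data (the curve, the noise level and the smoothing factor `s` are all assumptions, not the data from the plot). A fitted spline is one fixed function: evaluating it on 30 or on ~1000 points draws the same curve, and the quantity that says how well it fits, the residual at the observations, does not depend on either plotting grid.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical noisy data: 30 samples of a smooth bump plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.1, 3.0, 30)
y = np.exp(-(np.log(x) - np.log(0.8)) ** 2 / 0.5) + rng.normal(0.0, 0.05, x.size)

# One fitted curve: a cubic smoothing spline (the smoothing level s is an assumption).
spl = UnivariateSpline(x, y, k=3, s=x.size * 0.05 ** 2)

# Evaluate the *same* curve on a coarse grid and on a much finer one.
coarse = np.linspace(x[0], x[-1], 30)
fine = np.linspace(x[0], x[-1], 29 * 34 + 1)  # every 34th point is a coarse point
assert np.allclose(spl(fine)[::34], spl(coarse))  # identical where the grids overlap

# Goodness of fit is measured at the observations, independently of either grid.
rms = np.sqrt(np.mean((y - spl(x)) ** 2))
print(f"RMS residual at the data: {rms:.4f}")
```

The finer grid only makes the drawing look smoother; `rms` is the same number whichever grid you plot with.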

Even if you weren't dealing with an extrapolation problem, you
shouldn't use spline interpolation* on noisy data. You would do
something like least-squares fitting to a low-order spline. The spline
should not go through the observed data points exactly.
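A rough sketch of the difference, on synthetic data where the true curve is known (a sine wave plus noise, and six interior knots, are arbitrary assumptions for illustration). `InterpolatedUnivariateSpline` goes through every noisy point, while `LSQUnivariateSpline` does a least-squares fit of a cubic spline with only a few fixed knots and does not.

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline, LSQUnivariateSpline

# Hypothetical noisy observations of a known smooth curve.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 100)
y = np.sin(x) + rng.normal(0.0, 0.2, x.size)

# Spline interpolation: passes exactly through every (noisy) observation.
interp = InterpolatedUnivariateSpline(x, y, k=3)

# Least-squares fit of a cubic spline with a few interior knots (assumed choice).
knots = np.linspace(x[0], x[-1], 8)[1:-1]
lsq = LSQUnivariateSpline(x, y, knots, k=3)

resid_interp = np.max(np.abs(y - interp(x)))  # ~0: it reproduces the noise
resid_lsq = np.max(np.abs(y - lsq(x)))        # > 0: it does not chase the noise

# Compare both fits against the noise-free curve on a fine grid.
xf = np.linspace(x[0], x[-1], 500)
err_interp = np.sqrt(np.mean((interp(xf) - np.sin(xf)) ** 2))
err_lsq = np.sqrt(np.mean((lsq(xf) - np.sin(xf)) ** 2))
print(f"max residual: interp={resid_interp:.2e}, lsq={resid_lsq:.3f}")
print(f"RMS error vs. true curve: interp={err_interp:.3f}, lsq={err_lsq:.3f}")
```

The interpolating spline has (numerically) zero residuals yet tracks the noise; the low-order least-squares spline leaves residuals but sits closer to the underlying curve.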

* And this brings up another terminological issue. I may have used the
term "interpolation" in a couple of different ways. There is a general
sense in which "interpolate" means "to make predictions about certain
inputs (e.g. the concentration [prediction] for the given particle
size [input]) within the range of observed inputs". But
"interpolate" can also mean something much more specific: finding a
curve that exactly goes through the given observations. "Spline
interpolation" would be a form of the latter, and is not related to
what you need.
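In the general sense, whether a prediction is interpolation or extrapolation depends only on where the input falls relative to the observed inputs. A tiny sketch of that check follows; the function name `classify_inputs` and the sample sizes are hypothetical, chosen to span the 0.1 to 3.0 um range discussed in this thread.

```python
import numpy as np

def classify_inputs(x_observed, x_query):
    """Label each query input relative to the observed input range.

    'interpolation' means inside [min, max] of the observed inputs (the
    general sense); anything outside, on either side, is 'extrapolation'.
    """
    lo, hi = np.min(x_observed), np.max(x_observed)
    q = np.asarray(x_query, dtype=float)
    inside = (q >= lo) & (q <= hi)
    return np.where(inside, "interpolation", "extrapolation")

# Hypothetical observed particle sizes in um.
sizes = np.array([0.1, 0.3, 0.8, 1.5, 3.0])
labels = classify_inputs(sizes, [0.05, 1.0, 3.0, 4.5])
print(labels)  # ['extrapolation' 'interpolation' 'interpolation' 'extrapolation']
```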

>> In any case, splines may be fine for
>> *interpolation*, but you need *extrapolation*, and splines are useless
>> for that.
>> You need a physically-motivated model like the distributions
>> recommended by your textbook.
> Using spline interpolation is a test case to see how well it does on my data.

Good. I just wanted to make sure that you knew what was wrong with
using splines in this case. :-)
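One way to see the extrapolation problem, sketched on synthetic data (the curve and the noise level are assumptions). Two interpolating splines of different degree both reproduce the observations exactly, so the data cannot distinguish them, yet outside the observed range they give different answers, and nothing in the data says which one, if either, is right.

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

# Hypothetical noisy samples between 0.1 and 3.0 (units assumed to be um).
rng = np.random.default_rng(2)
x = np.linspace(0.1, 3.0, 30)
y = np.exp(-np.log(x / 0.8) ** 2 / 0.5) + rng.normal(0.0, 0.02, x.size)

# Two interpolating splines of different degree: both pass through every point.
lin = InterpolatedUnivariateSpline(x, y, k=1)
cub = InterpolatedUnivariateSpline(x, y, k=3)
assert np.allclose(lin(x), y) and np.allclose(cub(x), y)

# Outside [0.1, 3.0] they simply continue their end pieces (ext=0, the default,
# means "extrapolate"), so their predictions there need not agree.
for xq in (0.05, 4.0, 6.0):
    print(f"x={xq}: linear={float(lin(xq)):.3f}, cubic={float(cub(xq)):.3f}")
```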

> I will use the log-normal approach, as originally intended. Let me check
> with someone else in the department to get some feedback on this before I
> completely get lost in the matter.

Always wise. :-)
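For reference, here is a sketch of a least-squares fit of a log-normal size distribution with `scipy.optimize.curve_fit`. It assumes the common form dN/dlnD = N_t / (sqrt(2 pi) ln sigma_g) * exp(-(ln D - ln D_g)^2 / (2 ln^2 sigma_g)); check that against the form in your textbook. The "observations" are synthetic, generated from known parameters, so the fit can be sanity-checked.

```python
import numpy as np
from scipy.optimize import curve_fit

def lognormal_dNdlnD(d, n_total, d_g, sigma_g):
    """Assumed log-normal size distribution dN/dlnD with total number n_total,
    geometric mean diameter d_g and geometric standard deviation sigma_g > 1."""
    ln_sg = np.log(sigma_g)
    return (n_total / (np.sqrt(2.0 * np.pi) * ln_sg)
            * np.exp(-np.log(d / d_g) ** 2 / (2.0 * ln_sg ** 2)))

# Hypothetical observations between 0.1 and 3.0 um generated from known parameters.
rng = np.random.default_rng(3)
d = np.geomspace(0.1, 3.0, 30)
true_params = (500.0, 0.4, 1.8)
clean = lognormal_dNdlnD(d, *true_params)
obs = clean + rng.normal(0.0, 0.01 * clean.max(), d.size)

# Least-squares fit; the bounds keep sigma_g > 1 so that log(sigma_g) stays positive.
popt, pcov = curve_fit(lognormal_dNdlnD, d, obs, p0=(100.0, 1.0, 2.0),
                       bounds=([0.0, 1e-3, 1.01], [np.inf, 10.0, 10.0]))
perr = np.sqrt(np.diag(pcov))  # rough 1-sigma parameter uncertainties

# Evaluating a physically motivated model outside 0.1-3.0 um is at least
# meaningful, though only as trustworthy as the log-normal assumption itself.
for dq in (0.05, 5.0):
    print(f"D={dq} um: predicted dN/dlnD = {lognormal_dNdlnD(dq, *popt):.2f}")
```

Unlike the spline, the extrapolated values here come from the fitted parameters of a model with a physical interpretation, which is why it is the better basis for predicting outside the measured size range.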

> One quick question: "extrapolation" means estimating data both "above"
> and "below" the given limits, right? (In my example, to guess below
> 0.1 um, should I say "downward extrapolation", and above 3.0 um "upward
> extrapolation", or is just "extrapolation" enough?)

Just "extrapolation" can describe either case, yes.

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
