[SciPy-user] use of rv_continuous "fit"?
Travis Oliphant
oliphant at ee.byu.edu
Mon Aug 29 17:03:23 CDT 2005
lindeman at bard.edu wrote:
>Travis, sorry for the enigmatic problem formulation -- yes, I mean the latter,
>estimating the parameters of the normal distribution using data which represents
>samples drawn from a normal distribution. (Obviously, my numbers aren't actually
>drawn from a normal distribution, which may be what threw you off.)
>
>I have assumed that I would call stats.norm.fit with a vector as the first
>argument -- which returns ValueError: Not enough input arguments. So far I can't
>find any form of second argument that doesn't return ValueError: Too many input
>arguments. But even when I look at the source code, I'm still not sure what I
>should be doing -- so evidently I am missing something pretty basic.
>
>
While .fit should work (and I'll get to fixing it at some point if it
is broken), the Gaussian distribution is a well-studied case. The
distribution is defined by its mean and variance, so I would think you
should just estimate the mean and variance directly using the standard
formulas on your data (or using stats.mean and/or stats.var).
In fact, I would like to overload the .fit function for many standard
cases to use closed-form solutions to the parameter estimation instead
of general optimization (what is currently done by default for everything).
Thus, if data contains your vector of data, then
mu = stats.mean(data)
var = stats.var(data,bias=1)  # I prefer the biased estimate because,
                              # although it is biased, it has lower
                              # variance and therefore lower
                              # mean-square error.
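
As a rough sketch of the same closed-form estimate, here is a version that
uses only the Python standard library rather than scipy.stats. The names
fit_normal and mse_of_variance_estimators are illustrative assumptions, not
part of any library. The second function is a small simulation, under the
assumption of standard-normal samples, that compares the mean-square error
of the biased (divide by N) and unbiased (divide by N-1) variance estimates.

```python
import random
import statistics


def fit_normal(data):
    """Closed-form estimates of the mean and variance of a normal sample."""
    mu = statistics.fmean(data)
    # pvariance divides by N, i.e. the biased (maximum-likelihood) estimate.
    var = statistics.pvariance(data, mu)
    return mu, var


def mse_of_variance_estimators(n=10, trials=5000, seed=0):
    """Simulated mean-square error of the biased and unbiased estimates.

    Samples are drawn from a normal distribution with true variance 1.0.
    Returns (mse_biased, mse_unbiased).
    """
    rng = random.Random(seed)
    se_biased = 0.0
    se_unbiased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        se_biased += (statistics.pvariance(sample) - 1.0) ** 2   # divide by N
        se_unbiased += (statistics.variance(sample) - 1.0) ** 2  # divide by N-1
    return se_biased / trials, se_unbiased / trials


if __name__ == "__main__":
    data = [4.9, 5.3, 5.1, 4.7, 5.0, 5.4]  # made-up example values
    mu, var = fit_normal(data)
    print("mu =", mu, "var =", var)
    print("MSE (biased, unbiased):", mse_of_variance_estimators())
```

With NumPy, numpy.mean(data) and numpy.var(data) (whose default ddof=0
divides by N) should give the same two numbers as fit_normal.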
-Travis