[SciPy-user] use of rv_continuous "fit"?

Travis Oliphant oliphant at ee.byu.edu
Mon Aug 29 17:03:23 CDT 2005


lindeman at bard.edu wrote:

>Travis, sorry for the enigmatic problem formulation -- yes, I mean the latter,
>estimating the parameters of the normal distribution using data which represents
>samples drawn from a normal distribution. (Obviously, my numbers aren't actually
>drawn from a normal distribution, which may be what threw you off.)
>
>I have assumed that I would call stats.norm.fit with a vector as the first
>argument -- which returns ValueError: Not enough input arguments. So far I can't
>find any form of second argument that doesn't return ValueError: Too many input
>arguments. But even when I look at the source code, I'm still not sure what I
>should be doing -- so evidently I am missing something pretty basic.
>  
>
While .fit should work (and I'll get to fixing it if it is broken at
some point), the Gaussian distribution represents a well-studied case.
The distribution is defined by its mean and variance.

I would think you should just estimate the mean and variance directly 
using the standard formulas on your data (or using stats.mean and/or 
stats.var).
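A minimal sketch of the direct approach, assuming a current NumPy install (the old `stats.mean`/`stats.var` helpers may not be available in recent SciPy releases). `np.mean` gives the sample mean, and `np.var` with its default `ddof=0` divides by N, while `ddof=1` divides by N - 1. The `data` values below are made up for illustration only.

```python
import numpy as np

# Hypothetical sample data, for illustration only.
data = np.array([4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 4.8, 5.2])

mu = np.mean(data)                   # sample mean
var_biased = np.var(data)            # ddof=0: divide by N (the normal MLE)
var_unbiased = np.var(data, ddof=1)  # ddof=1: divide by N - 1

print(mu, var_biased, var_unbiased)
```
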

In fact, I would like to overload the .fit function for many standard 
cases to use closed-form solutions to the parameter estimation instead 
of general optimization (what is currently done by default for everything).
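To illustrate the difference between the two routes, here is a sketch assuming a current NumPy/SciPy install: the general route numerically minimizes the normal negative log-likelihood with `scipy.optimize.minimize`, and the closed-form route just takes the sample mean and the divide-by-N standard deviation. The helper `neg_log_likelihood` and the simulated data are hypothetical. In recent SciPy versions `stats.norm.fit` returns `(loc, scale)` and, as far as I know, already uses these closed-form estimates for the normal case.

```python
import numpy as np
from scipy import optimize, stats

def neg_log_likelihood(params, x):
    """Hypothetical helper: normal negative log-likelihood, params = (mu, log_sigma)."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # optimize log(sigma) so sigma stays positive
    n = x.size
    return (n * np.log(sigma) + 0.5 * n * np.log(2 * np.pi)
            + np.sum((x - mu) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)  # simulated data

# General route: derivative-free numerical minimization.
res = optimize.minimize(neg_log_likelihood, x0=[1.0, 1.0], args=(x,),
                        method="Nelder-Mead")
mu_opt, sigma_opt = res.x[0], np.exp(res.x[1])

# Closed-form route: sample mean and divide-by-N standard deviation.
mu_closed = x.mean()
sigma_closed = x.std()  # ddof=0

# Modern SciPy fit for comparison.
loc_fit, scale_fit = stats.norm.fit(x)

print(mu_opt, mu_closed, loc_fit)
print(sigma_opt, sigma_closed, scale_fit)
```

The numerical route needs a starting point, an iteration budget, and tolerances, and only approximates the answer; the closed form is exact and costs one pass over the data.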

Thus, if data contains your vector of data, then

from scipy import stats
mu = stats.mean(data)
var = stats.var(data, bias=1)   # I prefer the biased estimate: although it is biased,
                                # it has lower variance and therefore lower mean-square error.
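The mean-square-error claim can be checked by simulation. This sketch assumes the data really are normal and uses NumPy's `ddof=0` (divide by N, which as I understand it matches `bias=1` above) versus `ddof=1` (divide by N - 1). For normal data the exact values are MSE = (2N - 1)/N^2 * sigma^4 for the biased estimate and 2/(N - 1) * sigma^4 for the unbiased one; the sample size, seed, and replication count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(12345)
n, sigma2, reps = 5, 1.0, 200_000

# Many independent samples of size n from N(0, sigma2).
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

est_biased = samples.var(axis=1, ddof=0)    # divide by n
est_unbiased = samples.var(axis=1, ddof=1)  # divide by n - 1

def mse(est):
    """Mean-square error of the variance estimates around the true sigma2."""
    return np.mean((est - sigma2) ** 2)

mse_biased = mse(est_biased)
mse_unbiased = mse(est_unbiased)

# Exact values for normal data.
theory_biased = (2 * n - 1) / n**2 * sigma2**2   # 0.36 for n = 5
theory_unbiased = 2 / (n - 1) * sigma2**2        # 0.5 for n = 5

print(mse_biased, theory_biased)
print(mse_unbiased, theory_unbiased)
```

The biased estimate trades a small bias (-sigma^2/N) for a reduction in variance by a factor of ((N - 1)/N)^2, and for normal data the net effect is a smaller mean-square error for every N.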


-Travis