[SciPy-user] CDF/PDF Stats with SciPy
Mon Jul 20 15:22:05 CDT 2009
The function you mention, scipy.stats.norm.cdf(x, loc=mean, scale=std), will
return the CDF of a normally distributed r.v. simply by using the
well-known formula, but I typically don't know the distribution of the data
I'm processing, and I am interested in finding the discrete r.v.
distribution(s) from the actual data set. That's where a histogram can be
used to represent the PDF, and cumsum(histogram) or my "special" function
for the CDF comes into play.
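A minimal sketch of the histogram and cumsum approach described above. The
synthetic Rayleigh sample, the bin count of 50, and the helper name
empirical_cdf_at are assumptions made for illustration only; density=True is
the current NumPy spelling of the older normed=True.

```python
import numpy as np

# Hypothetical sample; in practice this would be the measured data set.
rng = np.random.default_rng(0)
x = rng.rayleigh(scale=2.0, size=1000)

# Empirical PDF: normalized histogram (integrates to 1 over the bins).
pdf, edges = np.histogram(x, bins=50, density=True)
widths = np.diff(edges)

# Empirical CDF at the right edge of each bin:
# cumulative sum of (density * bin width).
cdf = np.cumsum(pdf * widths)


def empirical_cdf_at(sample, value):
    """Fraction of the sample that is <= value (uses every data point)."""
    sorted_sample = np.sort(sample)
    count = np.searchsorted(sorted_sample, value, side="right")
    return count / len(sorted_sample)


# Estimated probability that a new value is <= 2.0, given the sample.
p = empirical_cdf_at(x, 2.0)
```

The histogram version depends on the choice of bins, while the helper uses
every data point, which tends to matter more for small samples.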
I have never used the scipy.stats.<dist>.fit function, so I cannot help you
with that.
The bottom line is, are you interested in:
a) determining the distribution from the actual data without bothering to
know the exact formula, and drawing conclusions (that is, finding moments,
probabilities, etc.) from it (that is what I normally do)
b) trying to determine which distribution your data fits best (i.e., is it
normal, Rician, Rayleigh, Nakagami, etc.)
c) just playing with scipy for the purpose of learning, and plotting
various distributions using scipy.stats.<dist>.cdf
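A rough sketch of what options (b) and (c) might look like. The synthetic
data, the two candidate distributions, and the use of the Kolmogorov-Smirnov
statistic to rank the fits are assumptions chosen for illustration, not a
recommended procedure. Note that cdf takes the evaluation point first, with
the mean and standard deviation passed as loc and scale.

```python
import numpy as np
from scipy import stats

# Hypothetical data set; replace with the real measurements.
rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=2000)

# (c) Evaluate a known distribution's CDF: P(X <= 6) for mean 5, std 2.
p_known = stats.norm.cdf(6.0, loc=5.0, scale=2.0)

# (b) Fit each candidate by maximum likelihood, then compare the fits
# using the Kolmogorov-Smirnov statistic (smaller means closer).
results = {}
for name in ("norm", "rayleigh"):
    dist = getattr(stats, name)
    params = dist.fit(data)  # (loc, scale) for these two distributions
    ks_stat, p_value = stats.kstest(data, name, args=params)
    results[name] = (params, ks_stat)

best = min(results, key=lambda n: results[n][1])
```

Other candidates can be added to the tuple of names as long as scipy.stats
provides a distribution with that name.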
2009/7/20 Omer Khalid <Omer.Khalid@cern.ch>
> Hi Ivo,
> Thanks for your reply. But I am getting a little confused here now. It
> seems there are multiple ways to get the CDF for a distribution. You mean
> the linspace function returns a CDF for a normal distribution?
> As far as I understood from other sources, scipy.stats.norm.cdf
> (mean, std) will return the CDF for the normal distribution, or for a
> non-normal distribution if one replaces *norm* with that distribution's
> name.
> And what about the scipy.stats.<dist>.fit function?
>> Message: 7
>> Date: Mon, 20 Jul 2009 12:07:10 -0400
>> From: Ivo Maljevic <firstname.lastname@example.org>
>> Subject: Re: [SciPy-user] CDF/PDF Stats with SciPy
>> To: SciPy Users List <email@example.com>
>> Hi Omer,
>> For a histogram you can either use the histogram function from numpy:
>> import numpy as np
>> x = .... # some vector
>> # normalized histogram with 50 bins
>> # (density=True replaces normed=True/new=True from older NumPy releases)
>> h, bins = np.histogram(x, bins=50, density=True)
>> or you can use pylab's version (good for plotting):
>> import matplotlib.pyplot as plt
>> # density=True replaces normed=True from older matplotlib releases
>> count, bins, ignored = plt.hist(x, 50, density=True)
>> For the CDF you can use the cumsum function (the standard approach), but
>> for a smaller number of data points I prefer to use all the points, which
>> is a neat trick:
>> import numpy as np
>> def my_cdf(x):
>>     bins = np.sort(x)                   # sorted sample values
>>     cdf = np.linspace(0, 1, len(bins))  # CDF level for each sorted point
>>     return [bins, cdf]
>> 2009/7/20 Omer Khalid <Omer.Khalid@cern.ch>
>> > Hi Everybody,
>> > I am new to Python and new to the SciPy libraries. I wanted to take some
>> > advice from the experts here on the list before diving into the SciPy
>> > world.
>> > I was wondering if someone could provide a rough guide about how to run
>> > two stats functions: Cumulative Distribution Function (CDF) and
>> > Probability Distribution Function (PDF).
>> > My use case is the following: I have a sampleSpaceList which holds
>> > floating point values. When a new floating point value is generated in
>> > my program, I would like to run both CDF and PDF on the sampleList for it
>> > to get the probability of a value less than or equal to it for the CDF,
>> > and the probability distribution for the PDF.
>> > Many thanks in advance!
>> > Omer
>> > _______________________________________________
>> > SciPy-user mailing list
>> > http://mail.scipy.org/mailman/listinfo/scipy-user
>> End of SciPy-user Digest, Vol 71, Issue 32