[SciPy-user] CDF/PDF Stats with SciPy
Mon Jul 20 15:51:21 CDT 2009
I am not sure I quite understand what you are doing (the first criterion is
the success of an experiment, and the second criterion is based on
statistics of the first test?), but regardless of what you are doing, you
can apply the my_cdf() function I gave you to get the discrete CDF (or you can
get the complementary CDF as 1 - CDF). To elaborate a little on why I prefer
this CDF approach: random variables quite often have long tails, and those
tails tend to disappear when you numerically integrate (cumsum is the most
basic approach) the estimated pdf (which is the histogram). When you use all
the available data points (instead of just 50 or so bins), you get much better
resolution, especially in the tails.
Once you find the CDF, you should be able to get your probabilities
directly by reading values off the plot, or by finding which Y-axis value
(which is the probability) matches whatever bin you are interested in (on
the X-axis).
I don't know if this helps. If not, and if you have some real data, maybe I
can write you some more code.
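The my_cdf() function mentioned above is not reproduced in this message, so here is a minimal NumPy sketch of the same idea under stated assumptions: an empirical CDF built from every data point by sorting, rather than a cumulative sum over a histogram. The helper names `empirical_cdf` and `prob_at_most` and the exponential test data are hypothetical, chosen only for illustration.

```python
import numpy as np

def empirical_cdf(samples):
    """Sorted sample values and the empirical CDF at each of them."""
    x = np.sort(np.asarray(samples, dtype=float))
    # The i-th smallest value (i = 1..n) gets ECDF value i/n.
    p = np.arange(1, x.size + 1) / x.size
    return x, p

def prob_at_most(x_sorted, value):
    """Empirical P(X <= value), looked up in an already sorted sample."""
    # side="right" counts samples equal to `value` as "<=".
    return np.searchsorted(x_sorted, value, side="right") / x_sorted.size

# Hypothetical long-tailed data: exponential with mean 2.
rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=10_000)

x, cdf = empirical_cdf(data)
ccdf = 1.0 - cdf  # complementary CDF, P(X > x), handy for inspecting tails

print(prob_at_most(x, 2.0))         # theoretical value: 1 - exp(-1) ≈ 0.632
print(1.0 - prob_at_most(x, 10.0))  # theoretical value: exp(-5) ≈ 0.0067

# For comparison: a 50-bin histogram only resolves the CDF at its 51 bin edges.
counts, edges = np.histogram(data, bins=50)
hist_cdf = np.cumsum(counts) / counts.sum()  # fraction of samples up to edges[i + 1]
```

Plotting `x` against `cdf` (or `ccdf` on a log scale, after dropping its final value of 0) gives the curve to read probabilities from; `prob_at_most` performs the same lookup numerically for any value, not just at bin edges.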
2009/7/20 Omer Khalid <Omer.Khalid@cern.ch>
> Hi Ivo,
>> The bottom line is, are you interested in:
>> a) determining the distribution from the actual data without bothering to
>> know the exact formula and drawing conclusions (that is, finding moments,
>> probabilities, etc.) from it (that is what I normally do)
> Yes, I am interested in this.
>> b) trying to determine which distribution your data fits best (i.e., is it
>> normal, Ricean, Rayleigh, Nakagami, etc.)
> This is partially true.
> I think I should have explained more of my research question. My program
> generates a real-valued variate X for every success. I keep storing X
> for each success cycle of my program, and once the sample list reaches size
> 1000, I would like to use that sample space to determine the probability of
> every subsequent X and again store it until the sample space reaches 1000.
> I am not really concerned with the distribution type of my sample space, so
> I thought (maybe out of ignorance) that I must first determine the
> distribution type using the fit function and then get the mean/std. Once I
> have the mean/std, I get the CDF probability for every next X and store it in
> my sample list, replacing the previous one.
> Basically, I want to get a probability for every X in my program cycle until
> the next sample space reaches 1000, and keep on doing that. I am assuming
> my algorithm will learn to improve this way.
> But I have not yet figured out the proper Python code for this...
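Under one reading of the procedure described above (the reference sample is the last complete batch of 1000 values of X, and each new X is scored against it while the next batch fills up), a minimal NumPy sketch needs no distribution fit at all: the empirical CDF of the reference batch gives the probability directly. The class `BatchECDF` and its `update()` method are hypothetical names, and storing X itself (rather than its probability) in the next batch is an assumption.

```python
import numpy as np

class BatchECDF:
    """Hypothetical helper: score each new value against the last full batch.

    The first `batch_size` values only build the initial reference batch, so
    update() returns None for them. After that, each value gets the empirical
    P(X <= x) from the previous full batch while also being collected into
    the next batch, which replaces the reference as soon as it is full.
    """

    def __init__(self, batch_size=1000):
        self.batch_size = batch_size
        self.reference = None  # sorted array holding the last complete batch
        self.current = []      # values collected since the last swap

    def update(self, x):
        prob = None
        if self.reference is not None:
            # Fraction of the reference batch that is <= x (no fitting needed).
            hits = np.searchsorted(self.reference, x, side="right")
            prob = float(hits / self.reference.size)
        self.current.append(float(x))
        if len(self.current) == self.batch_size:
            # Batch is full: it becomes the new reference; start a fresh one.
            self.reference = np.sort(np.asarray(self.current))
            self.current = []
        return prob

# Usage with hypothetical data standing in for the per-success values of X.
rng = np.random.default_rng(1)
scorer = BatchECDF(batch_size=1000)
probs = [scorer.update(x) for x in rng.normal(loc=5.0, scale=2.0, size=3000)]
# probs[:1000] are None; probs[1000:] are probabilities in [0, 1].
```

If a parametric model were preferred after all, `scipy.stats.norm.fit(batch)` returns a `(loc, scale)` pair that can be passed to `scipy.stats.norm.cdf(x, loc, scale)`; that route assumes the data really are normal, which the empirical approach avoids.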