[SciPy-user] problems with stats
danny shevitz
danny_shevitz at yahoo.com
Mon May 3 18:07:41 CDT 2004
Robert,
thanks for the response. I haven't forgotten what you wrote, I just
hadn't had a chance to really dig into it. Now I have.
So I have been using the frequentist approach to constructing the error
bars and I understand that. It involves taking some linear fractional
transformations of F statistics. After reading your letter, I was able
to compute the same error bounds directly. The key was the bdtri
function as you mentioned.
pLower = bdtri(n-1, N, .95)
pUpper = bdtri(n, N, .05)
for example gives the .90 confidence interval. This is the pair of
functions I needed. The n-1 in pLower is because the value n is
considered in the tail for the purposes of computing the confidences.
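For anyone following along, the pair of bounds above can be reproduced
directly with scipy.special (the concrete N and n values here are just
illustrative, not from the original problem):

```python
from scipy.special import bdtr, bdtri

N = 100   # number of trials (illustrative)
n = 10    # number of observed events (illustrative)

# bdtri(k, N, y) inverts the binomial CDF with respect to p:
# it returns the p for which P(X <= k | N, p) = y.
pLower = bdtri(n - 1, N, 0.95)  # P(X <= n-1) = 0.95 at the lower bound
pUpper = bdtri(n, N, 0.05)      # P(X <= n)   = 0.05 at the upper bound

print(pLower, pUpper)  # the .90 confidence interval for p
```

The n-1 versus n asymmetry in the two calls is exactly the tail
convention described above.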
I didn't have a prayer in scipy.stats because bdtri is not wrapped
there. It's totally different from ppf, which still requires the
unknown probability in order to construct the distribution at all.
So I can get the frequentist result directly; now I'm curious about
your Bayesian analysis. One of the uniform or improper priors gives
the same answer. BTW, I'm also curious whether you left off the
combinatoric factor (n choose r) in your description. Can you give me
a little of the rationale for choosing the improper prior?
thanks,
Danny
--- Robert Kern <rkern at ucsd.edu> wrote:
> danny shevitz wrote:
>
> > The short version of the problem that I was trying to solve is:
> > Given N trials and n events, what is the specified confidence
> > interval for the binomial probability p.
> >
> > The hypothetical argument call should therefore take 2 integers and
> > a probability, e.g. f(10, 10000000, .95). The third number is a
> > confidence, not the probability in the binomial distribution. Of
> > course there would have to be a pair of such functions (one for the
> > lower bound, and one for the upper).
>
> What you are looking for is the beta distribution. Here's the
> Bayesian analysis:
>
> We have n Bernoulli trials and r successes with an unknown chance
> (theta) of success on any given trial. The likelihood is
>
> P(n,r|theta) = theta**r * (1-theta)**(n-r)
>
> The posterior probability is
>
> P(theta|n,r) = A * P(n,r|theta) * P(theta)
>
> A is a normalizing constant and P(theta) is your prior probability
> for theta. For now, let's use a minimally informative prior in order
> to make the fewest assumptions.
>
> P(theta) o< 1/(theta*(1-theta))
>
> The o< symbol is the proportionality symbol. This prior is improper
> and cannot be normalized as it stands. When we put it together with
> the likelihood, as long as r does not equal either 0 or n, the
> posterior will come out normalized. There are good reasons which I
> won't go into now to use this instead of P(theta)=1, but you can use
> that instead and redo the math that leads to the following (easy!).
>
> So, P(theta|n,r) = A * theta**(r-1) * (1-theta)**(n-r-1)
>
> We can recognize this as the Beta distribution, Beta(r,n-r) (or
> Beta(r+1,n-r+1) if you're queasy about improper priors).
>
> special.bdtri(10, 10000000, 0.95) will give you the one-sided 95%
> credible interval (like a confidence interval, only Bayesian). Since
> this distribution isn't symmetric, I'm not entirely sure how to get
> the highest posterior density (HPD) credible interval around the
> peak. And I'm not entirely sure how accurate the function is with
> those extreme inputs.
>
> This analysis is lifted straight from Chapter 6 of the excellent book
> _Probability Theory: The Logic of Science_ by E. T. Jaynes (Cambridge
> University Press, 2003).
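The Beta posterior identified above can also be evaluated directly
with scipy.stats.beta; a quick sketch, using the 10-successes-out-of-
10000000-trials numbers from the example and the improper-prior form
Beta(r, n-r):

```python
from scipy.stats import beta

n = 10000000  # trials, from the example above
r = 10        # successes, from the example above

# Posterior under the improper prior is Beta(r, n - r).
# ppf inverts the CDF, so this is the one-sided 95% credible bound:
# 95% of the posterior mass on theta lies below it.
upper = beta.ppf(0.95, r, n - r)

print(upper)
```

With r so small relative to n the bound comes out on the order of
1e-6, which gives some reassurance that the quantile routine behaves
sensibly even for these extreme inputs.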
>
> > Like I said, what I was doing wasn't particularly well thought
> > out. I was just playing around, and using the function wrong
> > anyway...
> >
> > BTW I guess I have no idea what binom.ppf is supposed to do then.
> > Why do you need two probabilities? Is one a confidence and the
> > other the probability?
>
> The random variable in the binomial distribution is the number of
> successes, given the probability p of individual success and the
> number of trials. The percent point function inverts the CDF: it
> finds the number of successes n such that summing the probability
> masses for each number of successes between 0 and n gives you 0.95
> (for example). So yes, one input is the confidence level, one is the
> probability of individual success, and you also need the number of
> trials.
>
> > Also in case it wasn't obvious, I'm not a statistician by training
> > :-)
> >
> > D
>
> --
> Robert Kern
> rkern at ucsd.edu
>
> "In the fields of hell where the grass grows high
> Are the graves of dreams allowed to die."
> -- Richard Harter
>
> _______________________________________________
> SciPy-user mailing list
> SciPy-user at scipy.net
> http://www.scipy.net/mailman/listinfo/scipy-user