[SciPy-User] Unit testing of Bayesian estimator
Sun Nov 8 01:14:37 CST 2009
> On Fri, Nov 6, 2009 at 7:13 PM, Anne Archibald
> <firstname.lastname@example.org> wrote:
>> I have implemented a simple Bayesian regression program (it takes
>> events modulo one and returns a posterior probability that the data is
>> phase-invariant plus a posterior distribution for two parameters
>> (modulation fraction and phase) in case there is modulation). I'm
>> rather new at this, so I'd like to construct some unit tests. Does
>> anyone have any suggestions on how to go about this?
>> For a frequentist periodicity detector, the return value is a
>> probability that, given the null hypothesis is true, the statistic
>> would be this extreme. So I can construct a strong unit test by
>> generating a collection of data sets given the null hypothesis,
>> evaluating the statistic, and seeing whether the number that claim to
>> be significant at a 5% level is really 5%. (In fact I can use the
>> binomial distribution to get limits on the number of false positive.)
>> This gives me a unit test that is completely orthogonal to my
>> implementation, and that passes if and only if the code works. For a
>> Bayesian hypothesis testing setup, I don't really see how to do
>> something analogous.
>> I can generate non-modulated data sets and confirm that my code
>> returns a high probability that the data is not modulated, but how
>> high should I expect the probability to be? I can generate data sets
>> with models with known parameters and check that the best-fit
>> parameters are close to the known parameters - but how close? Even if
>> I do it many times, is the posterior mean unbiased? What about the
>> posterior mode or median? I can even generate models and then data
>> sets that are drawn from the prior distribution, but what should I
>> expect from the code output on such a data set? I feel sure there's
>> some test that verifies a statistical property of Bayesian
>> estimators/hypothesis testers, but I cant quite put my finger on it.
> The Bayesian experts are at pymc, maybe you can look at there tests
> for inspiration. I don't know those, since I never looked at that
> I never tried to test a Bayesian estimator but many properties are
> still the same as in the non-Bayesian analysis. In my Bayesian past, I
> essentially only used normal and t distributions, and binomial.
> One of my first tests for these things is to create a huge sample and
> see whether the parameter estimates converge. With Bayesian analysis
> you still have the law of large numbers, (for non-dogmatic priors)
As far as getting the code roughly working, this is what I used; just
run it generating lots of photons and see that roughly the right
parameters come out. Unfortunately, this isn't really very sensitive
to all the things that are supposed to make a Bayesian estimator
better than (say) a maximum-likelihood estimator; I could have the
probability estimation pretty badly wrong, but there are so many
photons that anything but the right parameters are such a horrible fit
even a somewhat wrong algorithm won't select them.
Maybe you meant something different: I could also try fixing some
model parameters and generating just a handful of photons, so I get a
crummy estimate, but then repeating the photon generation and fit many
times, to see if the average value of the best-fit parameters comes
out close to the true parameters. But this is a test for unbiasedness
of the estimator, and it's not clear that this estimator should be
unbiased even if correct.
> Do you have an example with a known posterior? Then, the posterior
> with a large sample or the average in a Monte Carlo should still be
> approximately the true one.
> For symmetric distributions, the Bayesian posterior confidence
> intervals and posterior mean should be roughly the same as the
> frequentist estimates. With diffuse priors, in many cases the results
> are exactly the same in Bayesian and MLE.
> Another version I used in the past is to trace the posterior mean, as
> the prior variance is reduced, in one extreme you should get the prior
> back in the other extreme the MLE.
My priors are flat, on (0,1) in both phase and pulsed fraction. It
seems a bit peculiar to use anything else in phase, but I can imagine
some sort of logarithmic prior for pulsed fraction (making 0.01-0.1
equally likely to 0.1-1). I haven't experimented with introducing
localized priors, but it seems like that too wouldn't be very
sensitive to whether the Bayesian calculation is right; if the prior
insists that the values are both 0.5, then any remotely sane algorithm
will come up with posteriors that are also 0.5.
>> I can even generate models and then data
>> sets that are drawn from the prior distribution, but what should I
>> expect from the code output on such a data set?
> If you ignore the Bayesian interpretation, then this is just a
> standard sampling problem, you draw prior parameters and observations,
> the rest is just finding the conditional and marginal probabilities. I
> think the posterior odds ratio should converge in a large Monte Carlo
> to the true one, and the significance levels should correspond to the
> one that has been set for the test (5%).
> (In simplest case of conjugate priors, you can just interpret the
> prior as a previous sample and you are back to a frequentist
This sounds like what I was trying for - draw a model according to the
priors, then generate a data set according to the model. I then get
some numbers out: the simplest is a probability that the model was
pulsed, but I can also get a credible interval or an estimated CDF for
the model parameters. But I'm trying to figure out what test I should
apply to those values to see if they make sense.
For a credible interval, I suppose I could take (say) a 95% credible
interval, then 95 times out of a hundred the model parameters I used
to generate the data set should be in the credible interval. And I
should be able to use the binomial distribution to put limits on how
close to 95% I should get in M trials. This seems to work, but I'm not
sure I understand why. The credible region is obtained from a
probability distribution for the model parameters, but I am turning
things around and testing the distribution of credible regions.
In any case, that seems to work, so now I just need to figure out a
similar test for the probability of being pulsed.
> The problem is that with an informative prior, you always have a
> biased estimator in small samples and the posterior odds ratio is
> affected by an informative prior. And "real" Bayesians don't care
> about sampling properties.
> What are your prior distributions and the likelihood function in your
> case? Can you model degenerate and diffuse priors, so that an
> informative prior doesn't influence you sampling results?
> I'm trying to think of special cases where you could remove the effect
> of the prior.
I've put the code on github in case that helps make this any clearer:
The model I'm using is either: completely uniform mod 1 (probability
0.5) or: the PDF is a cosine plus a constant, where the two parameters
are the fraction of area under the cosine (as opposed to under the
constant) and the phase offset of the cosine. The likelihood is just
(modulo logs for range issues) the product over all observed phases x
of PDF(fraction, phase, x). So the mode of the posterior is exactly
the maximum-likelihood estimate (whether or not I got the math right,
more or less).
> It's a bit vague because I don't see the details, and I haven't looked
> at this in a while.
As is probably obvious, I'm pretty vague on Bayesian statistics in
general. But I'm working on it.
>> Suggestions welcome.
>> SciPy-User mailing list
> SciPy-User mailing list
More information about the SciPy-User