[SciPy-User] Unit testing of Bayesian estimator

Anne Archibald peridot.faceted@gmail....
Mon Nov 9 12:06:15 CST 2009


2009/11/9 Bruce Southey <bsouthey@gmail.com>:

> I do not know what you are trying to do with the code as it is not my
> area. But you are using some empirical Bayesian estimator
> (http://en.wikipedia.org/wiki/Empirical_Bayes_method) and thus you lose
> much of the value of the Bayesian approach, as you are only dealing
> with modal estimates. Really you should be obtaining the distribution
> of "Probability the signal is pulsed", not just the modal estimate.

Um. Given a data set and a prior, I just do Bayesian hypothesis
comparison. This gives me a single probability that the signal is
pulsed. You seem to be imagining a probability distribution for this
probability - but what would the independent variables be? The
unpulsed distribution does not depend on any parameters, and I have
integrated over all possible values for the pulsed distribution. So
what I get should really be the probability, given the data, that the
signal is pulsed. I'm not using an empirical Bayesian estimator; I'm
doing the numerical integrations directly (and inefficiently).
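Schematically, the computation looks something like the toy sketch
below. (The raised-cosine pulse profile and the function names are
illustrative only, not my actual code, and exponentiating the
log-likelihood like this underflows for large data sets - hence
"inefficiently".)

import numpy as np
from scipy import integrate

def phase_density(x, f, p):
    # Pulsed model: a fraction f of photons follow a raised-cosine
    # pulse centered at phase p; the rest are uniform on [0, 1).
    return (1 - f) + f * (1 + np.cos(2 * np.pi * (x - p)))

def prob_pulsed(x, prior_pulsed=0.5):
    # Marginal likelihood of the pulsed model: integrate the
    # likelihood over uniform priors on f in [0, 1] and p in [0, 1).
    def integrand(p, f):
        return np.exp(np.sum(np.log(phase_density(x, f, p))))
    m_pulsed, _ = integrate.dblquad(integrand, 0, 1,
                                    lambda f: 0, lambda f: 1)
    # The unpulsed model has no parameters: its density is 1 on
    # [0, 1), so its marginal likelihood is exactly 1 for any data.
    m_unpulsed = 1.0
    num = prior_pulsed * m_pulsed
    return num / (num + (1 - prior_pulsed) * m_unpulsed)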

>> This doesn't really test whether the estimator is doing a good job,
>> since if I throw mountains of information at it, even a rather badly
>> wrong implementation will eventually converge to the right answer.
>> (This is painful experience speaking.)
>>
> Are you testing the code or the method?
> My understanding of unit tests is that they test the code, not the
> method. Unit tests tell me that my code is working correctly but do
> not necessarily tell me whether the method is always right. For
> example, if I need to iterate to get a solution, my test could stop
> after 1 or 2 rounds, before convergence, because I know the rest will
> be correct if the first rounds are.

Unit tests can be used to do either. Since what I'm trying to do here
is make sure I understand Bayesian inference, I'm most worried about
the algorithm.
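
For example, a test of the code rather than the method can pin down a
case with a known exact answer: with no data at all, both marginal
likelihoods are 1, so the posterior must equal the prior. In terms of
the toy prob_pulsed sketch above:

def test_no_data_returns_prior():
    # With zero photons both models have marginal likelihood 1,
    # so the posterior probability must equal the prior exactly.
    assert abs(prob_pulsed(np.array([]), prior_pulsed=0.5) - 0.5) < 1e-8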

> Testing the algorithm is relatively easy because you just have to use
> sensitivity analysis. Basically, use multiple data sets that vary in
> the number of observations and parameters to see how well these work.
> The hard part is making sense of the numbers.

It is exactly how to make sense of the numbers that I'm asking about.
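A sweep like the one Bruce describes is easy enough to set up in terms
of the toy prob_pulsed above, for instance on data drawn from the
unpulsed model:

for n in (10, 30, 100):
    x = np.random.random(n)  # data from the unpulsed (uniform) model
    # P(pulsed) should drift toward 0 as n grows, since the data
    # carry more and more evidence for the simpler model.
    print(n, prob_pulsed(x))

What I don't know is what to conclude from the resulting list of
probabilities.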

> Also note that you have some explicit assumptions involved, like the
> type of prior distribution. These tend to limit what you can do: if
> you assume a uniform prior, then you cannot use a non-uniform data
> set. Well, you can, but unless the data dominates the prior you will
> most likely get a weird answer.

I don't understand what you mean by a "non-uniform data set".
Individual data sets are drawn from models, one of which is uniform.
The priors define the distribution of models; the priors I use give a
50% chance the model is uniform and a 50% chance the model is pulsed.
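In code, the generative process I mean is something like this (again
in terms of the toy phase_density above):

def draw_data_set(n):
    # Prior over models: 50% uniform, 50% pulsed, with f and p
    # themselves drawn uniformly from their priors.
    if np.random.random() < 0.5:
        return np.random.random(n)  # unpulsed model
    f, p = np.random.random(), np.random.random()
    # Rejection-sample from the pulsed density; its maximum is 1 + f.
    samples = []
    while len(samples) < n:
        x = np.random.random()
        if np.random.random() * (1 + f) < phase_density(x, f, p):
            samples.append(x)
    return np.array(samples)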

Anne

