[SciPy-User] issue estimating the multivariate Polya distribution: why?

Emanuele Olivetti emanuele@relativita....
Sat Mar 3 15:05:14 CST 2012


Dear All,

I am playing with the multivariate Polya distribution - also
known as the Dirichlet compound multinomial distribution. A
brief description from wikipedia:
   http://en.wikipedia.org/wiki/Multivariate_P%C3%B3lya_distribution

I made a straightforward implementation of the probability
density function in the log-scale here:
   https://gist.github.com/1968113
together with a straightforward montecarlo estimation (by
sampling first from a Dirichlet and then computing the log-likelihood
of the multinomial) in the log-scale as well. The log-scale was
chosen in order to improve numerical stability.

If you run the code liked above you should get these two examples:
----
X: [ 0 50 50]
alpha: [ 1 10 10]
analytic: -5.22892710577
montecarlo -5.23470053651

X: [100   0   0]
alpha: [ 1 10 10]
analytic: -51.737395965
montecarlo -93.5266543113
----

As you can see in the first case, i.e. X=[0,50,50], there is excellent
agreement between the two implementations while in the second case, i.e.
x=[100,0,0], there is a dramatic disagreement. Note that the montecarlo
estimate is quite stable and if you change the seed of the random
number generator you get numbers not too far from -90.

So my question is: where does this issue come from? I cannot see
mistakes in the implementation (the code is very simple) and
I cannot see the source of numerical instability.

Any hint?

Best,

Emanuele



More information about the SciPy-User mailing list