[SciPy-User] issue estimating the multivariate Polya distribution: why?
Emanuele Olivetti
emanuele@relativita....
Sat Mar 3 15:05:14 CST 2012
Dear All,
I am playing with the multivariate Polya distribution - also
known as the Dirichlet compound multinomial distribution. A
brief description from wikipedia:
http://en.wikipedia.org/wiki/Multivariate_P%C3%B3lya_distribution
I made a straightforward implementation of the probability
density function in the log-scale here:
https://gist.github.com/1968113
together with a straightforward montecarlo estimation (by
sampling first from a Dirichlet and then computing the log-likelihood
of the multinomial) in the log-scale as well. The log-scale was
chosen in order to improve numerical stability.
If you run the code liked above you should get these two examples:
----
X: [ 0 50 50]
alpha: [ 1 10 10]
analytic: -5.22892710577
montecarlo -5.23470053651
X: [100 0 0]
alpha: [ 1 10 10]
analytic: -51.737395965
montecarlo -93.5266543113
----
As you can see in the first case, i.e. X=[0,50,50], there is excellent
agreement between the two implementations while in the second case, i.e.
x=[100,0,0], there is a dramatic disagreement. Note that the montecarlo
estimate is quite stable and if you change the seed of the random
number generator you get numbers not too far from -90.
So my question is: where does this issue come from? I cannot see
mistakes in the implementation (the code is very simple) and
I cannot see the source of numerical instability.
Any hint?
Best,
Emanuele
More information about the SciPy-User
mailing list