[SciPy-User] issue estimating the multivariate Polya distribution: why?

josef.pktd@gmail.com
Sat Mar 3 17:51:38 CST 2012


On Sat, Mar 3, 2012 at 6:24 PM,  <josef.pktd@gmail.com> wrote:
> On Sat, Mar 3, 2012 at 4:05 PM, Emanuele Olivetti
> <emanuele@relativita.com> wrote:
>> Dear All,
>>
>> I am playing with the multivariate Polya distribution - also
>> known as the Dirichlet compound multinomial distribution. A
>> brief description from wikipedia:
>>   http://en.wikipedia.org/wiki/Multivariate_P%C3%B3lya_distribution
>>
>> I made a straightforward implementation of the probability
>> mass function in log-scale here:
>>   https://gist.github.com/1968113
>> together with a straightforward Monte Carlo estimate (sampling
>> first from a Dirichlet and then computing the log-likelihood of
>> the multinomial), also in log-scale. The log-scale was chosen
>> to improve numerical stability.
>>
>> If you run the code linked above you should get these two examples:
>> ----
>> X: [ 0 50 50]
>> alpha: [ 1 10 10]
>> analytic: -5.22892710577
>> montecarlo -5.23470053651
>>
>> X: [100   0   0]
>> alpha: [ 1 10 10]
>> analytic: -51.737395965
>> montecarlo -93.5266543113
>> ----
>>
>> As you can see, in the first case, i.e. X=[0,50,50], there is excellent
>> agreement between the two implementations, while in the second case, i.e.
>> X=[100,0,0], there is a dramatic disagreement. Note that the Monte Carlo
>> estimate is quite stable: if you change the seed of the random number
>> generator you still get numbers not too far from -90.
>>
>> So my question is: where does this issue come from? I cannot see
>> any mistakes in the implementation (the code is very simple), nor
>> an obvious source of numerical instability.
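
The gist linked above is not reproduced in the thread, but as a point of
reference the two computations it describes could look roughly like this
(a minimal sketch; the function names are made up, and np.logaddexp.reduce
stands in for whatever log-sum-exp reduction the actual code uses):
----
import numpy as np
from scipy.special import gammaln

def log_polya_pmf(X, alpha):
    # exact log-pmf of the Dirichlet compound multinomial (multivariate Polya)
    X = np.asarray(X, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    n, A = X.sum(), alpha.sum()
    return (gammaln(n + 1) - gammaln(X + 1).sum()
            + gammaln(A) - gammaln(n + A)
            + (gammaln(X + alpha) - gammaln(alpha)).sum())

def log_polya_mc(X, alpha, iterations=100000, seed=0):
    # Monte Carlo estimate: draw p ~ Dirichlet(alpha), average the multinomial
    # likelihood of X over the draws, working in log-scale throughout
    rng = np.random.RandomState(seed)
    X = np.asarray(X, dtype=float)
    n = X.sum()
    p = rng.dirichlet(alpha, size=iterations)
    log_p = np.log(np.clip(p, 1e-300, None))   # guard against log(0)
    loglik = gammaln(n + 1) - gammaln(X + 1).sum() + np.dot(log_p, X)
    return np.logaddexp.reduce(loglik) - np.log(iterations)

print(log_polya_pmf([0, 50, 50], [1, 10, 10]))   # about -5.23
print(log_polya_mc([0, 50, 50], [1, 10, 10]))
print(log_polya_pmf([100, 0, 0], [1, 10, 10]))   # about -51.74
print(log_polya_mc([100, 0, 0], [1, 10, 10]))
----
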
>
> I don't see anything wrong either. I tried several different
> parameters, and my only guess is that logaddexp is not precise enough
> in this case. My results (numpy 1.5.1) are even worse.
>
> The probability that you want to calculate is very low:
>
>>>> np.exp(-51.737395965)
> 3.3941765165211696e-23
>
> For larger values it seems to work fine, but it deteriorates quickly
> once the log-likelihood drops below -15 or so (with the versions I
> have installed).
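
One quick check that would separate the two possible causes is to look at
the largest per-sample multinomial log-likelihood in the Monte Carlo run:
the log of the sample mean can never exceed that maximum, so if even the
best Dirichlet draw is far below the analytic -51.74, the limiting factor
is which p's ever get sampled rather than the precision of the logaddexp
reduction. A rough sketch, using the same illustrative setup as above:
----
import numpy as np
from scipy.special import gammaln

X, alpha = np.array([100., 0., 0.]), [1, 10, 10]
n = X.sum()

p = np.random.RandomState(0).dirichlet(alpha, size=100000)
log_p = np.log(np.clip(p, 1e-300, None))
loglik = gammaln(n + 1) - gammaln(X + 1).sum() + np.dot(log_p, X)

print(loglik.max())                                       # best single draw
print(np.logaddexp.reduce(loglik) - np.log(len(loglik)))  # Monte Carlo estimate
----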

Just an observation:

With iterations=1e7 I get much better numbers, which are still way
off. But I don't see why this should matter much, since you are
simulating draws from the Dirichlet (alpha) rather than the
low-probability events themselves (unless lots of tiny errors add up
in different ways).
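
For what it's worth, the effect of the iteration count can be checked
directly against the analytic value, assuming the illustrative
log_polya_pmf / log_polya_mc sketch from earlier in the thread:
----
X, alpha = [100, 0, 0], [1, 10, 10]
exact = log_polya_pmf(X, alpha)
for iterations in (10**4, 10**5, 10**6, 10**7):
    print(iterations, exact, log_polya_mc(X, alpha, iterations=iterations))
----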

Josef

>
> Do you need these almost-zero-probability events?
>
> In a similar problem with Poisson mixtures, but without Monte Carlo,
> I tried various ways of rescaling, but didn't come up with anything
> useful.
>
> Josef
>
>>
>> Any hint?
>>
>> Best,
>>
>> Emanuele
>>

