[SciPy-User] vectorized version of 'multinomial' sampling function

josef.pktd@gmai... josef.pktd@gmai...
Wed Oct 14 00:41:22 CDT 2009


On Wed, Oct 14, 2009 at 1:07 AM,  <josef.pktd@gmail.com> wrote:
> On Tue, Oct 13, 2009 at 5:01 PM, per freem <perfreem@gmail.com> wrote:
>> hi all,
>>
>> i have a series of probability vector that i'd like to feed into
>> multinomial to get an array of vector outcomes back. for example,
>> given:
>>
>> p = array([[ 0.9 ,  0.05,  0.05],
>>       [ 0.05,  0.05,  0.9 ]])
>>
>> i'd like to call multinomial like this:
>>
>> multinomial(1, p)
>
> If you only want n=1, then it seems to me, you could do this also with
> drawing an integer out of range(3) with the given probabilities. This
> can be vectorized easily directly with numpy. (just like an individual
> observation in multinomial logit)
>
> Josef
>
>>
>> to get a vector of multinomial samplers, each using the nth list in
>> 'p'. something like:
>>
>> array([[1, 0, 0], [0, 0 1]]) in this case. is this possible? it seems
>> like 'multinomial' takes only a one dimensional array. i could write
>> this as a "for" loop of course but i prefer a vectorized version since
>> speed is crucial for me here.


for n=1 the following should work (unless I still cannot tell
multinomial distribution and multinomial logit apart)

Josef

>>> p = np.array([[ 0.9 ,  0.05,  0.05],
       [ 0.05,  0.05,  0.9 ]])
>>> pcum = p.cumsum(1)
>>> pcuma = np.repeat(pcum,20, 0)
>>> rvs = np.random.uniform(size=(40))
>>> a,b,c = pcuma.T
>>> mnrvs = np.column_stack((rvs<=a, (rvs>a) & (rvs<=b), rvs>b)).astype(int)
>>> mnrvs[:20].sum()
20
>>> mnrvs[:20].sum(0)
array([19,  0,  1])
>>> mnrvs[20:].sum()
20
>>> mnrvs[20:].sum(0)
array([ 1,  0, 19])
>>> mnrvs[15:25]
array([[0, 0, 1],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [0, 0, 1],
       [0, 0, 1],
       [0, 0, 1],
       [1, 0, 0],
       [0, 0, 1]])


check for larger sample:

>>> pcuma = np.repeat(pcum,500, 0)
>>> rvs = np.random.uniform(size=(1000))
>>> a,b,c = pcuma.T
>>> mnrvs = np.column_stack((rvs<=a, (rvs>a) & (rvs<=b), rvs>b)).astype(int)
>>> mnrvs[:500].sum(0)
array([456,  17,  27])
>>> mnrvs[500:].sum(0)
array([ 24,  35, 441])
>>> p*500
array([[ 450.,   25.,   25.],
       [  25.,   25.,  450.]])


More information about the SciPy-User mailing list