[SciPy-User] vectorized version of 'multinomial' sampling function
per freem
perfreem@gmail....
Wed Oct 14 22:05:34 CDT 2009
<josef.pktd@gmail.com> wrote:
>>> If you only want n=1, then it seems to me, you could do this also with
>>> drawing an integer out of range(3) with the given probabilities. This
>>> can be vectorized easily directly with numpy. (just like an individual
>>> observation in multinomial logit)
>>>
>>> Josef
>>>
>>>>
>>>> to get a vector of multinomial samplers, each using the nth list in
>>>> 'p'. something like:
>>>>
>>>> array([[1, 0, 0], [0, 0 1]]) in this case. is this possible? it seems
>>>> like 'multinomial' takes only a one dimensional array. i could write
>>>> this as a "for" loop of course but i prefer a vectorized version since
>>>> speed is crucial for me here.
>>
>>
>> for n=1 the following should work (unless I still cannot tell
>> multinomial distribution and multinomial logit apart)
>>
>> Josef
>>
>>>>> p = np.array([[ 0.9 , 0.05, 0.05],
>> [ 0.05, 0.05, 0.9 ]])
>>>>> pcum = p.cumsum(1)
>>>>> pcuma = np.repeat(pcum,20, 0)
>>>>> rvs = np.random.uniform(size=(40))
>>>>> a,b,c = pcuma.T
>>>>> mnrvs = np.column_stack((rvs<=a, (rvs>a) & (rvs<=b), rvs>b)).astype(int)
>>>>> mnrvs[:20].sum()
>> 20
>>>>> mnrvs[:20].sum(0)
>> array([19, 0, 1])
>>>>> mnrvs[20:].sum()
>> 20
>>>>> mnrvs[20:].sum(0)
>> array([ 1, 0, 19])
>>>>> mnrvs[15:25]
>> array([[0, 0, 1],
>> [1, 0, 0],
>> [1, 0, 0],
>> [1, 0, 0],
>> [1, 0, 0],
>> [0, 0, 1],
>> [0, 0, 1],
>> [0, 0, 1],
>> [1, 0, 0],
>> [0, 0, 1]])
>>
>>
>> check for larger sample:
>>
>>>>> pcuma = np.repeat(pcum,500, 0)
>>>>> rvs = np.random.uniform(size=(1000))
>>>>> a,b,c = pcuma.T
>>>>> mnrvs = np.column_stack((rvs<=a, (rvs>a) & (rvs<=b), rvs>b)).astype(int)
>>>>> mnrvs[:500].sum(0)
>> array([456, 17, 27])
>>>>> mnrvs[500:].sum(0)
>> array([ 24, 35, 441])
>>>>> p*500
>> array([[ 450., 25., 25.],
>> [ 25., 25., 450.]])
>>
>
> similar works for e.g. n=5
>
> Josef
hi Josef,
thanks very much for your reply. i see how this works for the specific
case where the multinomial vector of probabilities has 3 numbers in
it, for which you defined the corresponding variables "a", "b" and
"c". but is there a way to rewrite this code to make it work for an
arbitrarily sized vector of probabilities? for example, i'm not sure
how one would generalize this line:
mnrvs = np.column_stack((rvs<=a, (rvs>a) & (rvs<=b), rvs>b)).astype(int)
any ideas on how to do this would be greatly appreciated. thanks again.
More information about the SciPy-User
mailing list