[Numpy-discussion] categorical distributions

josef.pktd@gmai... josef.pktd@gmai...
Mon Nov 22 07:46:10 CST 2010


On Mon, Nov 22, 2010 at 6:05 AM, Hagen Fürstenau <hagen@zhuliguan.net> wrote:
>> ISTM that this elementary functionality deserves an implementation
>> that's as fast as it can be.
>
> To substantiate this, I just wrote a simple implementation of
> "categorical" in "numpy/random/mtrand.pyx" and it's more than 8x faster
> than your version for multiple samples of the same distribution and more
> than 3x faster than using "multinomial(1, ...)" for multiple samples of
> different distributions (each time tested with 1000 samples drawn from
> distributions over 1000 categories).
>
> I can provide it as a patch if there's any interest.

Can you compare the speed of your cython solution with the version of Chuck

--
For instance, weight 0..3 by 1..4, then

In [14]: w = arange(1,5)

In [15]: p = cumsum(w)/float(w.sum())

In [16]: bincount(p.searchsorted(random(1000000)))/1e6
Out[16]: array([ 0.100336,  0.200382,  0.299132,  0.40015 ])

-------------
from numpy mailing list thread "Weighted random integers", sep. 10
Using searchsorted hat roughly a 10 times speedup compared to my
multinomial version

Josef


>
> - Hagen
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>


More information about the NumPy-Discussion mailing list