[Numpy-discussion] non-uniform discrete sampling with given probabilities (w/ and w/o replacement)

Christopher Jordan-Squire cjordan1@uw....
Wed Aug 31 14:17:04 CDT 2011


On Wed, Aug 31, 2011 at 2:07 PM, Olivier Delalleau <shish@keba.be> wrote:
> You can use:
> 1 + numpy.argmax(numpy.random.multinomial(1, [0.1, 0.2, 0.7]))
>
> For your "real" application you'll probably want to use a value >1 for the
> first parameter (equal to your sample size), instead of calling it multiple
> times.
>
> -=- Olivier

Thanks. Warren (Weckesser) mentioned this possibility to me yesterday
and I forgot to put it in my post. I assume you mean something like

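import numpy as np

x = np.arange(3)                                # the values to sample: 0, 1, 2
y = np.random.multinomial(30, [0.1, 0.2, 0.7])  # how many times each value is drawn
z = np.repeat(x, y)                             # expand the counts into 30 draws
np.random.shuffle(z)                            # randomize the order, in place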

Does that look right?
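
For the R-like `sample` function described in the original question, the same multinomial idea could be wrapped up roughly as follows. This is only a sketch: the function name `sample_with_replacement` and its `rng` parameter are made up for illustration and are not part of numpy.

```python
import numpy as np

def sample_with_replacement(items, probs, size, rng=np.random):
    """Draw `size` items with replacement; items[i] is chosen with probability probs[i].

    Hypothetical helper, not a numpy function. `rng` may be the numpy.random
    module itself or a numpy.random.RandomState instance.
    """
    items = np.asarray(items)
    counts = rng.multinomial(size, probs)   # number of times each item is drawn
    draws = np.repeat(items, counts)        # expand counts into individual draws
    rng.shuffle(draws)                      # in-place shuffle so the order is random
    return draws

rng = np.random.RandomState(0)              # seeded generator for reproducibility
out = sample_with_replacement(["a", "b", "c"], [0.1, 0.2, 0.7], 30, rng)
```

Passing a seeded `RandomState` keeps everything on numpy's own random stream, which was one of the stated requirements.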

-Chris JS
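
The without-replacement case from the original question is not covered by the multinomial approach. One numpy-only way to avoid the "reject repeats" loop is the exponential-keys trick (a variant of Efraimidis-Spirakis weighted sampling): give index i a key drawn from an exponential distribution with rate probs[i], then keep the k indices with the smallest keys. This is a different technique from anything proposed in this thread, and the function name below is hypothetical.

```python
import numpy as np

def weighted_sample_without_replacement(probs, k, rng=np.random):
    """Return k distinct indices, drawn sequentially with chance proportional to probs.

    Hypothetical helper, not a numpy function. Uses exponential keys:
    Exp(1) / probs[i] is exponential with rate probs[i], and sorting such
    keys reproduces sequential weighted draws without replacement.
    """
    probs = np.asarray(probs, dtype=float)
    positive = probs > 0
    if k > np.count_nonzero(positive):
        raise ValueError("k exceeds the number of items with nonzero probability")
    keys = np.full(probs.shape, np.inf)     # zero-probability items are never picked
    keys[positive] = rng.exponential(size=int(positive.sum())) / probs[positive]
    return np.argsort(keys)[:k]             # indices of the k smallest keys

rng = np.random.RandomState(0)
idx = weighted_sample_without_replacement([0.999, 0.0005, 0.0005], 2, rng)
```

The cost is one random draw per item plus a sort, regardless of how skewed the probabilities are. NumPy 1.7 and later also provide `numpy.random.choice`, which accepts `p` and `replace` arguments.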

>
> 2011/8/31 Christopher Jordan-Squire <cjordan1@uw.edu>
>>
>> In numpy, is there a way of generating a random integer in a specified
>> range where the integers in that range have given probabilities? So,
>> for example, generating a random integer between 1 and 3 with
>> probabilities [0.1, 0.2, 0.7] for the three integers?
>>
>> I'd like to know how to do this without replacement, as well. If the
>> probabilities are uniform, there are a number of ways, including just
>> shuffling the data and taking the first however-many elements of the
>> shuffle. But this doesn't apply with non-uniform probabilities.
>> Similarly, one could try arbitrary-sampling-method X (such as
>> inverse-cdf sampling) and then rejecting repeats. But that is clearly
>> sub-optimal if the number of samples desired is near the same order of
>> magnitude as the total population, or if the probabilities are very
>> skewed. (E.g. a weighted sample of size 2 without replacement from
>> [0,1,2] with probabilities [0.999, 0.0005, 0.0005] will take a long
>> time if you just sample repeatedly until you have two distinct
>> samples.)
>>
>> I know parts of what I want can be done in scipy.stats using an
>> rv_discrete or with the Python standard library's random module. I
>> would much prefer to do it only using numpy because the eventual
>> application shouldn't have a scipy dependency and should use the same
>> random seed as numpy.random.
>>
>> (For more background, what I want is to create a function like sample
>> in R, where I can give it an array-like of doo-hickeys and another
>> array-like of probabilities associated with each doo-hickey, and then
>> generate a random sample of doo-hickeys with those probabilities. One
>> step for that is generating ints, to use as indices, with the same
>> probabilities. I'd like a version of this to be in numpy/scipy, but it
>> doesn't really belong in scipy since it doesn't
>>
>> -Chris JS
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>

