# [Numpy-discussion] Generating random samples without repeats

Rick White rlw@stsci....
Fri Sep 19 08:00:36 CDT 2008

```Paul Moore <pf_moore <at> yahoo.co.uk> writes:

> Robert Kern <robert.kern <at> gmail.com> writes:
>> On Thu, Sep 18, 2008 at 16:55, Paul Moore <pf_moore <at>
>> yahoo.co.uk> wrote:
>>> I want to generate a series of random samples, to do simulations
>>> based
>>> on them. Essentially, I want to be able to produce a SAMPLESIZE * N
>>> matrix, where each row of N values consists of either
>>>
>>> 1. Integers between 1 and M (simulating M rolls of an N-sided
>>> die), or
>>> 2. A sample of N numbers between 1 and M without repeats (simulating
>>>    deals of N cards from an M-card deck).
>>>
>>> Example (1) is easy, numpy.random.random_integers(1, M,
>>> (SAMPLESIZE, N))
>>>
>>> But I can't find an obvious equivalent for (2). Am I missing
>>> something
>>> glaringly obvious? I'm using numpy - is there maybe something in
>>> scipy I
>>> should be looking at?
>>
>> numpy.array([(numpy.random.permutation(M) + 1)[:N]
>>     for i in range(SAMPLESIZE)])
>>
>
> Thanks.
>
> And yet, this takes over 70s and peaks at around 400M memory use,
> whereas the
> equivalent for (1)
>
> numpy.random.random_integers(1,M,(SAMPLESIZE,N))
>
> takes less than half a second, and negligible working memory (both
> end up
> allocating an array of the same size, but your suggestion consumes
> temporary
> working memory - I suspect, but can't prove, that the time taken
> comes from
> memory allocations rather than computation.

It seems like numpy.random.permutation is pretty suboptimal in its
speed.  Here's a Python 1-liner that does the same thing (I think)
but is a lot faster:

a = 1+numpy.random.rand(M).argsort()[0:N-1]

This still has the the problem that it generates a size N array to
start with.  But at least it is fast compared with permutation:

>>> t1 = timeit.Timer("a = 1+numpy.random.permutation(M)[0:N-1]",
"import numpy; M=1000; N=100")
>>> t1.repeat(3,1000)
[1.3792998790740967, 1.3387739658355713, 1.3446521759033203]
>>> t2 = timeit.Timer("a = 1+numpy.random.rand(M).argsort()[0:N-1]",
"import numpy; M=1000; N=100")
>>> t2.repeat(3,1000)
[0.17597699165344238, 0.16013598442077637, 0.16066503524780273]

```

More information about the Numpy-discussion mailing list