[Numpy-discussion] Generating random samples without repeats
Rick White
rlw@stsci....
Fri Sep 19 08:00:36 CDT 2008
Paul Moore <pf_moore <at> yahoo.co.uk> writes:
> Robert Kern <robert.kern <at> gmail.com> writes:
>> On Thu, Sep 18, 2008 at 16:55, Paul Moore <pf_moore <at>
>> yahoo.co.uk> wrote:
>>> I want to generate a series of random samples, to do simulations
>>> based
>>> on them. Essentially, I want to be able to produce a SAMPLESIZE * N
>>> matrix, where each row of N values consists of either
>>>
>>> 1. Integers between 1 and M (simulating M rolls of an N-sided
>>> die), or
>>> 2. A sample of N numbers between 1 and M without repeats (simulating
>>> deals of N cards from an M-card deck).
>>>
>>> Example (1) is easy, numpy.random.random_integers(1, M,
>>> (SAMPLESIZE, N))
>>>
>>> But I can't find an obvious equivalent for (2). Am I missing
>>> something
>>> glaringly obvious? I'm using numpy - is there maybe something in
>>> scipy I
>>> should be looking at?
>>
>> numpy.array([(numpy.random.permutation(M) + 1)[:N]
>> for i in range(SAMPLESIZE)])
>>
>
> Thanks.
>
> And yet, this takes over 70s and peaks at around 400M memory use,
> whereas the
> equivalent for (1)
>
> numpy.random.random_integers(1,M,(SAMPLESIZE,N))
>
> takes less than half a second, and negligible working memory (both
> end up
> allocating an array of the same size, but your suggestion consumes
> temporary
> working memory - I suspect, but can't prove, that the time taken
> comes from
> memory allocations rather than computation.
It seems like numpy.random.permutation is pretty suboptimal in its
speed. Here's a Python 1-liner that does the same thing (I think)
but is a lot faster:
a = 1+numpy.random.rand(M).argsort()[0:N-1]
This still has the the problem that it generates a size N array to
start with. But at least it is fast compared with permutation:
>>> t1 = timeit.Timer("a = 1+numpy.random.permutation(M)[0:N-1]",
"import numpy; M=1000; N=100")
>>> t1.repeat(3,1000)
[1.3792998790740967, 1.3387739658355713, 1.3446521759033203]
>>> t2 = timeit.Timer("a = 1+numpy.random.rand(M).argsort()[0:N-1]",
"import numpy; M=1000; N=100")
>>> t2.repeat(3,1000)
[0.17597699165344238, 0.16013598442077637, 0.16066503524780273]
More information about the Numpy-discussion
mailing list