[Numpy-discussion] Generating random samples without repeats
Paul Moore
pf_moore@yahoo.co...
Fri Sep 19 04:08:20 CDT 2008
Robert Kern <robert.kern <at> gmail.com> writes:
> On Thu, Sep 18, 2008 at 16:55, Paul Moore <pf_moore <at> yahoo.co.uk> wrote:
> > I want to generate a series of random samples, to do simulations based
> > on them. Essentially, I want to be able to produce a SAMPLESIZE * N
> > matrix, where each row of N values consists of either
> >
> > 1. Integers between 1 and M (simulating N rolls of an M-sided die), or
> > 2. A sample of N numbers between 1 and M without repeats (simulating
> > deals of N cards from an M-card deck).
> >
> > Example (1) is easy, numpy.random.random_integers(1, M, (SAMPLESIZE, N))
> >
> > But I can't find an obvious equivalent for (2). Am I missing something
> > glaringly obvious? I'm using numpy - is there maybe something in scipy I
> > should be looking at?
>
> numpy.array([(numpy.random.permutation(M) + 1)[:N]
> for i in range(SAMPLESIZE)])
>
Thanks.
However, this takes over 70s and peaks at around 400 MB of memory use, whereas
the equivalent for (1)
    numpy.random.random_integers(1,M,(SAMPLESIZE,N))
takes less than half a second and negligible working memory. Both end up
allocating an array of the same size, but your suggestion consumes temporary
working memory. I suspect, but can't prove, that the time taken comes from
memory allocations rather than computation.
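For anyone wanting to check the comparison above, here is a rough measurement
sketch. The sizes are hypothetical (the values actually used are not stated
here), numpy.random.randint stands in for random_integers, and tracemalloc
itself adds overhead, so treat the figures as relative rather than absolute.

```python
import time
import tracemalloc

import numpy as np

# Hypothetical sizes, chosen only so the sketch runs quickly.
SAMPLESIZE, M, N = 10_000, 52, 5


def deal_loop():
    # The suggested approach: one full permutation per row, truncated to N.
    return np.array([(np.random.permutation(M) + 1)[:N]
                     for _ in range(SAMPLESIZE)])


def roll_direct():
    # Case (1): N independent draws from 1..M per row (randint's high is exclusive).
    return np.random.randint(1, M + 1, size=(SAMPLESIZE, N))


def measure(func):
    # Returns (result, elapsed seconds, peak traced bytes) for one call.
    tracemalloc.start()
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


for func in (deal_loop, roll_direct):
    result, elapsed, peak = measure(func)
    print(f"{func.__name__}: shape={result.shape}, "
          f"{elapsed:.3f}s, peak={peak / 1e6:.1f} MB")
```

This only separates wall time from peak traced memory; it does not by itself
show whether allocation or computation dominates.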
As a one-off cost initialising my data, it's not a disaster, but I anticipate
using idioms like this later in my calculations as well, where the costs could
hurt more.
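One way to avoid the Python-level loop, as a sketch only: draw a random key for
every card in every row and take the positions of the N smallest keys. The
function name and sizes below are made up for illustration, and
numpy.random.default_rng assumes a reasonably recent NumPy. Note that it still
allocates a temporary SAMPLESIZE * M array of floats, so it trades the loop for
memory and may or may not help for large M.

```python
import numpy as np


def deal_without_repeats(samplesize, m, n, rng=None):
    """Return a (samplesize, n) array; each row holds n distinct values in 1..m."""
    if n > m:
        raise ValueError("cannot deal more cards than the deck holds")
    rng = np.random.default_rng() if rng is None else rng
    # One random key per card per row; sorting the keys shuffles each row.
    keys = rng.random((samplesize, m))
    # argsort gives a permutation of 0..m-1 per row; keep the first n, shift to 1..m.
    return np.argsort(keys, axis=1)[:, :n] + 1


# Hypothetical sizes: 1000 deals of 5 cards from a 52-card deck.
deals = deal_without_repeats(1000, 52, 5)
print(deals[:3])
```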
If I'm going to need to write C code, are there any good examples of this? (I
guess the source for numpy.random is a good place to start).
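For reference, the loop a C version would most likely implement is a partial
Fisher-Yates shuffle: only the first N positions of each deck are shuffled, so
the work per row is proportional to N rather than M. The sketch below is
Python, not C, and is not taken from the numpy.random source; the function name
is hypothetical. It loops over the N positions and handles all rows at once.

```python
import numpy as np


def deal_partial_shuffle(samplesize, m, n, rng=None):
    """Partial Fisher-Yates across all rows: n swaps per row instead of m."""
    if n > m:
        raise ValueError("cannot deal more cards than the deck holds")
    rng = np.random.default_rng() if rng is None else rng
    # One ordered deck 1..m per row (this is still a samplesize * m array).
    decks = np.tile(np.arange(1, m + 1), (samplesize, 1))
    rows = np.arange(samplesize)
    for j in range(n):
        # For every row, choose a position k with j <= k < m ...
        k = rng.integers(j, m, size=samplesize)
        # ... and swap the cards at positions j and k within that row.
        picked = decks[rows, k]          # fancy indexing returns a copy
        decks[rows, k] = decks[rows, j]
        decks[rows, j] = picked
    return decks[:, :n].copy()


# Hypothetical sizes: 1000 deals of 5 cards from a 52-card deck.
print(deal_partial_shuffle(1000, 52, 5)[:3])
```

In C the same loop would run per row over a single reusable deck buffer, which
would avoid the temporary allocations altogether.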
Paul