[Numpy-discussion] Generating random samples without repeats

Paul Moore pf_moore@yahoo.co...
Fri Sep 19 05:17:51 CDT 2008

Anne Archibald <peridot.faceted <at> gmail.com> writes:

> This was discussed on one of the mailing lists several months ago. It
> turns out that there is no simple way to efficiently choose without
> replacement in numpy/scipy.

That reassures me that I'm not missing something obvious! I'm pretty new to
numpy (I've lurked here for a number of years, but never had a real-life need
to use numpy until now).

> I posted a hack that does this somewhat
> efficiently (if SAMPLESIZE>M/2, choose the first SAMPLESIZE of a
> permutation; if SAMPLESIZE<M/2, choose with replacement and redraw any
> duplicates) but it's not vectorized across many sample sets. Is your
> problem large M or large N? what is SAMPLESIZE/M?

It's actually large SAMPLESIZE. As an example, I'm simulating repeated deals 
of poker hands from a deck of cards: M=52, N=5, SAMPLESIZE=1000000.
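
For reference, the hack described in the quote above might look roughly like
the sketch below. This is only an illustration of the idea as quoted, not
Anne's actual code: the function name sample_without_replacement and its
arguments are made up here, and it assumes a recent NumPy that provides
numpy.random.default_rng. It draws a single sample set, so it is not
vectorized across many sets.

```python
import numpy as np

def sample_without_replacement(m, k, rng=None):
    """Return k distinct integers from range(m) (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    if k > m:
        raise ValueError("cannot draw more items than the population holds")
    if 2 * k > m:
        # Dense case: shuffle the whole population and keep the first k.
        return rng.permutation(m)[:k]
    # Sparse case: draw with replacement, then redraw only the duplicates.
    sample = rng.integers(0, m, size=k)
    while True:
        # 'first' holds the index of the first occurrence of each value.
        _, first = np.unique(sample, return_index=True)
        if first.size == k:
            return sample  # every entry is distinct
        dup = np.ones(k, dtype=bool)
        dup[first] = False  # keep first occurrences, mark the repeats
        sample[dup] = rng.integers(0, m, size=int(dup.sum()))

hand = sample_without_replacement(52, 5)  # one 5-card hand from a 52-card deck
```

The redraw loop is cheap when k is small relative to m, because duplicates
are rare; for k close to m the permutation branch avoids repeated redraws.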

For now, Robert's approach will work, but it will start blowing up when I want
100 million samples: I don't have the memory to hold all the data (4 bytes
for an int * N=5 * 100000000 = 2GB plus change). So I'll need to allocate
(say) 1 million at a time in a loop and accumulate my results. That's when
70-second costs to allocate start to hurt. (After all, this is just the setup;
I've got my actual calculations to do as well!!!)
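
The batch-and-accumulate loop could be sketched as below. Everything here is
an assumption made for illustration: Robert's approach is not reproduced, and
the dealing step is a stand-in that sorts random keys per row (the first N
columns of each row's argsort are N distinct cards). The names deal_hands and
card_counts, the batch size, and the per-card tally used as the "result" are
all hypothetical. It assumes a recent NumPy with numpy.random.default_rng.

```python
import numpy as np

M, N = 52, 5  # deck size and hand size from the example above

def deal_hands(count, rng):
    """Return a (count, N) array; each row holds N distinct cards from range(M)."""
    # Argsort of independent uniform floats gives a random permutation per row.
    keys = rng.random((count, M))
    return np.argsort(keys, axis=1)[:, :N]

def card_counts(total, batch=100_000, seed=None):
    """Deal `total` hands in batches and tally how often each card appears."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(M, dtype=np.int64)
    done = 0
    while done < total:
        n = min(batch, total - done)  # the last batch may be smaller
        hands = deal_hands(n, rng)    # only one batch is held in memory
        counts += np.bincount(hands.ravel(), minlength=M)
        done += n
    return counts

counts = card_counts(1_000_000, seed=0)
```

Peak memory depends on the batch size rather than the total number of hands;
the temporary key array costs batch * M * 8 bytes, so the batch size is the
knob to turn if memory is tight.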

I'll stick with Robert's approach for now, and see if I can knock up something 
using Cython once I really need the speed.

