[Numpy-discussion] Extracting all the possible combinations of a grid
Gael Varoquaux
gael.varoquaux@normalesup....
Sat Sep 22 03:35:16 CDT 2007
On Fri, Sep 21, 2007 at 06:40:52PM -0600, Charles R Harris wrote:
> def triplet(n) :
>     out = []
>     for i in xrange(2,n) :
>         for j in xrange(1,i) :
>             for k in xrange(0,j) :
>                 out.append((i,j,k))
>     return out
I need quadruplets, with numbers going up first to 120, then to 200. I
tried this, but I can't even go to 120: I simply flood my system (1GB of
RAM, AMD64) by using all my RAM.
> Which isn't too bad for 161700 combinations. However, I still prefer the
> generator form if you want to save memory for large n.
> In [2]: def triplet(n) :
>    ...:     for i in xrange(2,n) :
>    ...:         for j in xrange(1,i) :
>    ...:             for k in xrange(1,j) :
>    ...:                 yield i,j,k
That's the way we were doing it, but you really want to build arrays
when doing this, because the operations afterwards take ages.
Maybe I could build arrays in chunks of, say, 10^6, and keep iterating
until I run out.
> In [29]: def combination(n,t) :
>    ....:     c = arange(t + 2)
>    ....:     c[-1] = 0
>    ....:     c[-2] = n
>    ....:     while 1 :
>    ....:         yield c[:t]
>    ....:         j = 0
>    ....:         while c[j] + 1 == c[j+1] :
>    ....:             c[j] = j
>    ....:             j += 1
>    ....:         if j >= t :
>    ....:             return
>    ....:         c[j] += 1
I didn't try that one... Wow, that one is pretty good! 35s for
quadruplets going up to 120, 270s going up to 200. I can use something
like itertools.groupby to build arrays by grouping the results.
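In modern Python the chunking idea can be sketched like this; the helper
name `chunked_arrays` is mine, not from this thread, and `itertools.islice`
is arguably a simpler fit than `groupby` for cutting a generator into
fixed-size blocks (`itertools.combinations` stands in here for any of the
combination generators above):

```python
import itertools

import numpy as np


def chunked_arrays(tuples, chunk=10**6):
    """Hypothetical helper: drain an iterator of equal-length tuples
    into numpy arrays of at most `chunk` rows each, so memory use
    stays bounded no matter how many combinations there are."""
    while True:
        block = list(itertools.islice(tuples, chunk))
        if not block:
            return
        yield np.asarray(block, dtype=np.uint32)


# All triplets of integers below 100 (161700 of them), in bounded pieces.
pieces = list(chunked_arrays(itertools.combinations(range(100), 3),
                             chunk=10**5))
```

Each piece is a plain uint32 array, so the fast vectorized operations
afterwards work on it directly.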
I have implementations in C with weave inline that work pretty well:
* For numbers up to 120, this brute force approach is just fine:
##############################################################################
def unique_triplets(size):
    """ Returns an array of all the possible unique combinations of
        three integers below size. """
    C_code = """
    int index = 0;
    for (int j=0; j<size; j++) {
        for (int k=0; k<j+1; k++) {
            for (int l=0; l<k+1; l++) {
                nplets(index, 0) = j;
                nplets(index, 1) = k;
                nplets(index, 2) = l;
                index++;
            }
        }
    }
    """
    multiset_coef = round(factorial(size+3-1)/(factorial(size-1)*factorial(3)))
    nplets = empty((multiset_coef, 3), dtype=uint32)
    inline(C_code, ['nplets', 'size'], type_converters=converters.blitz)
    return nplets
368ms for numbers up to 120.
* For numbers higher, the resulting array does not fit in the memory, so
I have to break it up:
##############################################################################
def generate_fourplets(size):
    """ Returns an iterator on arrays listing all the possible unique
        combinations of four integers below size. """
    C_code = """
    int index = 0;
    for (int j=0; j<i+1; j++) {
        for (int k=0; k<j+1; k++) {
            for (int l=0; l<k+1; l++) {
                nplets(index, 0) = i;
                nplets(index, 1) = j;
                nplets(index, 2) = k;
                nplets(index, 3) = l;
                index++;
            }
        }
    }
    """
    for i in xrange(size):
        multiset_coef = round(factorial(i+3)/(factorial(i)*factorial(3)))
        nplets = empty((multiset_coef, 4), dtype=uint32)
        inline(C_code, ['nplets', 'i'], type_converters=converters.blitz)
        yield nplets
I can't test this up to n=200 because
round(factorial(i+3)/(factorial(i)*factorial(3))) overflows. Is there a
way to calculate the binomial coefficient with ints only, so as to avoid
the overflows? For 200 the result is 1 373 701, by the way, so it is not
unreasonable to create an array of that size. I guess I could simply
generate an array that is too large and cut it using the "index"
calculated in C, but that looks too ugly to me.
I can implement the L algorithm in C to go quicker, and implement it
with coroutines, running the while loop 10^6 times on each call,
grouping the results in an array, and saving the c[n] for the next time
the C code is called. Where it gets ugly is at the end, because I have
to keep track of the size of the array I return, as I will have to crop
it. This means having a test in the Python code, I fear.
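The crop-at-the-end pattern is actually quite contained on the Python
side; a pure-Python sketch (the helper name `fill_chunk` is hypothetical,
and an iterator of tuples stands in for the C coroutine):

```python
import itertools

import numpy as np


def fill_chunk(tuples, chunk, width):
    """Hypothetical sketch of the crop-at-the-end pattern: preallocate
    an oversized buffer, fill it from the iterator, and return only the
    prefix of rows actually written."""
    buf = np.empty((chunk, width), dtype=np.uint32)
    count = 0
    for row in tuples:
        buf[count] = row
        count += 1
        if count == chunk:
            break
    return buf[:count]  # the one crop the Python side has to do


it = itertools.combinations(range(5), 3)  # 10 rows in total
first = fill_chunk(it, chunk=8, width=3)  # fills the whole buffer
rest = fill_chunk(it, chunk=8, width=3)   # only 2 rows left, so cropped
```

The final partially-filled chunk is the only one that pays for the crop;
all earlier chunks come back full.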
I would go for the "generate_fourplets" solution if I had a way to
calculate the binomial coefficient without overflows.
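For what it's worth, the overflow comes from the floating-point
factorial; Python's own integers are arbitrary precision, so the
multiplicative formula computed with integer division never overflows.
A sketch (the function name `binomial` is mine):

```python
def binomial(n, k):
    """Exact binomial coefficient C(n, k) using only integer
    arithmetic.  Each partial product is itself a binomial
    coefficient, so every floor division is exact."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)          # use the smaller, symmetric index
    result = 1
    for i in range(1, k + 1):
        result = result * (n - k + i) // i
    return result
```

For i = 200 in generate_fourplets, the chunk size is binomial(203, 3),
which gives the 1 373 701 quoted above, with no floats involved.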
Cheers,
Gaël