[SciPy-User] efficient way to store and use a 4D redundant matrix
Wed Mar 16 14:31:44 CDT 2011
I don't know of any scipy class that will do this for you; when last I
checked, scipy didn't even include a good class for
upper/lower-triangular matrix storage (which would cut your space in
half).
You could save some space with an index matrix - a 100x100x100x100
array of 32-bit ints indexing another array of 64-bit floats; that
cuts your storage by half, but you'd have to implement the linear
algebra operations yourself.
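A minimal sketch of that index-matrix idea, at a small size so it runs quickly (the names `slots`, `values`, and `index` are just illustrative):

```python
import numpy as np
from itertools import combinations_with_replacement

n = 10  # illustration; the real case would be n = 100

# one slot per sorted 4-tuple of indices; `values` holds the unique floats
slots = {m: s for s, m in enumerate(combinations_with_replacement(range(n), 4))}
values = np.arange(len(slots), dtype=np.float64)  # placeholder data

# 32-bit index array: every permutation of (i,j,k,l) maps to one slot
index = np.empty((n, n, n, n), dtype=np.int32)
for ijkl in np.ndindex(n, n, n, n):
    # sorting the indices gives the canonical (multiset) form
    index[ijkl] = slots[tuple(sorted(ijkl))]

# A[i,j,k,l] is then recovered as values[index[i,j,k,l]]
assert values[index[1, 2, 3, 4]] == values[index[3, 4, 2, 1]]
```

At n = 100 the int32 index array is 4 bytes per cell instead of 8, which is where the factor-of-two saving comes from.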
The sparse matrix formats will only help you if you can rewrite A in
terms of matrices that are mostly 0.
Do you need the results of slicing, reshaping, etc. to also be
similarly compressed? If so, I can't see any way to implement this
without an index array, because once you reshape or slice A you won't
know which cells correspond to which indices in the original A.
If you're only taking small slices of this and then applying linear
algebra operations to those, you might be better off writing a class
that looks up the relevant values on the fly; you could overload
__getitem__ so that e.g. A[:,1,:,3] would generate the correct float64
array on the fly and return it.
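One way such a lookup class might look - a rough sketch rather than a full ndarray interface, with made-up class and method names, storing one value per sorted 4-tuple and building slices on demand:

```python
import numpy as np
from itertools import combinations_with_replacement

class SymmetricLookup:
    """Stores only one value per sorted 4-tuple of indices;
    reconstructs dense pieces on the fly by sorting the
    requested indices into canonical order."""

    def __init__(self, n):
        self.n = n
        self._data = {}  # sorted 4-tuple -> float

    def __setitem__(self, ijkl, value):
        self._data[tuple(sorted(ijkl))] = value

    def __getitem__(self, ijkl):
        # only full 4-index lookups in this sketch; a real class
        # would also expand slices like A[:, 1, :, 3] to arrays
        return self._data[tuple(sorted(ijkl))]

    def slice2d(self, j, l):
        # build e.g. the A[:, j, :, l] plane as a dense float64 array
        out = np.empty((self.n, self.n))
        for i in range(self.n):
            for k in range(self.n):
                out[i, k] = self[i, j, k, l]
        return out

A = SymmetricLookup(5)
for ijkl in combinations_with_replacement(range(5), 4):
    A[ijkl] = sum(ijkl)  # toy data
assert A[1, 2, 3, 4] == A[3, 4, 2, 1]
```

The resulting 2-D slice is an ordinary numpy array, so standard linear algebra applies to it directly.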
However, if the nonredundant part takes only ~4MB, then maybe I don't
understand your layout - for a 100x100x100x100 and 64-bit floats, I
think the nonredundant part should take ((100 choose 4) + ((100 choose
3) * 3) + ((100 choose 2) * 3) + (100 choose 1)) * 8 bytes = about
34MB. Was that a math error, or am I misunderstanding the question?
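That count is just the number of size-4 multisets drawn from 100 symbols, i.e. (103 choose 4); checking the arithmetic (assuming Python 3.8+ for math.comb):

```python
from math import comb  # Python >= 3.8

n = 100
# number of distinct sorted 4-tuples = multiset coefficient C(n+3, 4)
unique = comb(n + 3, 4)
# equivalently, summed over repetition patterns as in the text above
by_pattern = comb(n, 4) + 3 * comb(n, 3) + 3 * comb(n, 2) + comb(n, 1)
assert unique == by_pattern == 4_421_275
print(unique * 8 / 2**20)  # ~33.7 MiB of float64 storage
```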
On Wed, Mar 16, 2011 at 11:41 AM, Emanuele Olivetti wrote:
> Hi Everybody,
> I have a 4D matrix "A" where entries differing only by a permutation of
> indices are equal, i.e. A[1,2,3,4] = A[3,4,2,1]. Since each set
> of 4 indices has 24 permutations, my goal would be to avoid
> storing each value 24 times. I am looking at scipy.sparse but
> I have no clear understanding whether it could handle my case.
> Any idea?
> For the sake of clarity, here is a simple code example computing a matrix A like
> the one above. I usually have A of shape (100,100,100,100), which requires
> ~800 MB when the dtype is double. Of course the non-redundant part is
> just ~4 MB, so you may understand my interest in this issue.
> import numpy as np
> from itertools import permutations
> try:
>     from itertools import combinations_with_replacement
> except ImportError:  # if Python < v2.7 that function is not available...
>     from itertools import product
>     def combinations_with_replacement(iterable, r):
>         pool = tuple(iterable)
>         n = len(pool)
>         for indices in product(range(n), repeat=r):
>             if sorted(indices) == list(indices):
>                 yield tuple(pool[i] for i in indices)
> rows = 10
> columns = 20
> x = np.random.rand(rows, columns) + 1.0
> A = np.zeros((rows, rows, rows, rows))
> indices_rows = range(rows)
> for i, j, k, l in combinations_with_replacement(indices_rows, 4):
>     tmp = (x[i, :] * x[j, :] * x[k, :] * x[l, :]).sum()
>     for a, b, c, d in permutations([i, j, k, l]):
>         A[a, b, c, d] = tmp
> In case you wonder what kind of operations I need to do on A,
> they are the usual manipulations (slicing, reshaping, etc.) and common
> linear algebra.