[SciPy-user] MemoryError transforming COO matrix to a CSR matrix

Nathan Bell wnbell@gmail....
Thu Feb 7 10:51:53 CST 2008

```On Feb 7, 2008 3:26 AM, Dinesh B Vadhia <dineshbvadhia@hotmail.com> wrote:
>
>
> I get a MemoryError when transforming a coo_matrix to a csr_matrix.  The
> coo_matrix is loaded with about 32m integers (in fact, just binary 1's)
> which at 4 bytes per int works out to about 122Mb for the matrix.  As I
> have 2Gb of RAM on my Windows machine this should be ample for transforming
> A to a csr_matrix.  Here is the error message followed by the code:
>

Actually, the CSR matrix takes (slightly more than) 32m*( 4 + 8 ) = 384Mb,
i.e. a 4-byte column index plus an 8-byte value per nonzero, because SciPy
is upcasting your ints to doubles.  The dev version supports smaller
dtypes, which would lower it to (slightly more than) 32m*( 4 + 1 ) =
160Mb.

Your COO matrix takes 32m*(4 + 4 + 8) = 512Mb

The ij array takes 32m*2*(4) = 256Mb (the COO matrix can't use row =
ij[:,0] and column = ij[:,1] directly, because those arrays are not
contiguous)

# imports
import pickle
import numpy
import scipy
from scipy import sparse

# constants
nnz = 31398038
I = 20000
J = 80000
dataFile = aFilename

# Initialize A as a coo_matrix with dimensions (I, J)
# this does nothing: A = sparse.coo_matrix(None, dims=(I, J), dtype=int)

# Populate matrix A by first loading data into a coo_matrix using the
# coo_matrix((V, (I,J)), dims) method
# this does nothing: ij = numpy.array(numpy.empty((nnz, 2), dtype=int))
f = open(dataFile, 'rb')
ij = pickle.load(f)
f.close()
# copy each column of ij into its own contiguous array of C ints
row = numpy.ascontiguousarray(ij[:,0], dtype='intc')
column = numpy.ascontiguousarray(ij[:,1], dtype='intc')
# build the values while ij still exists, then free ij
data = scipy.ones(ij.shape[0], dtype='float32')
del ij

# Load data into A, convert A to csr_matrix
A = sparse.csr_matrix((data, (row, column)), dims=(I,J))  # implicit COO->CSR conversion

If this doesn't work then you either need to make ij[:,0] and ij[:,1]
contiguous or use a developers version of SciPy which supports smaller
data types like 'int8'.

--
Nathan Bell wnbell@gmail.com
http://graphics.cs.uiuc.edu/~wnbell/
```
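The size estimates in the reply can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: it assumes the rounded figure of 32 million nonzeros, 4-byte indices, and "Mb" meaning 10**6 bytes, which is the rounding the reply appears to use. The names `NNZ`, `megabytes`, and the `*_BYTES` constants are made up for this example, and the final sum assumes the `ij` array, the COO matrix, and the CSR matrix are all resident at once.

```python
# Back-of-the-envelope storage estimates for ~32 million nonzeros.
# Assumption: 1 Mb = 10**6 bytes, matching the rounded figures in the reply.
NNZ = 32_000_000
INDEX_BYTES = 4    # one 32-bit integer index
DOUBLE_BYTES = 8   # one float64 value
INT8_BYTES = 1     # one int8 value


def megabytes(bytes_per_entry, nnz=NNZ):
    """Return the storage for nnz entries in units of 10**6 bytes."""
    return nnz * bytes_per_entry / 10**6


# CSR: one column index plus one value per nonzero.
# The row-pointer array (rows + 1 entries) is the "slightly more" part.
csr_double = megabytes(INDEX_BYTES + DOUBLE_BYTES)
csr_int8 = megabytes(INDEX_BYTES + INT8_BYTES)

# COO: row index, column index, and value per nonzero.
coo_double = megabytes(2 * INDEX_BYTES + DOUBLE_BYTES)

# ij: an (nnz, 2) array of 4-byte integers.
ij_array = megabytes(2 * INDEX_BYTES)

# Assumed worst case: all three objects alive at the same time.
peak = ij_array + coo_double + csr_double

print(csr_double, csr_int8, coo_double, ij_array, peak)
```

Under those assumptions the three objects together need well over 1 Gb, which is why freeing `ij` before the conversion and using a smaller value dtype both help.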