Hi David,

Please find some code below. There are three problems here: 1) the correct method for initializing very large coo/csr matrices, 2) memory usage when initializing very large coo/csr matrices, and 3) using a function to load the coo matrix when the program will work with matrices of different sizes.



def populateSparseMatrix(A, nnz, dataFile, I, J):
    # Populate matrix A by first loading data into a coo_matrix using the coo_matrix((V, (I, J)), shape) method

    ij = numpy.array(numpy.empty((nnz, 2), dtype=int))
    with open(dataFile, 'rb') as f:
        ij = pickle.load(f)

    row = ij[:,0]
    column = ij[:,1]
    data = numpy.ones(ij.shape[0], dtype=int)

    # Initialize A as coo_matrix, load data into A, convert A to csr_matrix
    A = sparse.coo_matrix((data, (row, column)), shape=(I,J)).tocsr()

    return A

def anotherFunctionOperatingOnSparseMatrixA(A, a, b):
    # blah blah
    # blah blah blah

    return a, b

# main program

# imports
import pickle
import numpy
import scipy
from scipy import sparse

# constants
nnz = bigNonZeroNumber
I = bigI
J = bigJ
dataFile = aFilename

# Define and initialize all matrices and vectors
# Create and load a coo_matrix, then convert it to a csr_matrix inside a function (i.e. def populateSparseMatrix) so that the program can be used with different sized matrices

# Python requires that all arguments passed to functions be defined beforehand.
# If so, what is the correct statement to use for initializing an empty coo_matrix?
# Secondly, if I and J are very large, doesn't the initialization step use up memory and hence defeat the purpose of using a coo/csr matrix?
# nnz ranges from millions to tens of millions; the sparse data is just 1's.
# For large I, J, I get a 'memory error' on my machine with 2 GB of RAM, which I shouldn't when using a coo/csr matrix

A = sparse.coo_matrix(None, shape=(I, J), dtype=int)       # What is the correct initialization statement (if any)?

# Call the populate matrix A function
A = populateSparseMatrix(A, nnz, dataFile, I, J)

a, b = anotherFunctionOperatingOnSparseMatrixA(A, a, b)        # assume a, b are defined before calling function
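
For comparison, here is a minimal sketch of the same loading step written so that the caller never has to create A beforehand. It assumes the pickled file holds an (nnz, 2) integer array of (row, column) pairs, as in the listing above. The names load_sparse_ones and csr_nbytes, the int8 value type, and the tiny temporary file are illustrative assumptions, not part of the original program. The arrays behind a CSR matrix grow with nnz and with the number of rows rather than with I*J, which csr_nbytes makes visible.

```python
import os
import pickle
import tempfile

import numpy as np
from scipy import sparse


def load_sparse_ones(data_file, shape):
    """Build a CSR matrix of ones from a pickled (nnz, 2) array of (row, column) pairs."""
    with open(data_file, "rb") as f:
        ij = np.asarray(pickle.load(f))

    # One stored value per (row, column) pair; int8 keeps the value array small.
    data = np.ones(ij.shape[0], dtype=np.int8)

    # The matrix is created here, so the caller passes only the target shape.
    return sparse.coo_matrix((data, (ij[:, 0], ij[:, 1])), shape=shape).tocsr()


def csr_nbytes(A):
    """Bytes held by the three arrays that back a CSR matrix."""
    return A.data.nbytes + A.indices.nbytes + A.indptr.nbytes


# Tiny stand-in for the real data file: four nonzeros in a 1000 x 2000 matrix.
pairs = np.array([[0, 1], [2, 3], [999, 1999], [5, 5]])
path = os.path.join(tempfile.mkdtemp(), "pairs.pkl")
with open(path, "wb") as f:
    pickle.dump(pairs, f)

A = load_sparse_ones(path, (1000, 2000))
print(A.shape, A.nnz, csr_nbytes(A))

# If an empty sparse matrix is wanted anyway, passing only the shape creates one.
empty = sparse.coo_matrix((1000, 2000), dtype=int)
print(empty.nnz)
```

Two caveats on this sketch: converting to CSR sums duplicate (row, column) pairs, so repeated pairs would give stored values greater than 1, and the CSR row-pointer array has one entry per row plus one, so a very large I still costs memory even when nnz is small.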


What sort of method are you using to load the matrices? It'd help if  
you posted some code. In general you shouldn't have to initialize  
something too big in order to load in a sparse matrix. I'm not sure  
that COO is terribly efficient for on-the-fly insertions. Maybe a  
dok_matrix would be more appropriate, which you can then convert to  
whatever you need, all at once, as then you'll know exactly how many  
non-zero elements you have to allocate space for.
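
The dok_matrix route suggested above might look like the sketch below: entries are inserted one at a time and the result is converted to CSR once at the end. The shape and the three sample entries are made up for illustration. A dok_matrix is dictionary-backed, so each stored entry likely carries more overhead than the plain index arrays used for COO construction; with tens of millions of entries that may matter on a small-memory machine.

```python
import numpy as np
from scipy import sparse

# Hypothetical dimensions and entries, chosen only to keep the example small.
shape = (1000, 2000)
D = sparse.dok_matrix(shape, dtype=np.int8)

# On-the-fly insertion: each assignment adds one stored entry.
for i, j in [(0, 1), (2, 3), (999, 1999)]:
    D[i, j] = 1

# Convert once, when the number of nonzeros is known.
A = D.tocsr()
print(A.shape, A.nnz)
```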


