Dinesh B Vadhia
dineshbvadhia@hotmail....
Sun Feb 3 13:49:10 CST 2008
Hi David
Please find some code below. There are three problems here: 1) the correct way to initialize very large coo/csr matrices, 2) memory usage when initializing very large coo/csr matrices, and 3) using a function to load the coo matrix when matrices of different sizes are going to be used in the program.
Thank you!
Dinesh
def populateSparseMatrix(A, nnz, dataFile, I, J):
    # Populate matrix A by first loading data into a coo_matrix using the
    # coo_matrix((V, (I, J)), shape) constructor
    ij = numpy.array(numpy.empty((nnz, 2), dtype=int))
    with open(dataFile, 'rb') as f:
        ij = pickle.load(f)  # pickled (nnz, 2) array of (row, column) indices
    row = ij[:, 0]
    column = ij[:, 1]
    data = numpy.ones(ij.shape[0], dtype=int)
    # Initialize A as coo_matrix, load data into A, convert A to csr_matrix
    A = sparse.coo_matrix((data, (row, column)), shape=(I, J)).tocsr()
    return A
def anotherFunctionOperatingOnSparseMatrixA(A, a, b):
    # blah
    # blah blah
    # blah blah blah
    return a, b
# main program
# imports
import pickle
import numpy
import scipy
from scipy import sparse
# constants
nnz = bigNonZeroNumber
I = bigI
J = bigJ
dataFile = aFilename
# Define and initialize all matrices and vectors
# Create and load a coo_matrix and then transform it into a csr_matrix using a function (i.e. populateSparseMatrix) so that the program can be used with matrices of different sizes
# Python requires that all parameters passed to functions be defined beforehand.
# If so, what is the correct statement to use for initializing an empty coo_matrix?
# Secondly, if I, J are very large, isn't the initialization step using up memory and hence defeating the purpose of using a coo/csr matrix?
# nnz ranges from the millions to the tens of millions; the sparse data is just 1's.
# For large I, J, I get a 'memory error' on my 2 GB RAM machine, which I shouldn't get when using a coo/csr matrix
A = sparse.coo_matrix(None, dims=(I, J), dtype=int) # What is the correct initialization statement (if any)?
# Call the populate matrix A function
A = populateSparseMatrix(A, nnz, dataFile, I, J)
a, b = anotherFunctionOperatingOnSparseMatrixA(A, a, b) # assume a, b are defined before calling function
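As a rough sketch of the storage question raised in the comments above (assuming a recent SciPy, where the keyword is `shape` rather than the older `dims`): passing only a shape tuple to `coo_matrix` gives an empty matrix without allocating anything of size I x J, and the arrays behind a COO or CSR matrix grow with nnz (plus I + 1 row pointers for CSR), not with I x J. The helper `sparse_nbytes` and the sizes used below are hypothetical, chosen only for illustration.

```python
import numpy
from scipy import sparse


def sparse_nbytes(M):
    """Hypothetical helper: bytes held by the value/index arrays of a COO or CSR matrix."""
    if M.format == "coo":
        return M.data.nbytes + M.row.nbytes + M.col.nbytes
    M = M.tocsr()
    return M.data.nbytes + M.indices.nbytes + M.indptr.nbytes


I = J = 10**6  # illustrative sizes, not the ones from the original post

# An empty COO matrix: only the shape is recorded, the value/index arrays are empty.
empty = sparse.coo_matrix((I, J), dtype=int)

# A matrix holding 1000 ones at distinct (row, column) positions.
nnz = 1000
row = numpy.arange(nnz)
column = (numpy.arange(nnz) * 7919) % J
data = numpy.ones(nnz, dtype=int)
A = sparse.coo_matrix((data, (row, column)), shape=(I, J)).tocsr()

# What a dense I x J integer array would need, for comparison.
dense_bytes = I * J * numpy.dtype(int).itemsize

print(empty.nnz, sparse_nbytes(empty))
print(A.nnz, sparse_nbytes(A), dense_bytes)
```

With tens of millions of non-zeros, one plausible source of a 'memory error' is therefore the temporary arrays built while loading (the pickled index array, `data`, and the copies made during the COO-to-CSR conversion), all of which scale with nnz rather than with I x J.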
Dinesh,
What sort of method are you using to load the matrices? It'd help if
you posted some code. In general you shouldn't have to initialize
something too big in order to load in a sparse matrix. I'm not sure
that COO is terribly efficient for on-the-fly insertions. Maybe a
dok_matrix would be more appropriate: you can fill it incrementally and
then convert it to whatever format you need all at once, since by then
you'll know exactly how many non-zero elements to allocate space for.
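A minimal sketch of the dok_matrix approach suggested above, assuming the non-zeros arrive as (row, column) pairs and, as in the original post, every stored value is 1. The function name `build_from_pairs` and the sample pairs are hypothetical.

```python
from scipy import sparse


def build_from_pairs(pairs, I, J):
    """Hypothetical helper: fill a dok_matrix one entry at a time, then convert to CSR."""
    D = sparse.dok_matrix((I, J), dtype=int)
    for i, j in pairs:
        D[i, j] = 1  # the data in the original post is all ones
    return D.tocsr()  # one conversion once the number of non-zeros is known


pairs = [(0, 3), (2, 1), (4, 4)]
A = build_from_pairs(pairs, 5, 5)
print(A.nnz)
print(A.toarray())
```

Two things to keep in mind: assigning `D[i, j] = 1` twice simply overwrites the entry, whereas duplicate (row, column) pairs fed to `coo_matrix` are summed on conversion to CSR; and because a dok_matrix keeps its entries in a dictionary, it typically needs noticeably more memory per non-zero than the plain arrays of a COO matrix, which matters at tens of millions of entries.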
David
> I'm using a function to load a sparse matrix A using coo_matrix and
> then to transform it into a csr_matrix. We are testing a bunch of
> very large sized matrices A and hence the use of a function. In
> addition, A is available to many other functions in the program.
>
> Python says that A has to be defined (or initialized) before sending
> it to the load function. But doesn't initializing A as 'empty' or
> 'zeroed', both of which use memory, defeat the purpose of using coo
> and csr? I've looked at the Sparse docstring help and cannot see a
> way out.
>
> Have I missed something?
>
> Dinesh
>
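On the point about A having to be defined before the call: a Python function can create the matrix itself and hand it back with `return`, so the caller does not need an existing A at all. The sketch below is a variant of `populateSparseMatrix` without the A and nnz parameters; the name `load_csr` is hypothetical, and it assumes, as the original code implies, that the pickled file holds an (nnz, 2) integer array of (row, column) indices.

```python
import os
import pickle
import tempfile

import numpy
from scipy import sparse


def load_csr(dataFile, I, J):
    """Hypothetical loader: read pickled (row, column) pairs and return a new CSR matrix.

    Nothing needs to exist before the call; the matrix is built here and returned.
    """
    with open(dataFile, "rb") as f:
        ij = pickle.load(f)
    data = numpy.ones(ij.shape[0], dtype=int)
    return sparse.coo_matrix((data, (ij[:, 0], ij[:, 1])), shape=(I, J)).tocsr()


# Usage: write a small index array to a temporary file, then load it.
ij = numpy.array([[0, 1], [2, 0], [2, 3]])
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
    pickle.dump(ij, tmp)
A = load_csr(tmp.name, 3, 4)  # A is created by this call; no prior definition needed
os.remove(tmp.name)
print(A.shape, A.nnz)
```

Calling the same function with different `dataFile`, `I`, `J` values covers the case of testing several matrix sizes in one program.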
