[SciPy-user] Initializing COO/CSR matrix before function
Dinesh B Vadhia
Sun Feb 3 13:49:10 CST 2008
Please find some code below. There are three problems here: 1) the correct method for initializing very large coo/csr matrices, 2) memory usage when initializing very large coo/csr matrices, and 3) using a function to load the coo matrix, since matrices of different sizes are going to be used in the program.
import pickle
import numpy
import scipy

def populateSparseMatrix(A, nnz, dataFile, I, J):
    # Populate matrix A by first loading data into a coo_matrix using the coo_matrix((V, (I,J)), dims) method
    ij = numpy.array(numpy.empty((nnz, 2), dtype=int))
    f = open(dataFile, 'rb')
    ij = pickle.load(f)
    f.close()
    row = ij[:,0]
    column = ij[:,1]
    # One value per non-zero entry, so the length is ij.shape[0] rather than the 2-D ij.shape
    data = scipy.ones(ij.shape[0], dtype=int)
    # Initialize A as coo_matrix, load data into A, convert A to csr_matrix
    A = sparse.coo_matrix((data, (row, column)), dims=(I,J)).tocsr()
    # Return the new matrix so the caller's assignment receives it
    return A

def anotherFunctionOperatingOnSparseMatrixA(A, a, b):
    # blah blah blah
    return a, b

# main program
from scipy import sparse
nnz = bigNonZeroNumber
I = bigI
J = bigJ
dataFile = aFilename
# Define and initialize all matrix and vectors
# Create and load a coo_matrix and then transform it into a csr_matrix using a function (i.e. def populateSparseMatrix) so that we can use the program with different sized matrices
# Python requires that all parameters passed to functions be defined beforehand.
# If so, what is the correct statement to use for initializing an empty coo_matrix?
# Secondly, if I, J are very large then isn't the initialization step using up memory and hence defeating the purpose of using a coo/csr matrix?
# nnz ranges from the millions to the tens of millions; the sparse data is just 1's.
# For large I, J, I get a 'memory error' on my machine with 2 GB of RAM, which I shouldn't get when using a coo/csr matrix
A = sparse.coo_matrix(None, dims=(I, J), dtype=int) # What is the correct initialization statement (if any)?
# Call the populate matrix A function
A = populateSparseMatrix(A, nnz, dataFile, I, J)
a, b = anotherFunctionOperatingOnSparseMatrixA(A, a, b) # assume a, b are defined before calling function
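
For comparison, here is a minimal sketch of the same load step written so that nothing has to be initialized before the call: the function builds the matrix and returns it. It assumes a current SciPy, where the keyword is `shape=` rather than the `dims=` used above. The function name `build_csr_from_pairs` and the tiny `pairs` array (standing in for the pickled index array) are made up for illustration, and `int8` is just one possible choice for storing the 1's compactly.

```python
import numpy as np
from scipy import sparse

def build_csr_from_pairs(ij, n_rows, n_cols):
    """Return a CSR matrix with a 1 at every (row, column) pair in ij."""
    ij = np.asarray(ij)
    rows = ij[:, 0]
    cols = ij[:, 1]
    # One stored value per non-zero entry; int8 keeps the value array small.
    data = np.ones(len(rows), dtype=np.int8)
    # Build in COO form from the index arrays, then convert once to CSR.
    coo = sparse.coo_matrix((data, (rows, cols)), shape=(n_rows, n_cols))
    return coo.tocsr()

# Hypothetical stand-in for the (nnz, 2) index array loaded from the data file.
pairs = np.array([[0, 1], [2, 3], [4, 0]])

# No matrix is created before the call; the name A is bound to the result.
A = build_csr_from_pairs(pairs, n_rows=10**6, n_cols=10**6)
print(A.shape, A.nnz)
```

In this form the storage grows with the number of non-zeros (plus, for CSR, one row-pointer entry per row), not with I times J, so a large logical shape on its own should not exhaust memory.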
Date: Sun, 3 Feb 2008 00:20:50 -0500
From: David Warde-Farley <email@example.com>
Subject: Re: [SciPy-user] Initializing COO/CSR matrix before function
To: SciPy Users List <firstname.lastname@example.org>
What sort of method are you using to load the matrices? It'd help if
you posted some code. In general you shouldn't have to initialize
something too big in order to load in a sparse matrix. I'm not sure
that COO is terribly efficient for on-the-fly insertions. Maybe a
dok_matrix would be more appropriate, which you can then convert to
whatever you need, all at once, as then you'll know exactly how many
non-zero elements you have to allocate space for.
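
A minimal sketch of the dok_matrix route described above, assuming a current SciPy; the shape and the three index pairs are made up for illustration. Entries are inserted one at a time and the matrix is converted to CSR in a single step at the end.

```python
from scipy import sparse

# Hypothetical dimensions and entries, for illustration only.
n_rows, n_cols = 1000, 2000
entries = [(0, 5), (10, 7), (999, 1999)]

# A DOK matrix stores only the entries that are explicitly set.
D = sparse.dok_matrix((n_rows, n_cols), dtype=int)
for i, j in entries:
    D[i, j] = 1

# Convert once, after all insertions are done.
A = D.tocsr()
print(A.shape, A.nnz)
```

Because DOK is dictionary-backed, element-by-element insertion from Python may be slow for tens of millions of entries. If the row and column indices are already available as arrays, building a coo_matrix from them directly is likely the cheaper path.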
On 2-Feb-08, at 11:16 PM, Dinesh B Vadhia wrote:
> I'm using a function to load a sparse matrix A using coo_matrix and
> then to transform it into a csr_matrix. We are testing a bunch of
> very large sized matrices A and hence the use of a function. In
> addition, A is available to many other functions in the program.
> Python says that A has to be defined (or initialized) before sending
> to the load function. But, doesn't that mean initializing A as
> 'empty' or 'zeroed', both of which impact memory use, defeats the
> purpose of using coo and csr? I've looked at the Sparse docstring
> help and cannot see a way out.
> Have I missed something?