[SciPy-user] Initializing COO/CSR matrix before function

Dinesh B Vadhia dineshbvadhia@hotmail....
Sun Feb 3 13:49:10 CST 2008

Hi David

Please find some code below. There are three problems here: 1) the correct method for initializing very large COO/CSR matrices, 2) memory usage when initializing very large COO/CSR matrices, and 3) using a function to load the COO matrix when matrices of different sizes are going to be used in the program.



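def populateSparseMatrix(A, nnz, dataFile, I, J):
    # Build an I x J matrix by loading (row, column) pairs into a coo_matrix with the
    # coo_matrix((data, (row, column)), shape=(I, J)) constructor, then converting to CSR.
    # (The incoming A is never read; a new matrix is built and returned.)

    # The pickle is assumed to hold an (nnz, 2) integer array of (row, column) pairs.
    # No buffer needs to be preallocated: pickle.load returns a new array.
    with open(dataFile, 'rb') as f:
        ij = pickle.load(f)

    row = ij[:, 0]
    column = ij[:, 1]
    data = numpy.ones(ij.shape[0], dtype=int)   # every stored value is 1

    # Initialize A as coo_matrix, load data into A, convert A to csr_matrix
    A = sparse.coo_matrix((data, (row, column)), shape=(I, J)).tocsr()

    return A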

def anotherFunctionOperatingOnSparseMatrixA(A, a, b):
    # ... operations on A, a and b (body elided) ...

    return a, b

# main program

# imports
import pickle
import numpy
import scipy
from scipy import sparse

# constants
nnz = bigNonZeroNumber
I = bigI
J = bigJ
dataFile = aFilename

# Define and initialize all matrix and vectors
# Create and load a coo_matrix, then convert it to a csr_matrix, inside a function (i.e. populateSparseMatrix) so that the program can be used with matrices of different sizes

# Python requires that all parameters passed to functions be defined beforehand.
# If so, what is the correct statement to use for initializing an empty coo_matrix?
# Secondly, if I and J are very large, isn't the initialization step using up memory and hence defeating the purpose of using a COO/CSR matrix?
# nnz ranges from millions to tens of millions, and the sparse data values are all 1s.
# For large I, J, I get a MemoryError on my 2 GB RAM machine, which I shouldn't when using a COO/CSR matrix.

A = sparse.coo_matrix((I, J), dtype=int)       # What is the correct initialization statement (if any)?

# Call the populate matrix A function
A = populateSparseMatrix(A, nnz, dataFile, I, J)

a, b = anotherFunctionOperatingOnSparseMatrixA(A, a, b)        # assume a, b are defined before calling function
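A minimal sketch of how the loader can be written so the caller never has to pre-create `A`, assuming a current SciPy release (where the shape keyword is `shape=`) and a pickle holding an integer array of shape (nnz, 2) with one (row, column) pair per row. In Python only the arguments you actually pass must exist beforehand; a function can build a new matrix and return it. The file, the sizes and the helper names `load_sparse_matrix` and `csr_nbytes` are hypothetical. The last lines compare the bytes held by the CSR arrays, roughly `nnz * (value size + index size) + (rows + 1) * index size`, with what a dense array of the same shape would need, `rows * cols * value size`.

```python
import os
import pickle
import tempfile

import numpy as np
from scipy import sparse


def load_sparse_matrix(data_file, n_rows, n_cols):
    """Hypothetical loader: build an n_rows x n_cols CSR matrix of 1s from pickled pairs.

    The caller does not create the matrix; it is built here and returned.
    """
    with open(data_file, "rb") as f:
        ij = pickle.load(f)  # assumed: integer array of shape (nnz, 2)
    rows, cols = ij[:, 0], ij[:, 1]
    # All stored values are 1, so a 1-byte dtype keeps the data array small.
    # Note: tocsr() sums duplicate (row, col) pairs, so int8 assumes few repeats.
    data = np.ones(len(rows), dtype=np.int8)
    return sparse.coo_matrix((data, (rows, cols)), shape=(n_rows, n_cols)).tocsr()


def csr_nbytes(m):
    """Bytes held by a CSR matrix's three underlying arrays."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes


# Hypothetical example data: 5 nonzeros in a 100000 x 50000 matrix.
pairs = np.array([[0, 0], [3, 7], [99999, 49999], [42, 42], [10, 20]])
path = os.path.join(tempfile.mkdtemp(), "pairs.pkl")
with open(path, "wb") as f:
    pickle.dump(pairs, f)

A = load_sparse_matrix(path, 100000, 50000)
dense_bytes = A.shape[0] * A.shape[1] * A.dtype.itemsize
print(A.nnz, csr_nbytes(A), dense_bytes)
```

Even at one byte per entry, a dense 100000 x 50000 array would need about 5 GB, while the CSR arrays grow mainly with nnz and the number of rows. If a MemoryError appears, it is worth checking whether some step is creating a dense array.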


Date: Sun, 3 Feb 2008 00:20:50 -0500
From: David Warde-Farley <dwf@cs.toronto.edu>
Subject: Re: [SciPy-user] Initializing COO/CSR matrix before function
To: SciPy Users List <scipy-user@scipy.org>
Message-ID: <10B25358-9941-4613-8E65-D94F203514A6@cs.toronto.edu>


What sort of method are you using to load the matrices? It'd help if  
you posted some code. In general you shouldn't have to initialize  
something too big in order to load in a sparse matrix. I'm not sure  
that COO is terribly efficient for on-the-fly insertions. Maybe a  
dok_matrix would be more appropriate, which you can then convert to  
whatever you need, all at once, as then you'll know exactly how many  
non-zero elements you have to allocate space for.
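A minimal sketch of the DOK approach described above, assuming entries arrive one at a time from some source; `stream_pairs` is a hypothetical stand-in for that source. A `dok_matrix` created with a large shape allocates no dense storage: it keeps each assigned nonzero in a dictionary keyed by (row, column), and a single `tocsr()` converts everything at the end. When all pairs are already available as arrays, as with the pickled data in the original code, building a `coo_matrix` in one call is typically more memory-efficient, since a dictionary carries considerable per-entry overhead.

```python
import numpy as np
from scipy import sparse


def stream_pairs():
    """Hypothetical source yielding (row, col) pairs one at a time."""
    yield from [(0, 0), (3, 7), (99999, 49999), (3, 7)]


n_rows, n_cols = 100000, 50000
# Empty DOK matrix: only the shape is recorded, no dense array is allocated.
D = sparse.dok_matrix((n_rows, n_cols), dtype=np.int32)
for i, j in stream_pairs():
    D[i, j] = 1  # assignment overwrites, so the repeated (3, 7) stays 1

A = D.tocsr()  # one conversion once all entries are in
print(A.nnz, A.shape)
```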


On 2-Feb-08, at 11:16 PM, Dinesh B Vadhia wrote:

> I'm using a function to load a sparse matrix A using coo_matrix and  
> then to transform it into a csr_matrix.  We are testing a bunch of  
> very large sized matrices A and hence the use of a function.  In  
> addition, A is available to many other functions in the program.
> Python says that A has to be defined (or initialized) before sending  
> to the load function.  But, doesn't that mean initializing A as  
> 'empty' or 'zeroed', both of which impact memory use, defeats the  
> purpose of using coo and csr?  I've looked at the Sparse docstring  
> help and cannot see a way out.
> Have I missed something?
> Dinesh
> _______________________________________________
> SciPy-user mailing list
> SciPy-user@scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-user
