# [SciPy-user] fastest way to populate sparse matrix?

Peter Skomoroch peter.skomoroch@gmail....
Wed Dec 10 15:18:59 CST 2008

```Nathan,

Thanks for the pointer, I had missed that wiki page.

Using coo_matrix as you suggest gives over a 5x speedup:

time to fill lil_matrix: 10.4162230492
total time to fill coo_matrix: 1.82639312744

The bottleneck now seems to be this for-loop, which takes the majority
of the remaining time (1.82258105278 seconds):

for index, (i,j) in enumerate(nonzero_indices):
data[index] = dot(W[i,:],H[:,j])

Is there a better approach for this assignment block?

-Pete

New code:

#!/usr/bin/env python
# encoding: utf-8
"""
sparse_fill.py

uses latest sparse package from scipy svn
"""

import sys
import os
import time
from numpy import *
from numpy.random import random
import scipy.sparse as sparse
from scipy import io
import urllib
import tarfile

def lil_fill(V):
print "Filling lil_matrix"
n,m = V.shape
r = 200
eps=1e-9
nonzeros = [tuple(ij) for ij in transpose(V.nonzero())]
W = (random([n,r]) ).astype(float32)
H = random([r,m]).astype(float32)

print "filling..."
t0=time.time()
V_approx = sparse.lil_matrix((n,m), dtype=float32)

for i,j in nonzeros:
V_approx[i,j] = dot(W[i,:],H[:,j])
print "time to fill:", time.time() -t0

W_factor =  (V*H.T + eps ) / ( V_approx*H.T + eps)
W = W_factor*W
####################################################
print "done...\n\n"

def coo_fill(V):
print "Filling coo_matrix"
n,m = V.shape
r = 200
eps=1e-9
nonzeros = V.nonzero()
nonzero_indices = [tuple(ij) for ij in transpose(nonzeros)]
L = len(nonzeros[0])
W = (random([n,r]) ).astype(float32)
H = random([r,m]).astype(float32)

print "filling..."
t0=time.time()
row = nonzeros[0] # row indices go here
col = nonzeros[1] # column indices go here
data = zeros(L)
for index, (i,j) in enumerate(nonzero_indices):
data[index] = dot(W[i,:],H[:,j])
print "data assignment done, filling matrix", time.time() -t0
#data = ones(L) # data values go here
V_approx = sparse.coo_matrix((data,(row,col)), shape=(n,m))

print "total time to fill coo_matrix:", time.time() -t0

W_factor =  (V*H.T + eps ) / ( V_approx*H.T + eps)
W = W_factor*W
####################################################
print "done...\n\n"

def main():
# number of rows    1,916
# number of columns 1,916
# nonzeros  195,985
f = urllib.urlopen("http://www.cise.ufl.edu/research/sparse/MM/JGD_CAG/CAG_mat1916.tar.gz")

tar = tarfile.open("CAG_mat1916.tar.gz", "r:gz")
tar.extractall()

# V is a sparse matrix, with around 5 % of entries populated

lil_fill(V)
coo_fill(V)

if __name__ == '__main__':
main()

On Wed, Dec 10, 2008 at 1:46 PM, Nathan Bell <wnbell@gmail.com> wrote:
> On Wed, Dec 10, 2008 at 12:54 PM, Peter Skomoroch
> <peter.skomoroch@gmail.com> wrote:
>> What is the fastest way to replace non-zero elements of a sparse
>> matrix with corresponding elements from a product of dense matrices,
>> without the memory overhead of computing the entire dense matrix
>> product?
>>
>> The code below demonstrates the way I am doing it now: looping through
>> the nonzero elements in the sparse matrix, and forming the
>> corresponding row - column product from the dense matrices.  It uses
>> the sparse module from the latest scipy trunk.
>>
>
> The fastest way to construct a sparse matrix is using the COO format
> as discussed here:
>
> Using COO instead of LIL should be considerably faster.
>
> --
> Nathan Bell wnbell@gmail.com
> http://graphics.cs.uiuc.edu/~wnbell/
> _______________________________________________
> SciPy-user mailing list
> SciPy-user@scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-user
>

--
Peter N. Skomoroch
peter.skomoroch@gmail.com
http://www.datawrangling.com
http://del.icio.us/pskomoroch
```