[SciPy-user] fastest way to populate sparse matrix?

Peter Skomoroch peter.skomoroch@gmail....
Wed Dec 10 15:18:59 CST 2008


Nathan,

Thanks for the pointer, I had missed that wiki page.

Using coo_matrix as you suggest gives over a 5x speedup:

time to fill lil_matrix: 10.4162230492
total time to fill coo_matrix: 1.82639312744

The bottleneck now seems to be this for-loop, which takes the majority
of the remaining time (1.82258105278 seconds):

    for index, (i,j) in enumerate(nonzero_indices):
        data[index] = dot(W[i,:],H[:,j])

Is there a better approach for this assignment block?

-Pete

New code:

#!/usr/bin/env python
# encoding: utf-8
"""
sparse_fill.py

uses latest sparse package from scipy svn
"""

import sys
import os
import time
from numpy import *
from numpy.random import random
import scipy.sparse as sparse
from scipy import io
import urllib
import tarfile


def lil_fill(V):
    print "Filling lil_matrix"
    n,m = V.shape
    r = 200
    eps=1e-9
    nonzeros = [tuple(ij) for ij in transpose(V.nonzero())]
    W = (random([n,r]) ).astype(float32)
    H = random([r,m]).astype(float32)

    print "filling..."
    t0=time.time()
    V_approx = sparse.lil_matrix((n,m), dtype=float32)

    for i,j in nonzeros:
        V_approx[i,j] = dot(W[i,:],H[:,j])
    print "time to fill:", time.time() -t0

    W_factor =  (V*H.T + eps ) / ( V_approx*H.T + eps)
    W = W_factor*W
    ####################################################
    print "done...\n\n"


def coo_fill(V):
    print "Filling coo_matrix"
    n,m = V.shape
    r = 200
    eps=1e-9
    nonzeros = V.nonzero()
    nonzero_indices = [tuple(ij) for ij in transpose(nonzeros)]
    L = len(nonzeros[0])
    W = (random([n,r]) ).astype(float32)
    H = random([r,m]).astype(float32)

    print "filling..."
    t0=time.time()
    row = nonzeros[0] # row indices go here
    col = nonzeros[1] # column indices go here
    data = zeros(L)
    for index, (i,j) in enumerate(nonzero_indices):
        data[index] = dot(W[i,:],H[:,j])
    print "data assignment done, filling matrix", time.time() -t0
    #data = ones(L) # data values go here
    V_approx = sparse.coo_matrix((data,(row,col)), shape=(n,m))

    print "total time to fill coo_matrix:", time.time() -t0

    W_factor =  (V*H.T + eps ) / ( V_approx*H.T + eps)
    W = W_factor*W
    ####################################################
    print "done...\n\n"


def main():
    # number of rows    1,916
    # number of columns 1,916
    # nonzeros  195,985
    print "loading matrix..."
    f = urllib.urlopen("http://www.cise.ufl.edu/research/sparse/MM/JGD_CAG/CAG_mat1916.tar.gz")
    tar_download = open('CAG_mat1916.tar.gz','w')
    print >> tar_download, f.read()
    tar_download.close()

    tar = tarfile.open("CAG_mat1916.tar.gz", "r:gz")
    tar.extractall()

    # V is a sparse matrix, with around 5 % of entries populated
    V = io.mmread("CAG_mat1916/CAG_mat1916.mtx").tocsr()

    lil_fill(V)
    coo_fill(V)

if __name__ == '__main__':
	main()




On Wed, Dec 10, 2008 at 1:46 PM, Nathan Bell <wnbell@gmail.com> wrote:
> On Wed, Dec 10, 2008 at 12:54 PM, Peter Skomoroch
> <peter.skomoroch@gmail.com> wrote:
>> What is the fastest way to replace non-zero elements of a sparse
>> matrix with corresponding elements from a product of dense matrices,
>> without the memory overhead of computing the entire dense matrix
>> product?
>>
>> The code below demonstrates the way I am doing it now: looping through
>> the nonzero elements in the sparse matrix, and forming the
>> corresponding row - column product from the dense matrices.  It uses
>> the sparse module from the latest scipy trunk.
>>
>
> The fastest way to construct a sparse matrix is using the COO format
> as discussed here:
> http://www.scipy.org/SciPyPackages/Sparse#head-be8a0be5d0e44c4d59550d64fb0173508073c36e
>
> Using COO instead of LIL should be considerably faster.
>
> --
> Nathan Bell wnbell@gmail.com
> http://graphics.cs.uiuc.edu/~wnbell/
> _______________________________________________
> SciPy-user mailing list
> SciPy-user@scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-user
>



-- 
Peter N. Skomoroch
peter.skomoroch@gmail.com
http://www.datawrangling.com
http://del.icio.us/pskomoroch


More information about the SciPy-user mailing list