[SciPy-User] Efficiently adding a vector to every row of a sparse CSR matrix?
Wed Mar 6 19:00:19 CST 2013
As part of implementing a batch calculation of Jensen-Shannon divergence, I
need to take a (sparse) 65536-element vector "V" and add it to every row of
a (sparse) 500000x65536 matrix "O" of observations. Is there any way to do
this that is both space and time efficient? The usual O+V tries to convert
O to a dense matrix, which fails because O is too big to fit in memory (it
would take up ~120 GB!).
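For a sense of scale, the dense size is just rows x columns x bytes per value. The quick check below is a sketch. Note that `dense_gib` is a hypothetical helper, and the choice of 4- vs 8-byte values is an assumption, since the post does not state the dtype. The ~120 GB figure matches 4-byte (float32) values. With 8-byte float64 values it would be roughly double.

```python
def dense_gib(rows: int, cols: int, itemsize: int) -> float:
    """Memory (in GiB) needed to hold a rows x cols dense array."""
    return rows * cols * itemsize / 2**30


# Hypothetical dtypes; the post does not say which one O uses.
for name, itemsize in [("float32", 4), ("float64", 8)]:
    print(f"{name}: {dense_gib(500_000, 65_536, itemsize):.1f} GiB")
# float32: 122.1 GiB
# float64: 244.1 GiB
```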
I also can't do it slowly via iteration, because it looks like it's not
possible to update a sparse matrix in place.
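Strictly speaking, a CSR matrix can be changed in place for entries that are already stored, by operating on its `.data` array. What is expensive is inserting new nonzeros, because the `indices`/`indptr` arrays have to be rebuilt. Adding V to a row of O does exactly that wherever the row has no stored entry at one of V's columns. The small sketch below illustrates both cases on a toy matrix (the values are made up for illustration).

```python
import warnings

import numpy as np
import scipy.sparse as sp

O = sp.csr_matrix(np.array([[1.0, 0.0, 2.0],
                            [0.0, 0.0, 3.0]]))

# Entries that are already stored can be modified in place via .data;
# the sparsity structure (indices/indptr) is left untouched.
O.data += 10.0  # O is now [[11, 0, 12], [0, 0, 13]]

# Writing to a position that is not stored yet changes the structure.
# SciPy allows it but emits SparseEfficiencyWarning, since the index
# arrays must be rebuilt; doing this for every row is what makes a
# Python-level loop so slow.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    O[1, 0] = 5.0
print([w.category.__name__ for w in caught])
```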
My current solution is to tile V into a new 500000x65536 matrix and then add it to O:
import numpy as np
import scipy.sparse as sp

V = sp.csr_matrix(V)
n_rows = O.shape[0]
nnz = len(V.indices)
# Create the CSR matrix directly: every row stores a copy of V's nonzeros,
# so row i occupies positions i*nnz through (i+1)*nnz - 1
Vindptr = np.arange(n_rows + 1) * nnz
Vindices = np.tile(V.indices, n_rows)
Vdata = np.tile(V.data, n_rows)
mV = sp.csr_matrix((Vdata, Vindices, Vindptr), shape=O.shape)
result = O + mV
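The tiling trick above can be wrapped in a small function and checked against dense broadcasting on a toy problem. This is a sketch. The function name `add_row_vector` is hypothetical, and the random test matrices are made up for illustration. Building `indptr` as `arange(n_rows + 1) * nnz` also handles the edge case where V has no nonzeros at all.

```python
import numpy as np
import scipy.sparse as sp


def add_row_vector(O, V):
    """Return O + V with the row vector V added to every row of O, staying sparse.

    O: (n_rows, n_cols) sparse matrix; V: 1 x n_cols sparse matrix or dense row.
    Builds a CSR matrix whose rows are all copies of V, then adds it to O.
    """
    V = sp.csr_matrix(V)
    n_rows = O.shape[0]
    nnz = V.nnz
    # Row i of the tiled matrix occupies indptr[i]:indptr[i+1] == i*nnz:(i+1)*nnz.
    indptr = np.arange(n_rows + 1, dtype=np.int64) * nnz
    indices = np.tile(V.indices, n_rows)
    data = np.tile(V.data, n_rows)
    mV = sp.csr_matrix((data, indices, indptr), shape=O.shape)
    return (sp.csr_matrix(O) + mV).tocsr()


# Small example: compare with dense broadcasting, which is fine at this size.
O = sp.random(5, 8, density=0.3, format="csr", random_state=0)
V = sp.random(1, 8, density=0.5, format="csr", random_state=1)
result = add_row_vector(O, V)
print(np.allclose(result.toarray(), O.toarray() + V.toarray()))
```

The tiled matrix still stores n_rows copies of V's nonzeros, so this keeps the memory profile described in the post. It only avoids the dense conversion.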
This is reasonably fast (though creating mV takes around 6 seconds on its
own), but it takes up a lot of memory to store, even though mV is highly
redundant (every row is the same copy of V).
Is there any way to do this efficiently? It seems like there ought to be...
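One way to bound the memory of the tiled matrix is to process O in row blocks, so only one block's worth of tiled V exists at a time. This is a different technique (chunking) from the one in the post, and it is only a sketch. It assumes the downstream computation can consume O + V block by block (for example, reducing each block to per-row values) rather than needing the whole sum at once. If the full 500000x65536 result must be kept, chunking does not reduce the size of that result. The function name and `chunk_rows` parameter are hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def iter_row_chunks_plus_vector(O, V, chunk_rows=10_000):
    """Yield (start, block) where block == O[start:stop] + V (V added to each row).

    Only one chunk's worth of tiled V is held in memory at a time.
    """
    O = sp.csr_matrix(O)
    V = sp.csr_matrix(V)
    n_rows, n_cols = O.shape
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        m = stop - start
        # Tiled copy of V for just this chunk (same CSR construction as above).
        indptr = np.arange(m + 1, dtype=np.int64) * V.nnz
        tiled = sp.csr_matrix(
            (np.tile(V.data, m), np.tile(V.indices, m), indptr),
            shape=(m, n_cols),
        )
        yield start, O[start:stop] + tiled


# Usage: reduce each block to per-row sums without materializing all of O + V.
O = sp.random(25, 6, density=0.4, format="csr", random_state=0)
V = sp.random(1, 6, density=0.5, format="csr", random_state=1)
row_sums = np.empty(O.shape[0])
for start, block in iter_row_chunks_plus_vector(O, V, chunk_rows=10):
    row_sums[start:start + block.shape[0]] = np.asarray(block.sum(axis=1)).ravel()
```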