[SciPy-User] Efficiently adding a vector to every row of a sparse CSR matrix?
Brendan Dolan-Gavitt
mooyix@gmail....
Wed Mar 6 19:00:19 CST 2013
Hi,
As part of implementing a batch calculation of Jensen-Shannon divergence, I
need to take a (sparse) 65536-element vector "V" and add it to every row of
a (sparse) 500000x65536 matrix "O" of observations. Is there any way to do
this that is both space and time efficient? The usual O+V tries to convert
O to a dense matrix, which fails because O is too big to fit in memory (it
would take up ~120 GB!).
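For reference, the dense size can be checked with a quick back-of-the-envelope calculation. This is only a sketch: the element dtype is not stated above, so both 4-byte and 8-byte floats are shown, and the helper name dense_gib is made up for illustration. The ~120 GB figure matches the 4-byte case.

```python
import numpy as np

rows, cols = 500_000, 65_536

def dense_gib(n_rows, n_cols, dtype):
    """Size in GiB of a dense n_rows x n_cols array of the given dtype."""
    return n_rows * n_cols * np.dtype(dtype).itemsize / 2**30

# The dtype of O is an assumption; both common float widths are shown.
gib32 = dense_gib(rows, cols, np.float32)
gib64 = dense_gib(rows, cols, np.float64)
print(f"float32: {gib32:.1f} GiB, float64: {gib64:.1f} GiB")
```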
I also can't do it slowly via iteration, because it looks like it's not
possible to update a sparse matrix in place.
My current solution is to tile V into a new 500000x65536 matrix and then
add:
import numpy as np
import scipy.sparse as sp
[...]
V = sp.csr_matrix(V)
# Create the CSR matrix directly
Vindptr = np.arange(0, len(V.indices)*O.shape[0]+1, len(V.indices),
                    dtype=np.int32)
Vindices = np.tile(V.indices, O.shape[0])
Vdata = np.tile(V.data, O.shape[0])
mV = sp.csr_matrix((Vdata, Vindices, Vindptr), shape=O.shape)
result = O+mV
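To make the construction above easy to try, here is a small self-contained version on toy data, checked against plain dense broadcasting. The sizes, the random data, and the helper name tile_row_csr are assumptions for illustration only; the real O and V are far larger.

```python
import numpy as np
import scipy.sparse as sp

def tile_row_csr(V, n_rows):
    """Build an n_rows x V.shape[1] CSR matrix whose every row equals V (1 x n)."""
    V = sp.csr_matrix(V)
    nnz = V.nnz
    # Every row holds the same nnz entries, so row i starts at offset i * nnz.
    indptr = np.arange(n_rows + 1, dtype=np.int32) * nnz
    indices = np.tile(V.indices, n_rows)
    data = np.tile(V.data, n_rows)
    return sp.csr_matrix((data, indices, indptr), shape=(n_rows, V.shape[1]))

# Toy stand-ins for O and V (hypothetical sizes and values).
rng = np.random.default_rng(0)
dense_O = rng.random((6, 10)) * (rng.random((6, 10)) < 0.3)
O_small = sp.csr_matrix(dense_O)

v = np.zeros(10)
v[[1, 4, 7]] = [0.5, 0.25, 0.25]
V_small = sp.csr_matrix(v.reshape(1, -1))

mV_small = tile_row_csr(V_small, O_small.shape[0])
result_small = O_small + mV_small

# Dense broadcasting is only affordable here because the toy data is tiny.
assert np.allclose(result_small.toarray(), dense_O + v)
```

The tiled matrix stores V.nnz entries per row, which is where the extra memory in the full-size case comes from.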
This is reasonably fast (though creating mV takes around 6 seconds on its
own), but takes up a lot of memory to store even though there's a ton of
duplicate data.
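One way to bound the memory used by the tiled matrix is to add V one block of rows at a time, so that only chunk_rows copies of V exist at once. This is a sketch under the assumption that whatever consumes the result can work block by block; it does not avoid the cost of the sum itself, since each output row still carries the nonzeros of V. The function name add_row_in_chunks and the chunk size are made up for illustration.

```python
import numpy as np
import scipy.sparse as sp

def add_row_in_chunks(O, V, chunk_rows=10_000):
    """Yield (start_row, O[start:stop] + V) for consecutive row blocks of O."""
    V = sp.csr_matrix(V)
    nnz = V.nnz
    # Tile V once for a full-size chunk; shorter final chunks reuse a prefix.
    tiled_indices = np.tile(V.indices, chunk_rows)
    tiled_data = np.tile(V.data, chunk_rows)
    for start in range(0, O.shape[0], chunk_rows):
        stop = min(start + chunk_rows, O.shape[0])
        n = stop - start
        indptr = np.arange(n + 1, dtype=np.int32) * nnz
        block = sp.csr_matrix(
            (tiled_data[: n * nnz], tiled_indices[: n * nnz], indptr),
            shape=(n, O.shape[1]),
        )
        yield start, O[start:stop] + block

# Toy check against dense broadcasting (hypothetical sizes and values).
rng = np.random.default_rng(1)
dense_O = rng.random((25, 8)) * (rng.random((25, 8)) < 0.25)
O_toy = sp.csr_matrix(dense_O)
v = np.zeros(8)
v[[0, 5]] = [0.75, 0.25]
V_toy = sp.csr_matrix(v.reshape(1, -1))

blocks = [blk for _, blk in add_row_in_chunks(O_toy, V_toy, chunk_rows=10)]
stacked = sp.vstack(blocks).toarray()
assert np.allclose(stacked, dense_O + v)
```

Stacking the blocks, as done in the toy check, brings back the full memory footprint, so the saving only holds if each block is reduced or discarded before the next one is produced.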
Is there any way to do this efficiently? It seems like there ought to be...
-Brendan