[SciPy-User] [SciPy-user] Problem with np.load() on Huge Sparse Matrix
Ryan R. Rosario
uclamathguy@gmail....
Fri Jun 4 11:30:35 CDT 2010
Oh. Yes, mymatrix and intersection_matrix are the same. I forgot to change
the name. The number of nonzero elements is 1.2 billion.
What is weird is that if I use np.load(..., 'r') (memmap), the file seems to
be read correctly. But if I do not use memmap, the loaded data is corrupt.
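
For reference, here is a minimal sketch of the round trip described in this
thread: save the data, indices, indptr and shape of a CSC matrix to separate
.npy files, then read them back through np.load with mmap_mode='r'. The
helper names save_csc_parts and load_csc_parts, the file names and the small
random test matrix are assumptions made for illustration, not code from my
actual session. It only shows the memmap read path that appeared to work for
me; it does not explain why the plain load returned zeros.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp


def save_csc_parts(matrix, directory):
    """Save data, indices, indptr and shape of a CSC matrix as separate .npy files."""
    for name in ("data", "indices", "indptr"):
        np.save(os.path.join(directory, name + ".npy"), getattr(matrix, name))
    np.save(os.path.join(directory, "shape.npy"), np.asarray(matrix.shape))


def load_csc_parts(directory, mmap_mode="r"):
    """Rebuild the CSC matrix from the files written by save_csc_parts.

    With mmap_mode='r' each file is memory-mapped read-only; np.array then
    copies the mapped contents into ordinary in-memory arrays.
    """
    parts = [
        np.array(np.load(os.path.join(directory, name + ".npy"), mmap_mode=mmap_mode))
        for name in ("data", "indices", "indptr")
    ]
    shape = tuple(int(n) for n in np.load(os.path.join(directory, "shape.npy")))
    return sp.csc_matrix(tuple(parts), shape=shape)


# Small stand-in for the real 395000 x 395000 matrix (hypothetical test data).
original = sp.random(50, 50, density=0.1, format="csc", random_state=0)
workdir = tempfile.mkdtemp()
save_csc_parts(original, workdir)
restored = load_csc_parts(workdir)

# Compare the reloaded index array with the one still held in memory.
print(np.array_equal(original.indices, restored.indices))
```

Comparing the reloaded arrays against the in-memory ones, as in the last
line, is a quick way to spot the kind of trailing-zeros corruption shown in
my original message below.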
R.
Robert Elsner wrote:
> Hello,
> How sparse is your matrix (NNZ)? From your code it is not clear that
> mymatrix and intersection_matrix are actually the same matrix.
> On Thursday, 03.06.2010, at 23:39 -0700, Ryan R. Rosario wrote:
>> Is this a bug? Has anybody else experienced this?
>> Not being able to load a matrix from disk is a huge limitation for me. I
>> would appreciate any help anyone can provide with this.
>>
>> Thanks,
>> Ryan
>> Ryan R. Rosario wrote:
>> > Hi,
>> >
>> > I have a very large sparse (395000 x 395000) CSC matrix that I cannot
>> > save in one pass, so I saved the data, indices, indptr and shape in
>> > separate files, as suggested by Dave Warde-Farley a few years back.
>> >
>> > When I try to read back the indices pickle:
>> >
>> > >>> np.save("indices.pickle", mymatrix.indices)
>> > >>> indices = np.load("indices.pickle.npy")
>> > >>> indices
>> > array([394852, 394649, 394533, ..., 0, 0, 0], dtype=int32)
>> > >>> intersection_matrix.indices
>> > array([394852, 394649, 394533, ..., 1557, 1223, 285], dtype=int32)
>> > Why is this happening? My only workaround is to print all of the entries
>> > of intersection_matrix.indices to a file and read them back in, which
>> > takes up to 2 hours. It would be great if I could get np.load to work
>> > because it is much faster.
>> >
>> > Thanks,
>> > Ryan
>> >
