[SciPy-User] Reading / writing sparse matrices

Matthew Brett matthew.brett@gmail....
Thu Nov 11 22:58:57 CST 2010


Hi,

On Thu, Nov 11, 2010 at 7:36 PM, Lutz Maibaum <lutz.maibaum@gmail.com> wrote:
> On Nov 11, 2010, at 5:27 PM, Matthew Brett wrote:
>> On Fri, Jun 18, 2010 at 6:31 PM, Lutz Maibaum <lutz.maibaum@gmail.com> wrote:
>>> How can I write a sparse matrix with elements of type uint64 to a file, and recover it while preserving the data type? For example:
>>>
>>>>>> import numpy as np
>>>>>> import scipy.sparse
>>>>>> a=scipy.sparse.lil_matrix((5,5), dtype=np.uint64)
>>>>>> a[0,0]=9876543210
>>>
>>> Now I save this matrix to a file:
>>>
>>>>>> import scipy.io
>>>>>> scipy.io.mmwrite("test.mtx", a, field='integer')
>>>
>>> If I do not specify the field argument of mmwrite, I get an "unexpected dtype of kind u" exception. The generated file test.mtx looks as expected. But when I try to read this matrix, it is converted to int32:
>>>
>>>>>> b=scipy.io.mmread("test.mtx")
>>>>>> b.dtype
>>> dtype('int32')
>>>>>> b.data
>>> array([-2147483648], dtype=int32)
>>>
>>> As far as I can tell, it is not possible to specify a dtype when calling mmread. Is there a better way to go about this?
>>
>> I had a quick look at the code, and then at the Matrix Market format,
>> and it looks to me:
>>
>> http://math.nist.gov/MatrixMarket/reports/MMformat.ps.gz
>>
>> as if Matrix Market only allows integer, real or complex - hence the
>> (somewhat unhelpful) error.
>
> Yes, the Matrix Market file format has only these 3 types, and scipy.io.mmwrite (actually, scipy.io.mmio.MMFile._write) has to guess which of these to use for a given dtype:
...
> It would be nice if this algorithm would be extended to handle unsigned integers (which seem to have kind=='u', but I'm not sure if that's sufficient and necessary) as well, which could also translate to "integer" in the MM file.

> The opposite problem occurs when the file is read by mmread, which has to figure out how to translate the three Matrix Market types to python's numeric types. Using the system's default types for int, float and complex is very reasonable, but it would be nice if one could override this default by specifying an optional dtype argument (as is used, for example, by numpy.loadtxt).

The problem I can see is that this would be confusing:

a=scipy.sparse.lil_matrix((5,5), dtype=np.uint64)
a[0,0]=9876543210
mmwrite(fname, a)
res = mmread(fname)
res.data
array([-2147483648], dtype=int32)

That is, I think the writer shouldn't silently write something that
will be read back incorrectly by default. So, how about a
compromise:

In [7]: mmwrite(fname, a)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: Will not write unsigned integers by default. Please pass
field="integer" to write unsigned integers
In [8]: mmwrite(fname, a, field='integer')
In [9]: res = mmread(fname, dtype=np.uint64)
In [11]: res.todense()[0,0]
Out[11]: 9876543210

?
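
To make the proposed behaviour concrete, here is a rough pure-Python sketch of both halves: a writer that refuses unsigned data unless field="integer" is passed, and a reader that lets the caller choose the value conversion. It is not the scipy.io code. The names write_coordinate, read_coordinate, unsigned and convert are hypothetical, and the sketch assumes a coordinate-format file with integer values and general symmetry.

```python
import io


def write_coordinate(out, shape, entries, unsigned=False, field=None):
    """Write (row, col, value) triples as a Matrix Market coordinate file.

    Hypothetical helper: unsigned data is rejected unless the caller
    explicitly passes field="integer".
    """
    if unsigned and field != "integer":
        raise TypeError(
            "Will not write unsigned integers by default. "
            'Please pass field="integer" to write unsigned integers'
        )
    out.write("%%MatrixMarket matrix coordinate integer general\n")
    out.write("%d %d %d\n" % (shape[0], shape[1], len(entries)))
    for row, col, value in entries:
        # Matrix Market indices are 1-based.
        out.write("%d %d %d\n" % (row + 1, col + 1, value))


def read_coordinate(src, convert=int):
    """Read the file back, converting every value with `convert`.

    `convert` stands in for the proposed dtype argument (an assumption,
    not an existing mmread parameter).
    """
    header = src.readline()
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("not a Matrix Market coordinate file")
    line = src.readline()
    while line.startswith("%"):  # skip comment lines
        line = src.readline()
    rows, cols, nnz = (int(tok) for tok in line.split())
    entries = {}
    for _ in range(nnz):
        r, c, v = src.readline().split()
        entries[(int(r) - 1, int(c) - 1)] = convert(v)
    return (rows, cols), entries


buf = io.StringIO()
write_coordinate(buf, (5, 5), [(0, 0, 9876543210)],
                 unsigned=True, field="integer")
buf.seek(0)
shape, entries = read_coordinate(buf, convert=int)
```

Python's int has arbitrary precision, so the value survives the round trip here. In the real reader the conversion would be whatever dtype the caller asks for.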

Best,

Matthew

