[SciPy-dev] Possible fix for scipy.sparse.lil_matrix column-slicing problem
Robert Cimrman
cimrman3@ntc.zcu...
Tue Dec 1 07:11:51 CST 2009
Hi Tim,
Tim Victor wrote:
> On Mon, Nov 30, 2009 at 5:32 AM, Robert Cimrman wrote:
>> Is there are reason why you removed the special case of x being a scalar, namely:
>>
>> - elif issequence(i) and issequence(j):
>> - if np.isscalar(x):
>> - for ii, jj in zip(i, j):
>> - self._insertat(ii, jj, x)
>> - else:
>> - for ii, jj, xx in zip(i, j, x):
>> - self._insertat(ii, jj, xx)
>>
>> This removal broke a code of mine, which now takes forever, and behaves in a
>> different way. Try this:
>>
>> In [1]: import scipy.sparse as spp
>> In [2]: a = spp.lil_matrix((1000, 1000))
>> In [3]: a
>> Out[3]:
>> <1000x1000 sparse matrix of type '<type 'numpy.float64'>'
>> with 0 stored elements in LInked List format>
>> In [4]: import numpy as np
>> In [5]: ir = ic = np.arange(1000)
>> In [6]: a[ir, ic] = 1
>>
>> The result is a matrix with all the entries set to 1 (= full!), not just the
>> diagonal, which was the previous (IMHO good) behaviour. In the real code I do
>> not set the diagonal, but some other elements given by two lists ir, ic, but
>> the code above shows the symptoms.
>>
>> I can fix easily my code by not using the LIL matrix:
>>
>> In [15]: a = spp.coo_matrix((np.ones((ir.shape[0],)), (ir, ic)))
>> In [16]: a
>> Out[16]:
>> <1000x1000 sparse matrix of type '<type 'numpy.float64'>'
>> with 1000 stored elements in COOrdinate format>
>>
>> but I wonder, if the above change in behaviour was intended...
>>
>> cheers,
>> r.
> ---------------------
>
> Robert,
>
>> Is there are reason why you removed the special case of x being a scalar, namely:
>
> Actually I don't think it was the special case of x being a scalar,
> but rather the special case of the row index and the column index both
> being sequences.
Yes, x being scalar is in a sense a subset of the case when i, j, x are three
sequences of the same length.
> Let me know if this looks wrong to you, but using NumPy dense arrays
> and matrices as a reference for correct behavior, I get:
>
> m = sp.zeros((10,10))
>
> # set first column of m to ones:
> m[range(10),0] = 1
>
> # set diagonal of m to -1:
> m[range(10),range(10)] = -1
> m
>
> (prints:)
> array([[-1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
> [ 1., -1., 0., 0., 0., 0., 0., 0., 0., 0.],
> [ 1., 0., -1., 0., 0., 0., 0., 0., 0., 0.],
> [ 1., 0., 0., -1., 0., 0., 0., 0., 0., 0.],
> [ 1., 0., 0., 0., -1., 0., 0., 0., 0., 0.],
> [ 1., 0., 0., 0., 0., -1., 0., 0., 0., 0.],
> [ 1., 0., 0., 0., 0., 0., -1., 0., 0., 0.],
> [ 1., 0., 0., 0., 0., 0., 0., -1., 0., 0.],
> [ 1., 0., 0., 0., 0., 0., 0., 0., -1., 0.],
> [ 1., 0., 0., 0., 0., 0., 0., 0., 0., -1.]])
>
> The first case, assigning a scalar with a range for the row index and
> a scalar for the column index, lil_matrix still works the way it did.
> It iterates the range of rows and sets column 0 for each. The second
> case, iterating both of the ranges and using the row/column indexes in
> pairs, was inadvertently changed.
>
> A third case suggests that it isn't only about assigning a scalar,
> since assigning from a sequence using pairwise-matched sequences of
> row and column indices should work similarly:
>
> # assign random values to the reverse diagonal:
> m[range(10),range(9,-1,-1)] = sp.rand(10)
> m
>
> (prints:)
> array([[-1. , 0. , 0. , 0. , 0. ,
> 0. , 0. , 0. , 0. , 0.90216781],
> [ 1. , -1. , 0. , 0. , 0. ,
> 0. , 0. , 0. , 0.54728632, 0. ],
> [ 1. , 0. , -1. , 0. , 0. ,
> 0. , 0. , 0.97993962, 0. , 0. ],
> [ 1. , 0. , 0. , -1. , 0. ,
> 0. , 0.66932184, 0. , 0. , 0. ],
> [ 1. , 0. , 0. , 0. , -1. ,
> 0.64246538, 0. , 0. , 0. , 0. ],
> [ 1. , 0. , 0. , 0. , 0.93092643,
> -1. , 0. , 0. , 0. , 0. ],
> [ 1. , 0. , 0. , 0.25711642, 0. ,
> 0. , -1. , 0. , 0. , 0. ],
> [ 1. , 0. , 0.28826595, 0. , 0. ,
> 0. , 0. , -1. , 0. , 0. ],
> [ 1. , 0.01331807, 0. , 0. , 0. ,
> 0. , 0. , 0. , -1. , 0. ],
> [ 0.86963499, 0. , 0. , 0. , 0. ,
> 0. , 0. , 0. , 0. , -1. ]])
>
> Yes, I broke that. :-) I haven't used this behavior or seen it
> documented and it wasn't covered by a unit test, and I missed it. (In
> my defense, lil_matrix is broken even worse for that third case in
> both SciPy versions 0.7.0 and 0.7.1.)
>
> Robert, can you (or anyone else out there) tell me if this covers it
> all, or whether there is some more general array-indexing behavior
> that should be implemented? I'll be happy to put in a case to handle
> it.
I think scipy.sparse indexing should follow the behavior of numpy dense arrays.
This is what current SVN scipy does (0.8.0.dev6122):
In [1]: import scipy.sparse as spp
In [2]: a = spp.lil_matrix((10,10))
In [3]: a[range(10),0] = 1
This is ok.
In [5]: a[range(10),range(10)] = -1
In [8]: print a.todense()
------> print(a.todense())
[[-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
[-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
[-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
[-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
[-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
[-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
[-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
[-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
[-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
[-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]]
This is IMHO not ok (what other sparse matrix users think?)
In [9]: import scipy as sp
In [10]: a[range(10),range(9,-1,-1)] = sp.rand(10)
In [12]: print a.todense()
-------> print(a.todense())
[[ 0.71485802 0.95410746 0.07350778 0.77786453 0.28376647 0.35679679
0.34837324 0.48012982 0.87469439 0.98490982]
[ 0.71485802 0.95410746 0.07350778 0.77786453 0.28376647 0.35679679
0.34837324 0.48012982 0.87469439 0.98490982]
[ 0.71485802 0.95410746 0.07350778 0.77786453 0.28376647 0.35679679
0.34837324 0.48012982 0.87469439 0.98490982]
[ 0.71485802 0.95410746 0.07350778 0.77786453 0.28376647 0.35679679
0.34837324 0.48012982 0.87469439 0.98490982]
[ 0.71485802 0.95410746 0.07350778 0.77786453 0.28376647 0.35679679
0.34837324 0.48012982 0.87469439 0.98490982]
[ 0.71485802 0.95410746 0.07350778 0.77786453 0.28376647 0.35679679
0.34837324 0.48012982 0.87469439 0.98490982]
[ 0.71485802 0.95410746 0.07350778 0.77786453 0.28376647 0.35679679
0.34837324 0.48012982 0.87469439 0.98490982]
[ 0.71485802 0.95410746 0.07350778 0.77786453 0.28376647 0.35679679
0.34837324 0.48012982 0.87469439 0.98490982]
[ 0.71485802 0.95410746 0.07350778 0.77786453 0.28376647 0.35679679
0.34837324 0.48012982 0.87469439 0.98490982]
[ 0.71485802 0.95410746 0.07350778 0.77786453 0.28376647 0.35679679
0.34837324 0.48012982 0.87469439 0.98490982]]
same as above...
> I'm reminded of the Zen of Python "Explicit is better than implicit." guideline.
:)
Consider also this: the current behavior (broadcasting the index arrays to the
whole rectangle) is not compatible with NumPy, and does not allow setting
elements e.g. in a random sequence of positions. On the other hand, the
broadcasting behaviour can be easily, explicitely, obtained by using
numpy.mgrid and similar functions.
> Best regards, and many apologies for the inconvenience,
No problem, any help and code contribution is more than welcome!
I guess that fixing this issue should not be too difficult, so you could make
another stab :-) If there is a consensus here, of course... (Nathan?)
cheers,
r.
More information about the SciPy-Dev
mailing list