[SciPy-dev] Possible fix for scipy.sparse.lil_matrix column-slicing problem

Tim Victor timvictor@gmail....
Tue Dec 1 10:00:12 CST 2009


On Tue, Dec 1, 2009 at 8:11 AM, Robert Cimrman wrote:
> Hi Tim,
>
> Tim Victor wrote:
>> On Mon, Nov 30, 2009 at 5:32 AM, Robert Cimrman wrote:
>>> Is there a reason why you removed the special case of x being a scalar, namely:
>>>
>>> -        elif issequence(i) and issequence(j):
>>> -            if np.isscalar(x):
>>> -                for ii, jj in zip(i, j):
>>> -                    self._insertat(ii, jj, x)
>>> -            else:
>>> -                for ii, jj, xx in zip(i, j, x):
>>> -                    self._insertat(ii, jj, xx)
>>>
>>> This removal broke some code of mine, which now takes forever and behaves
>>> differently. Try this:
>>>
>>> In [1]: import scipy.sparse as spp
>>> In [2]: a = spp.lil_matrix((1000, 1000))
>>> In [3]: a
>>> Out[3]:
>>> <1000x1000 sparse matrix of type '<type 'numpy.float64'>'
>>>         with 0 stored elements in LInked List format>
>>> In [4]: import numpy as np
>>> In [5]: ir = ic = np.arange(1000)
>>> In [6]: a[ir, ic] = 1
>>>
>>> The result is a matrix with all the entries set to 1 (= full!), not just the
>>> diagonal, which was the previous (IMHO good) behaviour. In my real code I do
>>> not set the diagonal but some other elements given by two lists ir, ic; the
>>> code above shows the symptoms, though.
>>>
>>> I can easily fix my code by not using the LIL matrix:
>>>
>>> In [15]: a = spp.coo_matrix((np.ones((ir.shape[0],)), (ir, ic)))
>>> In [16]: a
>>> Out[16]:
>>> <1000x1000 sparse matrix of type '<type 'numpy.float64'>'
>>>         with 1000 stored elements in COOrdinate format>
>>>
>>> but I wonder if the above change in behaviour was intended...
>>>
>>> cheers,
>>> r.
>> ---------------------
>>
>> Robert,
>>
>>> Is there a reason why you removed the special case of x being a scalar, namely:
>>
>> Actually I don't think it was the special case of x being a scalar,
>> but rather the special case of the row index and the column index both
>> being sequences.
>
> Yes, x being scalar is in a sense a subset of the case when i, j, x are three
> sequences of the same length.
>
>> Let me know if this looks wrong to you, but using NumPy dense arrays
>> and matrices as a reference for correct behavior, I get:
>>
>> m = sp.zeros((10,10))
>>
>> # set first column of m to ones:
>> m[range(10),0] = 1
>>
>> # set diagonal of m to -1:
>> m[range(10),range(10)] = -1
>> m
>>
>> (prints:)
>> array([[-1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
>>        [ 1., -1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
>>        [ 1.,  0., -1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
>>        [ 1.,  0.,  0., -1.,  0.,  0.,  0.,  0.,  0.,  0.],
>>        [ 1.,  0.,  0.,  0., -1.,  0.,  0.,  0.,  0.,  0.],
>>        [ 1.,  0.,  0.,  0.,  0., -1.,  0.,  0.,  0.,  0.],
>>        [ 1.,  0.,  0.,  0.,  0.,  0., -1.,  0.,  0.,  0.],
>>        [ 1.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0.,  0.],
>>        [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0.],
>>        [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.]])
>>
>> In the first case, assigning a scalar with a range for the row index
>> and a scalar for the column index, lil_matrix still works the way it
>> did: it iterates the range of rows and sets column 0 for each. The
>> second case, iterating both ranges and using the row/column indices in
>> pairs, was inadvertently changed.
>>
>> A third case suggests that it isn't only about assigning a scalar,
>> since assigning from a sequence using pairwise-matched sequences of
>> row and column indices should work similarly:
>>
>> # assign random values to the reverse diagonal:
>> m[range(10),range(9,-1,-1)] = sp.rand(10)
>> m
>>
>> (prints:)
>> array([[-1.        ,  0.        ,  0.        ,  0.        ,  0.        ,
>>          0.        ,  0.        ,  0.        ,  0.        ,  0.90216781],
>>        [ 1.        , -1.        ,  0.        ,  0.        ,  0.        ,
>>          0.        ,  0.        ,  0.        ,  0.54728632,  0.        ],
>>        [ 1.        ,  0.        , -1.        ,  0.        ,  0.        ,
>>          0.        ,  0.        ,  0.97993962,  0.        ,  0.        ],
>>        [ 1.        ,  0.        ,  0.        , -1.        ,  0.        ,
>>          0.        ,  0.66932184,  0.        ,  0.        ,  0.        ],
>>        [ 1.        ,  0.        ,  0.        ,  0.        , -1.        ,
>>          0.64246538,  0.        ,  0.        ,  0.        ,  0.        ],
>>        [ 1.        ,  0.        ,  0.        ,  0.        ,  0.93092643,
>>         -1.        ,  0.        ,  0.        ,  0.        ,  0.        ],
>>        [ 1.        ,  0.        ,  0.        ,  0.25711642,  0.        ,
>>          0.        , -1.        ,  0.        ,  0.        ,  0.        ],
>>        [ 1.        ,  0.        ,  0.28826595,  0.        ,  0.        ,
>>          0.        ,  0.        , -1.        ,  0.        ,  0.        ],
>>        [ 1.        ,  0.01331807,  0.        ,  0.        ,  0.        ,
>>          0.        ,  0.        ,  0.        , -1.        ,  0.        ],
>>        [ 0.86963499,  0.        ,  0.        ,  0.        ,  0.        ,
>>          0.        ,  0.        ,  0.        ,  0.        , -1.        ]])
>>
>> Yes, I broke that. :-) I hadn't used this behavior or seen it
>> documented, it wasn't covered by a unit test, and I missed it. (In my
>> defense, lil_matrix handles that third case even worse in both SciPy
>> 0.7.0 and 0.7.1.)
>>
>> Robert, can you (or anyone else out there) tell me if this covers it
>> all, or whether there is some more general array-indexing behavior
>> that should be implemented? I'll be happy to put in a case to handle
>> it.
>
>
> I think scipy.sparse indexing should follow the behavior of numpy dense arrays.
>  This is what current SVN scipy does (0.8.0.dev6122):
>
> In [1]: import scipy.sparse as spp
> In [2]: a = spp.lil_matrix((10,10))
> In [3]: a[range(10),0] = 1
>
> This is ok.
>
> In [5]: a[range(10),range(10)] = -1
> In [8]: print a.todense()
> ------> print(a.todense())
> [[-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
>  [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
>  [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
>  [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
>  [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
>  [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
>  [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
>  [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
>  [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
>  [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]]
>
> This is IMHO not ok (what do other sparse matrix users think?)
>
> In [9]: import scipy as sp
> In [10]: a[range(10),range(9,-1,-1)] = sp.rand(10)
> In [12]: print a.todense()
> -------> print(a.todense())
> [[ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
>    0.34837324  0.48012982  0.87469439  0.98490982]
>  [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
>    0.34837324  0.48012982  0.87469439  0.98490982]
>  [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
>    0.34837324  0.48012982  0.87469439  0.98490982]
>  [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
>    0.34837324  0.48012982  0.87469439  0.98490982]
>  [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
>    0.34837324  0.48012982  0.87469439  0.98490982]
>  [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
>    0.34837324  0.48012982  0.87469439  0.98490982]
>  [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
>    0.34837324  0.48012982  0.87469439  0.98490982]
>  [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
>    0.34837324  0.48012982  0.87469439  0.98490982]
>  [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
>    0.34837324  0.48012982  0.87469439  0.98490982]
>  [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
>    0.34837324  0.48012982  0.87469439  0.98490982]]
>
> same as above...
>
>> I'm reminded of the Zen of Python guideline: "Explicit is better than implicit."
>
> :)
>
> Consider also this: the current behavior (broadcasting the index arrays to the
> whole rectangle) is not compatible with NumPy, and it does not allow setting
> elements at, e.g., a random sequence of positions. On the other hand, the
> broadcasting behaviour can easily be obtained explicitly by using numpy.mgrid
> and similar functions.
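
Agreed, and just to illustrate for the archives: the rectangle can be
requested explicitly on a dense array with np.ix_ or np.mgrid, along
these lines (a quick sketch on a plain NumPy array; I have not tried
the same thing against lil_matrix):

import numpy as np

a = np.zeros((10, 10))
rows = np.arange(10)
cols = np.arange(10)

# open mesh: every (row, col) combination, i.e. the full 10x10 block
a[np.ix_(rows, cols)] = -1

# the same thing with explicit index grids
I, J = np.mgrid[0:10, 0:10]
a[I, J] = -1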
>
>> Best regards, and many apologies for the inconvenience,
>
> No problem, any help and code contribution is more than welcome!
>
> I guess that fixing this issue should not be too difficult, so you could take
> another stab at it :-) If there is a consensus here, of course... (Nathan?)
>
> cheers,
> r.

Yes, I agree with you 100%, Robert. The behavior of NumPy for dense
arrays should be the guide, and I tried to follow it but didn't think
to check that case.
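
For the record, this is the dense-array behaviour I will treat as the
reference (plain NumPy, using your 1000x1000 example):

import numpy as np

d = np.zeros((1000, 1000))
ir = ic = np.arange(1000)

# fancy assignment with two equal-length index arrays is pairwise:
# only the 1000 diagonal entries are touched, not the full rectangle
d[ir, ic] = 1
assert d.sum() == 1000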

I won't defend how my version handles your case where the i and j
indices are both sequences. The behavior you expect is correct, and I
plan to fix it so that your code works. I would, however, like to make
sure that I understand it fully and get it all right this time,
including correctly handling the case where the right-hand side is
also a sequence.
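
The semantics I intend are essentially those of the removed branch you
quoted at the top of this thread, plus a length check for the sequence
right-hand side. Written as a stand-alone sketch (set_pairwise is only
an illustration here, not a proposed API; the real fix will go into
lil_matrix.__setitem__):

import numpy as np
import scipy.sparse as spp

def set_pairwise(A, i, j, x):
    # Assign x to A at the positions zip(i, j), matching dense NumPy
    # fancy-index assignment with two equal-length index sequences.
    i = list(i)
    j = list(j)
    if len(i) != len(j):
        raise IndexError('row and column index sequences differ in length')
    if np.isscalar(x):
        x = [x] * len(i)          # a scalar RHS broadcasts over all pairs
    elif len(x) != len(i):
        raise ValueError('value sequence does not match the index length')
    for ii, jj, xx in zip(i, j, x):
        A[ii, jj] = xx

a = spp.lil_matrix((1000, 1000))
set_pairwise(a, np.arange(1000), np.arange(1000), 1)   # only the diagonal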

Best regards,

Tim Victor

