[SciPy-dev] Possible fix for scipy.sparse.lil_matrix column-slicing problem

Robert Cimrman cimrman3@ntc.zcu...
Tue Dec 1 07:11:51 CST 2009


Hi Tim,

Tim Victor wrote:
> On Mon, Nov 30, 2009 at 5:32 AM, Robert Cimrman wrote:
>> Is there a reason why you removed the special case of x being a scalar, namely:
>>
>> -        elif issequence(i) and issequence(j):
>> -            if np.isscalar(x):
>> -                for ii, jj in zip(i, j):
>> -                    self._insertat(ii, jj, x)
>> -            else:
>> -                for ii, jj, xx in zip(i, j, x):
>> -                    self._insertat(ii, jj, xx)
>>
>> This removal broke some code of mine, which now takes forever and behaves
>> differently. Try this:
>>
>> In [1]: import scipy.sparse as spp
>> In [2]: a = spp.lil_matrix((1000, 1000))
>> In [3]: a
>> Out[3]:
>> <1000x1000 sparse matrix of type '<type 'numpy.float64'>'
>>         with 0 stored elements in LInked List format>
>> In [4]: import numpy as np
>> In [5]: ir = ic = np.arange(1000)
>> In [6]: a[ir, ic] = 1
>>
>> The result is a matrix with all the entries set to 1 (= full!), not just the
>> diagonal, which was the previous (IMHO good) behaviour. In the real code I do
>> not set the diagonal but some other elements given by two lists ir, ic; the
>> code above shows the symptoms.
>>
>> I can easily fix my code by not using the LIL matrix:
>>
>> In [15]: a = spp.coo_matrix((np.ones((ir.shape[0],)), (ir, ic)))
>> In [16]: a
>> Out[16]:
>> <1000x1000 sparse matrix of type '<type 'numpy.float64'>'
>>         with 1000 stored elements in COOrdinate format>
>>
>> but I wonder if the above change in behaviour was intended...
>>
>> cheers,
>> r.
> ---------------------
> 
> Robert,
> 
>> Is there a reason why you removed the special case of x being a scalar, namely:
> 
> Actually I don't think it was the special case of x being a scalar,
> but rather the special case of the row index and the column index both
> being sequences.

Yes, x being a scalar is in a sense a subset of the case when i, j, x are three
sequences of the same length.
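
For illustration, here is a minimal sketch of the pairwise semantics I mean
(using a hypothetical helper, not the actual lil_matrix code); a scalar x is
simply repeated for every (i, j) pair:

import numpy as np

def pairwise_assign(A, i, j, x):
    # Hypothetical helper, only to show the intended semantics:
    # assign x element-wise at the positions zip(i, j).
    if np.isscalar(x):
        x = [x] * len(i)   # a scalar is just a constant sequence
    for ii, jj, xx in zip(i, j, x):
        A[ii, jj] = xx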

> Let me know if this looks wrong to you, but using NumPy dense arrays
> and matrices as a reference for correct behavior, I get:
> 
> m = sp.zeros((10,10))
> 
> # set first column of m to ones:
> m[range(10),0] = 1
> 
> # set diagonal of m to -1:
> m[range(10),range(10)] = -1
> m
> 
> (prints:)
> array([[-1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
>        [ 1., -1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
>        [ 1.,  0., -1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
>        [ 1.,  0.,  0., -1.,  0.,  0.,  0.,  0.,  0.,  0.],
>        [ 1.,  0.,  0.,  0., -1.,  0.,  0.,  0.,  0.,  0.],
>        [ 1.,  0.,  0.,  0.,  0., -1.,  0.,  0.,  0.,  0.],
>        [ 1.,  0.,  0.,  0.,  0.,  0., -1.,  0.,  0.,  0.],
>        [ 1.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0.,  0.],
>        [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0.],
>        [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.]])
> 
> In the first case, assigning a scalar with a range for the row index and
> a scalar for the column index, lil_matrix still works the way it did:
> it iterates over the range of rows and sets column 0 for each. The
> second case, iterating over both ranges and using the row/column
> indices in pairs, was inadvertently changed.
> 
> A third case suggests that it isn't only about assigning a scalar,
> since assigning from a sequence using pairwise-matched sequences of
> row and column indices should work similarly:
> 
> # assign random values to the reverse diagonal:
> m[range(10),range(9,-1,-1)] = sp.rand(10)
> m
> 
> (prints:)
> array([[-1.        ,  0.        ,  0.        ,  0.        ,  0.        ,
>          0.        ,  0.        ,  0.        ,  0.        ,  0.90216781],
>        [ 1.        , -1.        ,  0.        ,  0.        ,  0.        ,
>          0.        ,  0.        ,  0.        ,  0.54728632,  0.        ],
>        [ 1.        ,  0.        , -1.        ,  0.        ,  0.        ,
>          0.        ,  0.        ,  0.97993962,  0.        ,  0.        ],
>        [ 1.        ,  0.        ,  0.        , -1.        ,  0.        ,
>          0.        ,  0.66932184,  0.        ,  0.        ,  0.        ],
>        [ 1.        ,  0.        ,  0.        ,  0.        , -1.        ,
>          0.64246538,  0.        ,  0.        ,  0.        ,  0.        ],
>        [ 1.        ,  0.        ,  0.        ,  0.        ,  0.93092643,
>         -1.        ,  0.        ,  0.        ,  0.        ,  0.        ],
>        [ 1.        ,  0.        ,  0.        ,  0.25711642,  0.        ,
>          0.        , -1.        ,  0.        ,  0.        ,  0.        ],
>        [ 1.        ,  0.        ,  0.28826595,  0.        ,  0.        ,
>          0.        ,  0.        , -1.        ,  0.        ,  0.        ],
>        [ 1.        ,  0.01331807,  0.        ,  0.        ,  0.        ,
>          0.        ,  0.        ,  0.        , -1.        ,  0.        ],
>        [ 0.86963499,  0.        ,  0.        ,  0.        ,  0.        ,
>          0.        ,  0.        ,  0.        ,  0.        , -1.        ]])
> 
> Yes, I broke that. :-) I hadn't used this behavior or seen it
> documented, it wasn't covered by a unit test, and I missed it. (In
> my defense, lil_matrix is broken even worse for that third case in
> both SciPy 0.7.0 and 0.7.1.)
> 
> Robert, can you (or anyone else out there) tell me if this covers it
> all, or whether there is some more general array-indexing behavior
> that should be implemented? I'll be happy to put in a case to handle
> it.


I think scipy.sparse indexing should follow the behavior of numpy dense arrays.
This is what current SVN scipy does (0.8.0.dev6122):

In [1]: import scipy.sparse as spp
In [2]: a = spp.lil_matrix((10,10))
In [3]: a[range(10),0] = 1

This is ok.

In [5]: a[range(10),range(10)] = -1
In [8]: print a.todense()
------> print(a.todense())
[[-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]]

This is IMHO not ok (what do other sparse matrix users think?)

In [9]: import scipy as sp
In [10]: a[range(10),range(9,-1,-1)] = sp.rand(10)
In [12]: print a.todense()
-------> print(a.todense())
[[ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
   0.34837324  0.48012982  0.87469439  0.98490982]
 [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
   0.34837324  0.48012982  0.87469439  0.98490982]
 [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
   0.34837324  0.48012982  0.87469439  0.98490982]
 [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
   0.34837324  0.48012982  0.87469439  0.98490982]
 [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
   0.34837324  0.48012982  0.87469439  0.98490982]
 [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
   0.34837324  0.48012982  0.87469439  0.98490982]
 [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
   0.34837324  0.48012982  0.87469439  0.98490982]
 [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
   0.34837324  0.48012982  0.87469439  0.98490982]
 [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
   0.34837324  0.48012982  0.87469439  0.98490982]
 [ 0.71485802  0.95410746  0.07350778  0.77786453  0.28376647  0.35679679
   0.34837324  0.48012982  0.87469439  0.98490982]]

Same problem as above...

> I'm reminded of the Zen of Python "Explicit is better than implicit." guideline.

:)

Consider also this: the current behavior (broadcasting the index arrays to the
whole rectangle) is not compatible with NumPy, and it does not allow setting
elements at, e.g., a random sequence of positions. On the other hand, the
broadcasting behaviour can easily be obtained explicitly by using numpy.mgrid
and similar functions, as sketched below.
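
For example, with a dense NumPy array (just a sketch of the NumPy semantics;
the sparse case would look analogous):

import numpy as np

m = np.zeros((10, 10))

# Pairwise (NumPy) semantics: only the diagonal is touched.
m[np.arange(10), np.arange(10)] = -1

# The "whole rectangle" behaviour, requested explicitly: np.ix_ (or
# np.mgrid) builds the full row/column index grids, so every (row, col)
# combination is set.
m[np.ix_(np.arange(10), np.arange(10))] = -1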

> Best regards, and many apologies for the inconvenience,

No problem, any help and code contributions are more than welcome!

I guess that fixing this issue should not be too difficult, so you could take
another stab at it :-) If there is a consensus here, of course... (Nathan?)
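
A rough sketch of what the missing unit test might check, assuming the
NumPy-compatible (pairwise) semantics discussed above:

import numpy as np
import scipy.sparse as spp

def test_lil_pairwise_fancy_setitem():
    # a[i_seq, j_seq] = scalar should touch only the zipped (i, j) pairs.
    a = spp.lil_matrix((10, 10))
    a[range(10), range(10)] = -1
    assert np.allclose(a.todense(), -np.eye(10))

    # a[i_seq, j_seq] = sequence should assign the values pairwise, too.
    b = spp.lil_matrix((10, 10))
    vals = np.arange(1, 11, dtype=float)
    b[range(10), range(9, -1, -1)] = vals
    expected = np.zeros((10, 10))
    expected[np.arange(10), np.arange(9, -1, -1)] = vals
    assert np.allclose(b.todense(), expected)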

cheers,
r.


