[SciPy-user] sparse matrices - list assignment to rows and columns

josef.pktd@gmai... josef.pktd@gmai...
Thu Apr 9 08:36:30 CDT 2009


On Thu, Apr 9, 2009 at 5:17 AM, Gergely Imreh <imrehg@gmail.com> wrote:
> 2009/4/9  <josef.pktd@gmail.com>:
>> On Thu, Apr 9, 2009 at 4:51 AM,  <josef.pktd@gmail.com> wrote:
>>> On Wed, Apr 8, 2009 at 11:51 PM, Gergely Imreh <imrehg@gmail.com> wrote:
>>>> Hi,
>>>>
>>>>  I was trying to figure out the scipy sparse matrix handling, but ran
>>>> into some difficulties assigning a list of values to rows and columns.
>>>>  The scipy tutorial has the following example [1]:
>>>>
>>>> from numpy import ones, random
>>>> from scipy import sparse
>>>> Asp = sparse.lil_matrix((50000,50000))
>>>> Asp.setdiag(ones(50000))
>>>> Asp[20,100:250] = 10*random.rand(150)
>>>> Asp[200:250,30] = 10*random.rand(50)
>>>>
>>>>  That looks straightforward enough, make a large, diagonal sparse
>>>> matrix, and set some additional elements to non-zero. What I get,
>>>> however, is different:
>>>> Asp[20,100:250] = 10*random.rand(150)  sets the matrix elements at row
>>>> 20, column 100-249 to random values.
>>>> Asp[200:250,30] = 10*random.rand(50) sets the matrix element at row
>>>> 200, column 30 to a 50 element row vector with random values....
>>>> (elements at row 201-249, column 30 are still 0)
>>>>  If I reshape the result of random.rand(50) to be a column
>>>> instead of a row, the assignment results in the elements of rows
>>>> 200-249, column 30 being set to single-element arrays (so, for
>>>> example, Asp[200,30] will be an array with a single random
>>>> value at [0,0]).
>>>>
>>>>  I'm using Python 2.6 (that comes with my distro, or 2.4 for which
>>>> I'd have to recompile scipy) and scipy 0.7.0. Is this kind of
>>>> behaviour due to the changes (and incompatibilities) of 2.6 (since I
>>>> know scipy is written to be compatible up to 2.5) or something else?
>>>> Would the other sparse matrix types handle this differently?
>>>>  A workaround is to do single-element assignments, but I'd think
>>>> that's probably slower in general.
>>>>
>>>>  Cheers!
>>>>      Greg
>>>> [1] http://www.scipy.org/SciPy_Tutorial
>>>
>>>
>>> There is an assignment error:
>>> Asp[200:250,30] seems to assign all 50 elements to the position Asp[200,30]
>>>
>>> >>> Asp[200,30]
>>> <1x50 sparse matrix of type '<type 'numpy.float64'>'
>>>        with 50 stored elements in LInked List format>
>>> >>> Aspc = Asp.tocrc()
>>> Traceback (most recent call last):
>>>  File "<pyshell#4>", line 1, in <module>
>>>    Aspc = Asp.tocrc()
>>>  File "c:\josef\_progs_scipy\scipy\sparse\base.py", line 429, in __getattr__
>>>    raise AttributeError, attr + " not found"
>>> AttributeError: tocrc not found
>>
>> sorry, I copied the wrong traceback, it should be:
>>
>> >>> Aspc = Asp.tocsr()
>> Traceback (most recent call last):
>>  File "<pyshell#7>", line 1, in <module>
>>    Aspc = Asp.tocsr()
>>  File "c:\josef\_progs_scipy\scipy\sparse\lil.py", line 427, in tocsr
>>    data = np.asarray(data, dtype=self.dtype)
>>  File "C:\Programs\Python25\Lib\site-packages\numpy\core\numeric.py",
>> line 230, in asarray
>>    return array(a, dtype, copy=False, order=order)
>> ValueError: setting an array element with a sequence.
>>
>>>
>>> this is with
>>> >>> scipy.version.version
>>> '0.8.0.dev5551'
>>>
>>> there is a related assignment error that got fixed in trunk,
>>> http://thread.gmane.org/gmane.comp.python.scientific.user/19996
>>> I don't know if it also handles this case; a bug report might be
>>> useful to make sure this case is handled correctly.
>>>
>>> I think the dok format would be better for building the matrix in
>>> this example, since column slices need to access many row lists.
>>>
>>> Asp = sparse.dok_matrix((50000,50000))
>>> Aspr = Asp.tocsr()
>>>
>>> works without problems
>>>
>>> I checked the history of the scipy tutorial that you linked to, the
>>> main editing has been done in 2006, and maybe it isn't up to date.
>>>
>>> The current docs are being written and are available at
>>> http://docs.scipy.org/doc/
>>>
>>> Josef
>>>
>
>
> Yes, I think it is the same; I got a ValueError as well, having upgraded
> to the latest (r5655) version.
>
> Traceback (most recent call last):
>  File "sp2.py", line 6, in <module>
>    Asp[200:250,30] = 10*random.rand(50)
>  File "/usr/lib/python2.6/site-packages/scipy/sparse/lil.py", line
> 329, in __setitem__
>    self._insertat3(row, data, j, xx)
>  File "/usr/lib/python2.6/site-packages/scipy/sparse/lil.py", line
> 285, in _insertat3
>    self._insertat2(row, data, j, x)
>  File "/usr/lib/python2.6/site-packages/scipy/sparse/lil.py", line
> 246, in _insertat2
>    raise ValueError('setting an array element with a sequence')
> ValueError: setting an array element with a sequence
>
> Checked out the new documentation you referenced[1] and there is only
> same-row assignment (e.g. A[0, :100] = rand(100)  ) but no same-column
> assignment...
>
> So still, my question is: is there something inherently different
> between array -> row and array -> column assignment in this case?

I'm not completely sure about the internal structure, but essentially
the non-zero values are stored in row-wise lists. To assign to a column
slice, it needs to access every row list and insert into each one, which
is much slower than assigning to a row slice.

See the explanation of the different formats in the documentation.

Josef
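
The row-wise storage described above can be inspected directly. The
following is a minimal sketch, not taken from the thread: it assumes a
recent SciPy release in which column-slice assignment on a lil_matrix
works, and the small 6x6 matrix and the variable names (A, row_lengths,
csr) are illustrative only. The column values are passed as an explicit
(3, 1) array so that their shape matches the target slice.

```python
import numpy as np
from scipy import sparse

A = sparse.lil_matrix((6, 6))

# Row-slice assignment: only the list for row 1 is modified.
A[1, 2:5] = np.array([1.0, 2.0, 3.0])

# Column-slice assignment: one entry is inserted into each of the
# lists for rows 2, 3 and 4. The values are given as a (3, 1) column.
A[2:5, 0] = np.array([[7.0], [8.0], [9.0]])

# A.rows[i] holds the column indices stored for row i,
# A.data[i] holds the corresponding values.
for i in range(A.shape[0]):
    print(i, A.rows[i], A.data[i])

row_lengths = [len(r) for r in A.rows]

# Conversion to CSR succeeds because every stored entry is a scalar.
csr = A.tocsr()
```

The row assignment touches a single list, while the column assignment
touches one list per row in the slice, which is the cost difference
described above.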