[SciPy-user] sparse matrices - list assignment to rows and columns
josef.pktd@gmai...
Thu Apr 9 08:53:25 CDT 2009
On Thu, Apr 9, 2009 at 9:36 AM, <josef.pktd@gmail.com> wrote:
> On Thu, Apr 9, 2009 at 5:17 AM, Gergely Imreh <imrehg@gmail.com> wrote:
>> 2009/4/9 <josef.pktd@gmail.com>:
>>> On Thu, Apr 9, 2009 at 4:51 AM, <josef.pktd@gmail.com> wrote:
>>>> On Wed, Apr 8, 2009 at 11:51 PM, Gergely Imreh <imrehg@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I was trying to figure out the scipy sparse matrix handling, but ran
>>>>> into some difficulties assigning a list of values to rows and columns.
>>>>> The scipy tutorial has the following example [1]:
>>>>>
>>>>> from numpy import ones, random
>>>>> from scipy import sparse
>>>>> Asp = sparse.lil_matrix((50000,50000))
>>>>> Asp.setdiag(ones(50000))
>>>>> Asp[20,100:250] = 10*random.rand(150)
>>>>> Asp[200:250,30] = 10*random.rand(50)
>>>>>
>>>>> That looks straightforward enough, make a large, diagonal sparse
>>>>> matrix, and set some additional elements to non-zero. What I get,
>>>>> however, is different:
>>>>> Asp[20,100:250] = 10*random.rand(150) sets the matrix elements at row
>>>>> 20, column 100-249 to random values.
>>>>> Asp[200:250,30] = 10*random.rand(50) sets the matrix element at row
>>>>> 200, column 30 to a 50 element row vector with random values....
>>>>> (elements at row 201-249, column 30 are still 0)
>>>>> If I reshape the result of random.rand(50) into a column
>>>>> instead of a row, the assignment sets each of the elements at rows
>>>>> 200-249, column 30 to a single-element array (so, for
>>>>> example, Asp[200,30] will be an array with a single random
>>>>> value at [0,0]).
>>>>>
>>>>> I'm using Python 2.6 (which comes with my distro; there is also 2.4,
>>>>> for which I'd have to recompile scipy) and scipy 0.7.0. Is this kind of
>>>>> behaviour due to the changes (and incompatibilities) of 2.6 (since I
>>>>> know scipy is written to be compatible up to 2.5) or something else?
>>>>> Would the other sparse matrix types handle this differently?
>>>>> A workaround is to do single-element assignments, but I'd think
>>>>> that's probably slower in general.
>>>>>
>>>>> Cheers!
>>>>> Greg
>>>>> [1] http://www.scipy.org/SciPy_Tutorial
>>>>
>>>>
>>>> There is an assignment error:
>>>> Asp[200:250,30] seems to assign all 50 elements to the position Asp[200,30]
>>>>
>>>>>>> Asp[200,30]
>>>> <1x50 sparse matrix of type '<type 'numpy.float64'>'
>>>> with 50 stored elements in LInked List format>
>>>>>>> Aspc = Asp.tocrc()
>>>> Traceback (most recent call last):
>>>> File "<pyshell#4>", line 1, in <module>
>>>> Aspc = Asp.tocrc()
>>>> File "c:\josef\_progs_scipy\scipy\sparse\base.py", line 429, in __getattr__
>>>> raise AttributeError, attr + " not found"
>>>> AttributeError: tocrc not found
>>>
>>> sorry, I copied the wrong traceback, it should be:
>>>
>>>>>> Aspc = Asp.tocsr()
>>> Traceback (most recent call last):
>>> File "<pyshell#7>", line 1, in <module>
>>> Aspc = Asp.tocsr()
>>> File "c:\josef\_progs_scipy\scipy\sparse\lil.py", line 427, in tocsr
>>> data = np.asarray(data, dtype=self.dtype)
>>> File "C:\Programs\Python25\Lib\site-packages\numpy\core\numeric.py",
>>> line 230, in asarray
>>> return array(a, dtype, copy=False, order=order)
>>> ValueError: setting an array element with a sequence.
>>>
>>>>
>>>> this is with
>>>>>>> scipy.version.version
>>>> '0.8.0.dev5551'
>>>>
>>>> There is a related assignment error that got fixed in trunk:
>>>> http://thread.gmane.org/gmane.comp.python.scientific.user/19996
>>>> I don't know if that fix also handles this case; a bug report might be
>>>> useful to make sure this case is handled correctly.
>>>>
>>>> I think, for this example dok format would be better to build the
>>>> matrix, since column slices need to access many lists
>>>>
>>>> Asp = sparse.dok_matrix((50000,50000))
>>>> Aspr = Asp.tocsr()
>>>>
>>>> works without problems
>>>>
>>>> I checked the history of the scipy tutorial that you linked to, the
>>>> main editing has been done in 2006, and maybe it isn't up to date.
>>>>
>>>> The current docs are being written and are available at
>>>> http://docs.scipy.org/doc/
>>>>
>>>> Josef
>>>>
>>
>>
>> Yes, I think it is the same; I got a ValueError as well after upgrading
>> to the latest (r5655) version.
>>
>> Traceback (most recent call last):
>> File "sp2.py", line 6, in <module>
>> Asp[200:250,30] = 10*random.rand(50)
>> File "/usr/lib/python2.6/site-packages/scipy/sparse/lil.py", line
>> 329, in __setitem__
>> self._insertat3(row, data, j, xx)
>> File "/usr/lib/python2.6/site-packages/scipy/sparse/lil.py", line
>> 285, in _insertat3
>> self._insertat2(row, data, j, x)
>> File "/usr/lib/python2.6/site-packages/scipy/sparse/lil.py", line
>> 246, in _insertat2
>> raise ValueError('setting an array element with a sequence')
>> ValueError: setting an array element with a sequence
>>
>> Checked out the new documentation you referenced[1] and there is only
>> same-row assignment (e.g. A[0, :100] = rand(100) ) but no same-column
>> assignment...
>>
>> So my question remains: is there something inherently different
>> between array -> row and array -> column assignment in this case?
>
> I'm not completely sure about the internal structure, but essentially
> the non-zero values are stored in row-wise lists. To assign to a column
> slice, it needs to access all the row lists and insert into each, which
> is much slower.
>
> See the explanation in the documentation describing the different formats.
>
> Josef
>
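To make the row-list layout concrete, here is a small sketch (an
illustration only, not taken from the scipy sources). It relies on the
`rows` and `data` attributes of lil_matrix, which hold one Python list of
column indices and one list of values per row. The 4x5 shape and the
values are made up for the example. Writing along a row changes a single
list, while writing down a column changes the list of every row it touches.

```python
import numpy as np
from scipy import sparse

# Small made-up matrix so the internal lists are easy to read.
A = sparse.lil_matrix((4, 5))

# Row assignment: only the lists belonging to row 1 are modified.
A[1, 1:4] = np.array([1.0, 2.0, 3.0])

# Column assignment, done one scalar at a time: every row's list is touched.
for i, v in zip(range(4), [10.0, 20.0, 30.0, 40.0]):
    A[i, 2] = v

# A.rows[i] lists the column indices stored in row i,
# A.data[i] lists the matching values.
for i in range(A.shape[0]):
    print(i, A.rows[i], A.data[i])
```

Row 1 ends up with column indices [1, 2, 3], and each of the other rows
holds just column 2, so the column write had to visit all four lists.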
I opened a ticket
http://projects.scipy.org/scipy/ticket/917
Josef
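
Until the column assignment is sorted out, one way to sidestep it is to
assign scalars one element at a time, for example into a dok_matrix as
suggested earlier in the thread, and convert to CSR afterwards. A minimal
sketch, assuming a smaller 1000x1000 matrix and a fixed random seed (both
are arbitrary choices to keep the example quick and repeatable):

```python
import numpy as np
from scipy import sparse

n = 1000                          # assumption: smaller than the 50000 in the thread
rng = np.random.RandomState(0)    # assumption: fixed seed for repeatability
col_vals = 10 * rng.rand(50)

A = sparse.dok_matrix((n, n))

# One scalar per (row, column) pair, so no sequence is ever stored
# in a single matrix element.
for i, v in zip(range(200, 250), col_vals):
    A[i, 30] = v

# Conversion to CSR works because every stored entry is a scalar.
A_csr = A.tocsr()

# Read the column back as a flat array to compare with the input.
col = A_csr[200:250, 30].toarray().ravel()
print(np.allclose(col, col_vals))
```

The loop is slower than a single slice assignment would be, but each
element ends up holding a plain float.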