[SciPy-dev] feedback on scipy.sparse

Robert Cimrman cimrman3@ntc.zcu...
Thu Dec 13 03:21:58 CST 2007

Hi Nathan,

thanks for pushing scipy.sparse forwards!

Nathan Bell wrote:
> ===== Constructors =====
>   Here are the current constructors for the various sparse classes:
>   csr_matrix and csc_matrix
>     def __init__(self, arg1, dims=None, dtype=None, copy=False):
>   dok_matrix and lil_matrix
>     def __init__(self, A=None, shape=None, dtype=None, copy=False):
>   coo_matrix
>     def __init__(self, arg1, dims=None, dtype=None):
>   Empty matrices can now be constructed with xxx_matrix( (M,N) ) for
> all formats.
>  1) Should we prefer 'dims' over 'shape' or vice versa?  IMO 'shape'
> is arguably more natural since all the types have a .shape attribute

Yes, please.

>  2) It would be nice if xxx_matrix( A ) always worked when A is a
> sparse or dense matrix.  Does anyone object to this?  The
> functionality is already present (through the various .toxxx() methods)


>  3) When the user defines the dim (or shape) argument but the data
> violates these bounds, what should happen?  IMO this merits an
> exception, as opposed to expanding the dimensions to accommodate the
> data.

IMHO scipy.sparse should not assume anything that the user has not asked
for explicitly -> I am for an exception.
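
A minimal sketch of such a check, assuming the data arrive as COO-style
row/column index lists. The name check_bounds is hypothetical and is not an
existing scipy.sparse function:

```python
def check_bounds(rows, cols, shape):
    """Raise ValueError if any (row, col) index lies outside `shape`.

    Hypothetical helper: it only illustrates the "raise, do not expand"
    policy and is not part of scipy.sparse.
    """
    M, N = shape
    for i, j in zip(rows, cols):
        if not (0 <= i < M and 0 <= j < N):
            raise ValueError(
                "index (%d, %d) is outside shape %r" % (i, j, (M, N))
            )
    # All indices fit, so the user-supplied shape is kept unchanged.
    return shape
```

A constructor could call this before storing the data, so the shape given
by the user is never silently enlarged.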

> ===== sparse.py and sparse functions =====
> sparse.py currently weighs in at nearly 3000 lines and will continue
> growing.  I propose that we move the functions (e.g. spidentity(),
> spdiags(), spkron(), etc. ) to a separate file.  Any comments or
> proposals for the name of this file?  Would it be prudent to move the
> classes into separate files also?

sputils? Splitting into class files sounds good.

> Also, these functions always return a specific sparse format.  For
> example spidentity() always returns a csc_matrix, spkron() always
> returns a coo_matrix, etc.  Currently, a user who wanted the identity
> matrix in CSR format would have to do a CSC->CSR conversion on the
> result of spidentity().  This is somewhat wasteful since the
> spidentity() could easily have generated the CSR format instead.  It
> would be better to allow the user to specify the desired return type
> in the function call.  For example,
>    spidentity(n, dtype='d',format='csr')
> instead of
>    spidentity(n, dtype='d').tocsr()
> Sometimes a given function has a very natural return type.  For
> instance, when we have a dia_matrix() implementation (I'm working on
> one) then spdiags() would naturally use this format.  If the user
> specified another type,  spdiags( ..., format='csr') then spdiags()
> would, at worst, create the matrix in DIA format first and then
> convert to CSR (with dia_matrix.tocsr() ).  I like this approach
> because it allows the implementation to be clever when cleverness is
> possible, but also doesn't place an undue burden on the programmer
> when implementing a new method.  Furthermore, it shields the user from
> internal implementation changes that might change the default return
> format.

Good idea!
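
A rough sketch of how I understand the proposal, using plain dicts as
stand-ins for the sparse classes. All names here (identity_coo, coo_to_csr,
CONVERTERS, toy_identity) are made up for illustration; the function builds
the matrix in one "natural" format and converts only when another format is
requested:

```python
def identity_coo(n):
    # Natural format for this toy: COO triplets (row, col, data).
    idx = list(range(n))
    return {"format": "coo", "shape": (n, n),
            "row": idx, "col": list(idx), "data": [1.0] * n}


def coo_to_csr(A):
    # Convert the toy COO dict to a toy CSR dict (duplicates are not summed).
    M, N = A["shape"]
    entries = sorted(zip(A["row"], A["col"], A["data"]))
    indptr = [0] * (M + 1)
    for i, _, _ in entries:
        indptr[i + 1] += 1          # count the entries in each row
    for i in range(M):
        indptr[i + 1] += indptr[i]  # cumulative sum gives the row pointers
    return {"format": "csr", "shape": (M, N), "indptr": indptr,
            "indices": [j for _, j, _ in entries],
            "data": [v for _, _, v in entries]}


# Hypothetical registry: target format -> conversion from the natural format.
CONVERTERS = {"coo": lambda A: A, "csr": coo_to_csr}


def toy_identity(n, format="coo"):
    A = identity_coo(n)             # build in the natural format first
    try:
        convert = CONVERTERS[format]
    except KeyError:
        raise ValueError("unknown format %r" % (format,))
    return convert(A)               # at worst, one extra conversion
```

With this layout the default format can change later without breaking
callers that pass format explicitly.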

Concerning Stefan's idea of static methods for spidentity etc., we
could use only one method for all of them, e.g.

class spmatrix:

    @staticmethod
    def special(name, n, format=...):
        # dispatch on the requested constructor name
        if name == 'identity':
            return spidentity(n, format=format)

to prevent the cluttering of the class that you mentioned.


