[SciPy-dev] Ideas for scipy.sparse?

Nathan Bell wnbell@gmail....
Sat Apr 12 20:29:26 CDT 2008


On Sat, Apr 12, 2008 at 12:21 PM, Brian Granger <ellisonbg.net@gmail.com> wrote:
>  Sure.  Indexing (getting items) is O(1), but if I do it N=some big
>  number times, it is O(N*1) = O(N).  If I have to do all the indexing
>  through python (__getitem__) I have the overhead of calling a python
>  function all N times.  This is the same reason we all encourage people
>  to try to avoid writing for i in range(N) style loops to do things
>  with numpy arrays.

This is better accomplished by supporting indexing of the form
A[[1,2,3],[1,2,3]] in sparsetools.  The other forms of indexing are
already handled efficiently.

>  Yep, it is really just the small stuff that is not being done in
>  sparsetools, but that might need to be done many times that could be
>  optimized.

>  Basically, I am thinking of a public C-API for doing things with the
>  sparse arrays.  A true C extension type for the sparse array class.
>  That could go into sparsetools, but I think it will be easier to
>  write/maintain in cython.

Why would you want to do that?  It would make it more difficult to
change the backend in the future for no apparent benefit.

IMO you're obligated to show a real-world shortcoming or performance
problem with the existing code before proposing to replace Python with
something more convoluted.

I agree that lil_matrix and dok_matrix are slow pure-Python
implementations.  OTOH I don't believe that csr_matrix and friends
suffer the same inefficiencies.  Adding vectorized indexing for
CSR/CSC to sparsetools is straightforward and addresses the only
problem you've pointed out.

/doesn't drink the Cython Kool-Aid

-- 
Nathan Bell wnbell@gmail.com
http://graphics.cs.uiuc.edu/~wnbell/


More information about the Scipy-dev mailing list