[Numpy-discussion] Special matrices with structure?

Dag Sverre Seljebotn d.s.seljebotn@astro.uio...
Thu Feb 23 11:47:58 CST 2012


On 02/23/2012 05:50 AM, Jaakko Luttinen wrote:
> Hi!
>
> I was wondering whether it would be easy/possible/reasonable to have
> classes for arrays that have special structure in order to use less
> memory and speed up some computations?
>
> For instance:
> - symmetric matrix could be stored in almost half the memory required by
> a non-symmetric matrix
> - diagonal matrix only needs to store the diagonal vector
> - Toeplitz matrix only needs to store one or two vectors
> - sparse matrix only needs to store non-zero elements (some
> implementations in scipy.sparse)
> - and so on
>
> If such classes were implemented, it would be nice if they worked with
> numpy functions (dot, diag, ...) and operations (+, *, +=, ...) easily.
>
> I believe this has been discussed before but google didn't help a lot..

I'm currently working on a library for this. The catch is that that I'm 
doing it as a work project, not a hobby project -- so only the features 
I strictly need for my PhD thesis really gets priority. That means that 
it will only really be developed for use on clusters/MPI, not so much 
for single-node LAPACK.

I'd love to pair up with someone who could make sure the library is more 
generally useful, which is my real goal (if I ever get spare time again...).

The general idea of my approach is to have lazily evaluated expressions:

A = # ... diagonal matrix
B = # ... dense matrix

L = (give(A) + give(B)).cholesky() # only "symbolic"!
# give means: overwrite if you want to

explain(L) # prints what it will do if it computes L
L = compute(L) # does the computation

What the code above would do is:

  - First, determine that the fastest way of doing + is to take the 
elements in A and += them inplace to the diagonal in B
  - Then, do the Cholesky in

Note that if you change the types of. The goal is to facilitate writing 
general code which doesn't know the types of the matrices, yet still 
string together the optimal chain of calls. This requires waiting with 
evaluation until an explicit compute call (which essentially does a 
"compilation").

Adding matrix types and operations is done through pattern matching. 
This one can provide code like this to provide optimized code for wierd 
special cases:

@computation(RowMajorDense + ColMajorDense, RowMajorDense)
def add(a, b):
     # provide an optimized case for row-major + col-major, resulting
     # in row-major

@cost(add)
def add_cost(a, b):
     # provide estimate for cost of the above routine

The compiler looks at all the provided @computation and should 
determines the cheapest path.

My code is at https://github.com/dagss/oomatrix, but I certainly haven't 
done anything yet to make the codebase useful to anyone but me, so you 
probably shouldn't look at it, but rather ask me here.

Dag


More information about the NumPy-Discussion mailing list