[Numpy-discussion] Special matrices with structure?
Dag Sverre Seljebotn
d.s.seljebotn@astro.uio...
Thu Feb 23 11:47:58 CST 2012
On 02/23/2012 05:50 AM, Jaakko Luttinen wrote:
> Hi!
>
> I was wondering whether it would be easy/possible/reasonable to have
> classes for arrays that have special structure in order to use less
> memory and speed up some computations?
>
> For instance:
> - symmetric matrix could be stored in almost half the memory required by
> a non-symmetric matrix
> - diagonal matrix only needs to store the diagonal vector
> - Toeplitz matrix only needs to store one or two vectors
> - sparse matrix only needs to store non-zero elements (some
> implementations in scipy.sparse)
> - and so on
>
> If such classes were implemented, it would be nice if they worked with
> numpy functions (dot, diag, ...) and operations (+, *, +=, ...) easily.
>
> I believe this has been discussed before but google didn't help a lot..
I'm currently working on a library for this. The catch is that that I'm
doing it as a work project, not a hobby project -- so only the features
I strictly need for my PhD thesis really gets priority. That means that
it will only really be developed for use on clusters/MPI, not so much
for single-node LAPACK.
I'd love to pair up with someone who could make sure the library is more
generally useful, which is my real goal (if I ever get spare time again...).
The general idea of my approach is to have lazily evaluated expressions:
A = # ... diagonal matrix
B = # ... dense matrix
L = (give(A) + give(B)).cholesky() # only "symbolic"!
# give means: overwrite if you want to
explain(L) # prints what it will do if it computes L
L = compute(L) # does the computation
What the code above would do is:
- First, determine that the fastest way of doing + is to take the
elements in A and += them inplace to the diagonal in B
- Then, do the Cholesky in
Note that if you change the types of. The goal is to facilitate writing
general code which doesn't know the types of the matrices, yet still
string together the optimal chain of calls. This requires waiting with
evaluation until an explicit compute call (which essentially does a
"compilation").
Adding matrix types and operations is done through pattern matching.
This one can provide code like this to provide optimized code for wierd
special cases:
@computation(RowMajorDense + ColMajorDense, RowMajorDense)
def add(a, b):
# provide an optimized case for row-major + col-major, resulting
# in row-major
@cost(add)
def add_cost(a, b):
# provide estimate for cost of the above routine
The compiler looks at all the provided @computation and should
determines the cheapest path.
My code is at https://github.com/dagss/oomatrix, but I certainly haven't
done anything yet to make the codebase useful to anyone but me, so you
probably shouldn't look at it, but rather ask me here.
Dag
More information about the NumPy-Discussion
mailing list