# [Numpy-discussion] untenable matrix behavior in SVN

Anne Archibald peridot.faceted@gmail....
Wed Apr 30 22:16:22 CDT 2008

```2008/4/30 Charles R Harris <charlesr.harris@gmail.com>:

> Some operations on stacks of small matrices are easy to get, for instance,
> +,-,*,/, and matrix multiply. The last is the interesting one. If A and B
> are stacks of matrices with the same number of dimensions with the matrices
> stored in the last two indices, then
>
> sum(A[...,:,:,newaxis]*B[...,newaxis,:,:], axis=-2)
>
> is the matrix-wise multiplication of the two stacks. If B is replaced by a
> stack of 1D vectors, x, it is even simpler:
>
> sum(A[...,:,:]*x[...,newaxis,:], axis=-1)
>
> This doesn't go through BLAS, but for large stacks of small matrices it
> might be even faster than BLAS because BLAS is kinda slow for small
> matrices.

Yes and no. For the first operation, you have to create a temporary
that is larger than either of the two input arrays. These invisible
(potentially) gigantic temporaries are the sort of thing that puzzle
users when as their problem size grows they suddenly find they hit a
massive slowdown because it starts swapping to disk, and then a
failure because the temporary can't be allocated. This is one reason
we have dot() and tensordot() even though they can be expressed like
this. (The other is of course that it lets us use optimized BLAS.)

This rather misses the point of Timothy Hochberg's suggestion (as I
understood it), though: yes, you can write the basic operations in
numpy, in a more or less efficient fashion. But it would be very
valuable for arrays to have some kind of metadata that let them keep
track of which dimensions represented simple array storage and which
represented components of a linear algebra object. Such metadata could
make it possible to use, say, dot() as if it were a binary ufunc
taking two matrices. That is, you could feed it two arrays of
matrices, which it would broadcast to the same shape if necessary, and
then it would compute the elementwise matrix product.

The question I have is, what is the right mathematical model for
describing these
arrays-some-of-whose-dimensions-represent-linear-algebra-objects?

One idea is for each dimension to be flagged as one of "replication",
"vector", or "covector". A column vector might then be a rank-1 vector
array, a row vector might be a rank-1 covector array, a linear
operator might be a rank-2 object with one covector and one vector
dimension, a bilinear form might be a rank-2 object with two covector
dimensions. Dimensions designed for holding repetitions would be
flagged as such, so that (for example) an image might be an array of
shape (N,M,3) of types ("replication","replication","vector"); then to
apply a color-conversion matrix one would simply use dot() (or "*" I
suppose). without too much concern for which index was which. The
problem is, while this formalism sounds good to me, with a background
in differential geometry, if you only ever work in spaces with a
canonical metric, the distinction between vector and covector may seem

Implementing such a thing need not be too difficult: start with a new
subclass of ndarray which keeps a tuple of dimension types. Come up
with an adequate set of operations on them, and implement them in
terms of numpy's functions, taking advantage of the extra information
about each axis. A few operations that spring to mind:

* Addition: it doesn't make sense to add vectors and covectors; raise
an exception. Otherwise addition is always elementwise anyway. (How
hard should addition work to match up corresponding dimensions?)
* Multiplication: elementwise across "repetition" axes, it combines
vector axes with corresponding covector axes to get some kind of
generalized matrix product. (How is "corresponding" defined?)
* Division: mostly doesn't make sense unless you have an array of
scalars (I suppose it could compute matrix inverses?)
* Exponentiation: very limited (though I suppose matrix powers could
be implemented if the shapes are right)
* Change of basis: this one is tricky because not all dimensions need
come from the same vector space
* Broadcasting: the rules may become a bit byzantine...