[SciPy-user] repmat of matrix returns array

Bill Baxter wbaxter at gmail.com
Thu Apr 27 20:25:35 CDT 2006

The decision of matrix vs array really comes down to whether you intend to
mostly do linear algreba with the entities in quesiton or not.  In Matlab or
Octave  A * B always means matrix multiply (er, at least for ndims=2),  and
A .* B always means element-wise multiply.  If you're doing stuff with
images you rarely need *, and if you're doing mostly linear algebra you
don't need as much .* .

In numpy the meaning of * changes depending on whether the arguments are of
type 'numpy.matrix' or not.  That's pretty much the most significant
difference between array and matrix.  That plus the fact that reduction
operations on matrices like sum() will generally return matrices (with
ndims==2), while the same operations on arrays will return arrays with ndims
reduced by one.

The whole changing meaning of * seemed reasonable to me at first -- from an
object oriented programming perspective it makes sense.  But you know, I
find that I get bitten by it a lot in several ways:

1) I end up accidentally multiplying element-wise because the argumets end
up being arrays instead of matrices like I had planned.   (No runtime errors
either way if the arguments are both NxN)

2) Because of Python's dynamic typing, it's often quite difficult to tell
when reading code like " a * b "  if a and b are arrays or matrices (or
something else entirely).  But you can bet when I wrote the code I knew what
I meant.

3) Sometimes you do get an array back after performing an operation on a
matrix (This one's getting fixed gradually in Numpy code, but I don't think
it will ever go away because it's just as easy for developers to do it to
themselves in their own code.  I.e. write a function that takes both arrays
and matrices but always returns an array.  Since it's easier to get that
wrong than it is to get it right, I predict such issues will persist.)

Because of these points, I think I have to conclude that overloading * for
matrices was not such a great idea.  Or at least, there should be an
unabiguous way to spell 'matrix multiply' that _always_ means matrix
multiply.  Matlab seems to have gotten this one right.  I think if there
were such a function in Numpy, there would be very little reason to use the
matrix class.  We can't make it a unique infix operator like Matlab does, so
let's call just it numpy.linalg.mult, and people who want to use it can
import it as something easier to type. I propose the function should behave
as follows:

def mult(a, b):
    Multiplies a times b using the rules of tensor algebra, and returns c,
the product.

    (In the following description, a_ij means a[i, j].  Also the tensor
summation convention of summing over repeated dummy
     indices is assumed.  For instance, a_ij * b_jk means dot( a[i,:], b[:,
j] )

    If a and b are of type matrix, then the result is the same as 'a * b'.
In other words c_ik = a_ij b_jk.
    The returned c will also have type matrix.

    The remaining all deal with cases when one or both of the arguments are
of type array.

    If a.shape==(N,M) and b.shape==(M,P), then the result is just like the
matrix-matrix multiply: c_ik = a_ij b_jk.
    The resulting shape is (N,P).   If b.shape[0] is not M, then an error is

    If a.shape==(N,M), and b.shape==(M) or (M,1) this is treated as a
matrix-vector multiply.
    c_i = a_ij * b_j.  The return value c, will have c.shape==b.shape.  If
b.shape[0] is not M, an error is raised.

    If a.shape==(N) or (1,N), and b.shape==(N,M) this is treated as a
vector-matrix multiply.
    c_j = a_i * b_ij.  The return value c, will have c.shape==a.shape.  If
b.shape[1] is not N, an error is raised.

    TODO:  scalar cases...
    TODO: higher dimensional cases...

The general idea is to allow both matrices and arrays to be treated like
linear (or tensor) algebra thingies.  It could be restricted to just matrix
linear alg, but you might as well go ahead and support higher rank entities
while you're at it.  I think there's a simple succinct statement which could
cover all the above rules that you'll probably find in a tensor algebra text
book.  Basically from memory it's this (in a hodgepodge of notations):
      Let A be rank-N with shape (s1, ..., sN)
      Let B be rank-M with shape (t1, ..., tM)
      For the tensor product A * B to exist, we must have sN==t1
      Let i1,...,iN be indices for A, and j1,...,jN be indices for B.
            C[i1,..., iN-1, j2, ..., jM]  =  dot(  A[i1, ..., iN-1, :], B[:
, j2, ..., jM]  )

Where the ...'s are mathematical ...'s --- meaning I left the middle part
out -- not python syntax.
I'd have to think about it some more, but I think this fundamental
definition from tensor algebra automatically covers all the cases enumerated
in the above docstring, including reurning shape (N,1) for (N,M)*(M,1)
versus just (N) for (N,M)*(M).  Oh, there's one more twist needed to cover
transposes.  So actually it should be allowable to do (M,N)*(M,P).  It just
means A.transpose() * B.

Hmm, the more I think about it, the more I think the idea of a separate
matrix class in numpy is flawed.  It seems to exist essentially for the sole
purpose of overloading *, but that idea itself seems to be questionable as
argued above.    If you don't want to use the overloaded *, what benefit
remains?  Maybe it's this: with array you can easily end up with an (N)
array, and you want to know if it is (N,1) or (1,N).   The thing is, tensor
gets along algebra does quite fine without that distinction.  If you
multiply an (N) times a (M,N), there's only one thing that makes sense.
Hmm.. ok but (N) times (N,N) could mean either, ya.  Well in that case I'd
say the mult() above should prefer the version that keeps the arguments in
the order presented.  So in the presence of ambiguity:
     mult(rand(N),rand(N,N))  --> row-vec * matrix
     mult(rand(N,N),rand(N))  --> matrix * column vec

That puts the burden of remembering whether you meant to have a row vector
or a column vector on the user.  Or not, you can always explicity tack on
the extra axis to remind yourself if you want it there.

There is also potential ambiguity if you allow tensor alg-like automatic
     mult(rand(N,M),rand(N,P))  --> ok, means (N,M)^t * (N,P)
     mult(rand(N,M),rand(P,M))  --> ok, means (N,M) * (P,M)^t
     mult(rand(N,M),rand(M,N))  --> ambiguous, but return (N,M) * (M,N)  [
could also mean (N,M)^t * (M,N)^t ]
     mult(rand(N,M),rand(N,M))  --> ambiguious, not clear which is better:
(N,M)^t * (N,M)  or  (N,M) * (N,M)^t
... these ambiguities seems to suggest it would be better not to allow
automatic transposes...  or maybe just allow them in  the case of axes of
size 1, i.e. in the (1,N) or (N,1) case.

So this would work:
     mult(rand(N,1),rand(N,N))  --> matrix * column vec
     mult(rand(N,N),rand(N,1))  --> row-vec * matrix

So what do you think.  My gut tells me that tensor math on arrays fits well
with the numpy.array philosophy.  E.g. a vector is a rank-1 entity, not a
rank-2 entity with one of its dimensions being 1.

So any thoughts?

PS. Sorry, Keith, for going off on a tangent like this...

On 4/28/06, Keith Goodman <kwgoodman at gmail.com> wrote:
> Do most operations on matrices return matrices? In porting Octave
> code, I noticed that the repmat of a matrix is an array.
> I decided to use matrices simply because that's what they are called
> in Octave. What do you recommend for new users who have a background
> in Octave, matrix or array? Do people generally pick one and not use
> the other?
> _______________________________________________
> SciPy-user mailing list
> SciPy-user at scipy.net
> http://www.scipy.net/mailman/listinfo/scipy-user

William V. Baxter III
OLM Digital
Kono Dens Building Rm 302
1-8-8 Wakabayashi Setagaya-ku
Tokyo, Japan  154-0023
+81 (3) 3422-3380
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scipy.net/pipermail/scipy-user/attachments/20060428/2e713c6d/attachment-0001.htm

More information about the SciPy-user mailing list