# [Numpy-discussion] Should dot return 1x1 matrix as scalar?

Paulo J. S. Silva pjssilva at ime.usp.br
Sat Jan 14 05:52:06 CST 2006

```Em Sex, 2006-01-13 às 21:40 -0700, Travis Oliphant escreveu:
> Paulo J. S. Silva wrote:
>
> >Numpy:
> >
> >In [27]:i = time.clock(); bench(A,b); time.clock() - i
> >Out[27]:10.610000000000014
> >
> >
> >Why is numpy so slow??????
> >
> >
> >
>
> I think the problem here is that using the properties here to take
> advantage of the nice matrix math stuff is slower than just computing
> the dot product in the fastest possible way with raw arrays.  I've been
>
> While a matrix is a nice thing, I think it will always be slower....  It
> might be possible to speed it up and I'm open to suggestions...
>
> To see what I mean, try this....
>

Travis,

I think I got the gotcha!

The problem is the in the dot function. I have only skimmed on the
_dotblas code, but I assume it try to find the right blas function based
on the arguments ranks. If both arguments are matrices then both are
rank two and the matrix-multiply blas function is called. This is "very"
suboptimal if some of the matrices objects actually represent a vector
or a scalar.

The same problem would appear with pure arrays if one insists on using a
column vector:

In [3]:t1 = timeit.Timer('c = dot(A,b)','from numpy import rand, dot; A
= rand(1000,1000);b = rand(1000)')

In [4]:t2 = timeit.Timer('c = A*b','from numpy import rand, dot, mat; A
= mat(rand(1000,1000));b = mat(rand(1000,1))')

In [5]:t3 = timeit.Timer('c = dot(A,b)','from numpy import rand, dot,
transpose; A = rand(1000,1000);b = rand(1000,1)')

In [6]:t1.timeit(100)
Out[6]:0.69448995590209961

In [7]:t2.timeit(100)
Out[7]:1.1080958843231201

You see? The third test with pure arrays is as slow as the matrix based
test.

The problem is even more dramatic if you are trying to compute an inner
product:

In [13]:t1 = timeit.Timer('c = dot(b,b)','from numpy import rand, dot; b
= rand(1000)')

In [14]:t2 = timeit.Timer('c = b.T*b','from numpy import rand, mat; b =
mat(rand(1000,1))')

In [15]:t3 = timeit.Timer('c = dot(transpose(b),b)','from numpy import
rand, dot, transpose; b = rand(1000,1)')

In [16]:t1.timeit(10000)
Out[16]:0.053219079971313477

In [17]:t2.timeit(10000)
Out[17]:0.65550899505615234

In [18]:t3.timeit(10000)
Out[18]:0.62446498870849609

Note that this is a very serious drawback for matrices objects as the
most usual operation, matrix-vector multiplication, in numerical linear
algebra algorithms is always suboptimal.

The solution may be a "smarter" analysis in _dotblas.c that not only
looks at the rank but that also "collapses" dimensions with only one
element to decide which blas function to call. Note that this approach
may be used to solve the problem with 1x1 matrices if properly coded.
Some special care has to be taken to identify outer products to.

I may try to look at this myself if you want. But certainly only in
three weeks. I am preparing for an exam to confirm (make permanent) my
position at the university that will take place in the last days of
January.

Best,

Paulo

```