# [Numpy-discussion] Should dot return 1x1 matrix as scalar?

Paulo J. S. Silva pjssilva at ime.usp.br
Fri Jan 13 11:18:01 CST 2006

```> I would be interested to see how a raw-array solution compares.
> There
> is going to be some overhead of using a subclass, because of the
> attribute lookups that occur on all array creations and because the
> subclass is written in Python that could be on the order of 10%.
>
> Quantifying the subclass slow-down would be useful...  Also, which
> BLAS
> are you using?

I am using matlab 6.5.0 release 13 in a Athlon tbird 1.2Ghz
Blas with numpy is ATLAS optimized for 3dnow.

I m getting very interesting results. First the time for a QR
factorization of a 1000x1000 random matrix:

Matlab:

>> t = cputime; R = houseqr(A); cputime - t

ans =

67.1000

Numpy:

In [9]:i = time.clock(); R = houseqr.houseqr(A); time.clock() - i
Out[9]:79.009999999999991

If I code numpy naively making an extra matrix slice this drops to:

In [14]:i = time.clock(); R = houseqr.houseqr(A); time.clock() - i
Out[14]:114.34999999999999

(See the code below).

I have then decided to campare the blas and here things get funny:

Matrix multiplication:

Matlab:

>> t = cputime; A*A; cputime - t

ans =

2.0600

Numpy:

In [32]:i = time.clock(); C=A*A; time.clock() - i
Out[32]:1.3600000000000136

It looks like numpy (ATLAS) is much better....

However in the QR code the most important part is a matrix times vector
multiplication and an outer product. If I benchmark this operations I
get:

Matlab:

>> t = cputime; bench(A,b); cputime - t

ans =

5.7500

Numpy:

In [27]:i = time.clock(); bench(A,b); time.clock() - i
Out[27]:10.610000000000014

Why is numpy so slow??????

Paulo

code

--- houseqr.py ---

from numpy import *

def norm(x):
return sqrt(x.T*x).item()

def sinal(x):
if x < 0.0: return -1.0
else: return 1.0

def houseqr(A):
m, n = A.shape
R = A.copy()

bigE1 = matrix(zeros((m, 1), float))
bigE1[0] = 1.0
for j in range(n):
x = R[j:,j]
v = sinal(x[0])*norm(x)*bigE1[:m-j] + x
v = v / norm(v)
# Slower version.
#R[j:,j:] -= 2.0*v*(v.T*R[j:,j:])
# Faster version: avoids the extra slicing.
upRight = R[j:,j:]
upRight -= 2.0*v*(v.T*upRight)

return R

--- houseqr.m ---

function [A] = houseqr(A)

[m, n] = size(A);

bigE1 = zeros(m,1);
bigE1(1) = 1.0;
for j = 1:n,
x = A(j:m,j);
v = sinal(x(1))*norm(x)*bigE1(1:m-j+1) + x;
v = v / (norm(v));
upRight = A(j:m,j:n);
A(j:m,j:n) = upRight - 2*v*(v'*upRight);
end

--- bench.py ---

from numpy import *

def bench(A, b):
for i in range(100):
c = b.T*A
d = b*c

--- bench.m ---

function bench(A, b)
for i=1:100,
c = b'*A;
d = b*c;
end

```