[Numpy-discussion] Should dot return 1x1 matrix as scalar?
Paulo J. S. Silva
pjssilva at ime.usp.br
Fri Jan 13 11:18:01 CST 2006
> I would be interested to see how a raw-array solution compares.
> There
> is going to be some overhead of using a subclass, because of the
> attribute lookups that occur on all array creations and because the
> subclass is written in Python that could be on the order of 10%.
>
> Quantifying the subclass slow-down would be useful... Also, which
> BLAS
> are you using?
I am using matlab 6.5.0 release 13 in a Athlon tbird 1.2Ghz
Blas with numpy is ATLAS optimized for 3dnow.
I m getting very interesting results. First the time for a QR
factorization of a 1000x1000 random matrix:
Matlab:
>> t = cputime; R = houseqr(A); cputime - t
ans =
67.1000
Numpy:
In [9]:i = time.clock(); R = houseqr.houseqr(A); time.clock() - i
Out[9]:79.009999999999991
If I code numpy naively making an extra matrix slice this drops to:
In [14]:i = time.clock(); R = houseqr.houseqr(A); time.clock() - i
Out[14]:114.34999999999999
(See the code below).
I have then decided to campare the blas and here things get funny:
Matrix multiplication:
Matlab:
>> t = cputime; A*A; cputime - t
ans =
2.0600
Numpy:
In [32]:i = time.clock(); C=A*A; time.clock() - i
Out[32]:1.3600000000000136
It looks like numpy (ATLAS) is much better....
However in the QR code the most important part is a matrix times vector
multiplication and an outer product. If I benchmark this operations I
get:
Matlab:
>> t = cputime; bench(A,b); cputime - t
ans =
5.7500
Numpy:
In [27]:i = time.clock(); bench(A,b); time.clock() - i
Out[27]:10.610000000000014
Why is numpy so slow??????
Paulo
code
--- houseqr.py ---
from numpy import *
def norm(x):
return sqrt(x.T*x).item()
def sinal(x):
if x < 0.0: return -1.0
else: return 1.0
def houseqr(A):
m, n = A.shape
R = A.copy()
bigE1 = matrix(zeros((m, 1), float))
bigE1[0] = 1.0
for j in range(n):
x = R[j:,j]
v = sinal(x[0])*norm(x)*bigE1[:m-j] + x
v = v / norm(v)
# Slower version.
#R[j:,j:] -= 2.0*v*(v.T*R[j:,j:])
# Faster version: avoids the extra slicing.
upRight = R[j:,j:]
upRight -= 2.0*v*(v.T*upRight)
return R
--- houseqr.m ---
function [A] = houseqr(A)
[m, n] = size(A);
bigE1 = zeros(m,1);
bigE1(1) = 1.0;
for j = 1:n,
x = A(j:m,j);
v = sinal(x(1))*norm(x)*bigE1(1:m-j+1) + x;
v = v / (norm(v));
upRight = A(j:m,j:n);
A(j:m,j:n) = upRight - 2*v*(v'*upRight);
end
--- bench.py ---
from numpy import *
def bench(A, b):
for i in range(100):
c = b.T*A
d = b*c
--- bench.m ---
function bench(A, b)
for i=1:100,
c = b'*A;
d = b*c;
end
More information about the Numpy-discussion
mailing list