[Numpy-discussion] Numpy matrix multiplication slow even though ATLAS linked

Jan-Willem van de Meent vandemeent@damtp.cam.ac...
Fri Oct 31 12:40:21 CDT 2008


On Friday 31 October 2008 13:45:56 Pauli Virtanen wrote:
> Thu, 30 Oct 2008 22:19:01 +0000, Jan-Willem van de Meent wrote:
> > On Thursday 30 October 2008 18:41:51 Charles R Harris wrote:
> >> On Thu, Oct 30, 2008 at 5:19 AM, Jan-Willem van de Meent <vandemeent@damtp.cam.ac.uk> wrote:
> >> > Dear all,
> >> >
> >> > This is my first post to this list. I am having performance issues
> >> > with numpy/atlas. Doing dot(a,a) for a 2000x2000 matrix takes
> >> > about 1m40s, even though numpy appears to link to my atlas
> >> > libraries:
>
> Can you try to benchmark your ATLAS library using a simple C or Fortran
> program, to check whether the problem is in Numpy or in Atlas itself?
>
> For comparison,
>
> 	gfortran -o test test.f90 -lblas
>
> 	time ./test   # ATLAS
> 	-> 0.55 s
>
> 	LD_PRELOAD=/usr/lib/libblas.so.3.0 time ./test  # reference BLAS
> 	-> 5.6 s
>
>
> test.f90
> --------
> program main
>     integer, parameter :: n = 1000
>     double precision, dimension(n,n) :: a, b, c
>     integer :: i, j
>
>     do i = 1, n
>         do j = 1,n
>             a(i,j) = i+j
>             b(i,j) = i-j
>         end do
>     end do
>
>     call dgemm('N', 'N', n, n, n, 1d0, a, n, b, n, 0d0, c, n)
> end program main
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
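The same comparison can also be run from the NumPy side. The sketch below is an illustration and not part of the original thread: it assumes Python 3 and a modern NumPy (time.perf_counter did not exist in 2008), fills two matrices with the same pattern as test.f90, and times np.dot, which for float64 2-D arrays NumPy normally dispatches to the BLAS it was built against. The helper names build_test_matrices and time_dot are hypothetical. If the Fortran program runs in well under a second at n = 1000 but this takes much longer, the problem is more likely in how NumPy is linked than in ATLAS itself.

```python
import time

import numpy as np


def build_test_matrices(n):
    """Fill a and b like test.f90: a(i,j) = i+j, b(i,j) = i-j, with 1-based i, j."""
    i = np.arange(1, n + 1, dtype=np.float64)[:, None]  # column vector of row indices
    j = np.arange(1, n + 1, dtype=np.float64)[None, :]  # row vector of column indices
    return i + j, i - j  # broadcasting yields two n x n float64 arrays


def time_dot(n=1000, repeats=3):
    """Return (np.dot(a, b), best wall-clock seconds over `repeats` runs)."""
    a, b = build_test_matrices(n)
    best = float("inf")
    c = None
    for _ in range(repeats):
        start = time.perf_counter()
        c = np.dot(a, b)  # for float64 2-D inputs NumPy normally calls BLAS dgemm
        best = min(best, time.perf_counter() - start)
    return c, best


if __name__ == "__main__":
    np.show_config()  # shows which BLAS/LAPACK NumPy reports being built with
    _, seconds = time_dot(1000)
    print(f"np.dot on 1000x1000 float64: {seconds:.3f} s")
```

Taking the best of several runs reduces noise from first-call overhead; the printed configuration shows whether NumPy was built against ATLAS at all.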

I must admit I have no experience calling Atlas routines from either C or Fortran, 
and I am a bit clumsy with compilers. However, I got your test case to compile 
with:

   gfortran -o test_atlas test_atlas.f90 -lptf77blas -latlas

which gives

    time ./test_atlas
    -> 0.85 s

I don't understand what the LD_PRELOAD directive is supposed to do, but timing 
it gives

  time LD_PRELOAD=/usr/lib/libblas.so.3.0.3 ./test_atlas
  -> 0.86 s
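For context: LD_PRELOAD is an environment variable read by the Linux dynamic linker. Libraries listed in it are loaded before a program's own dependencies, so their symbols (here dgemm_) take precedence; that is how the reference-BLAS timing in the quoted message swaps out ATLAS without relinking. One way to see which BLAS a Python process actually loaded is to list the shared objects mapped into it. The sketch below assumes Linux, where /proc/<pid>/maps exists, and matches library file names containing "blas", "atlas" or "lapack"; that name heuristic and the function name mapped_blas_libraries are illustrative assumptions.

```python
import numpy  # noqa: F401  # imported only so the BLAS it links against gets loaded


def mapped_blas_libraries(pid="self"):
    """Return sorted paths of mapped shared objects whose file name mentions blas, atlas or lapack.

    Linux-specific: parses /proc/<pid>/maps and returns [] if that file cannot be read.
    """
    paths = set()
    try:
        with open(f"/proc/{pid}/maps") as maps:
            for line in maps:
                fields = line.split()
                # Columns: address perms offset dev inode [pathname]; anonymous maps have no path.
                if len(fields) >= 6:
                    path = fields[5]
                    name = path.rsplit("/", 1)[-1].lower()
                    if any(key in name for key in ("blas", "atlas", "lapack")):
                        paths.add(path)
    except OSError:
        return []
    return sorted(paths)


if __name__ == "__main__":
    for path in mapped_blas_libraries():
        print(path)
```

Running the script once plainly and once with LD_PRELOAD set shows whether the preloaded library is loaded alongside the one NumPy was linked with.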

For reference, here are the results of xatlbench and xdmmtst_big (both built by 
Atlas at compile time). As far as I can tell from comparing them with results 
posted online, they look normal.

./xatlbench

Clock rate=1667Mhz
               single precision        double precision
            *********************    ********************
               real      complex       real      complex
Benchmark   %   Clock   %   Clock   %   Clock   %   Clock
=========   =========   =========   =========   =========
  kSelMM       264.6      264.6       86.1       84.6
  kGenMM        86.7       89.3       84.7       84.6
  kMM_NT        78.8       77.6       75.5       75.9
  kMM_TN        87.2       84.2       77.7       82.1
  BIG_MM       261.4      261.9       85.2       85.7
   kMV_N        27.2       91.4       50.7       78.3
   kMV_T        75.2       76.9       53.9       63.1
    kGER        49.0       84.7       23.6       46.5

./xdmmtst_big

TEST  TA  TB     M     N     K  alpha   beta    Time   Mflop  SpUp  PASS
====  ==  ==  ====  ====  ====  =====  =====  ======  ======  ====  ====

   1   N   N   100   100   100    1.0    1.0    0.00   600.1  1.00   ---
   1   N   N   100   100   100    1.0    1.0    0.00   600.1  1.00   YES
   2   N   N   200   200   200    1.0    1.0    0.01  1600.0  1.00   ---
   2   N   N   200   200   200    1.0    1.0    0.01  1600.2  1.00   YES
   3   N   N   300   300   300    1.0    1.0    0.04  1350.1  1.00   ---
   3   N   N   300   300   300    1.0    1.0    0.04  1350.1  1.00   YES
   4   N   N   400   400   400    1.0    1.0    0.09  1371.5  1.00   ---
   4   N   N   400   400   400    1.0    1.0    0.09  1422.3  1.04   YES
   5   N   N   500   500   500    1.0    1.0    0.18  1389.0  1.00   ---
   5   N   N   500   500   500    1.0    1.0    0.18  1389.0  1.00   YES
   6   N   N   600   600   600    1.0    1.0    0.31  1408.8  1.00   ---
   6   N   N   600   600   600    1.0    1.0    0.31  1408.8  1.00   YES
   7   N   N   700   700   700    1.0    1.0    0.49  1409.7  1.00   ---
   7   N   N   700   700   700    1.0    1.0    0.49  1409.7  1.00   YES
   8   N   N   800   800   800    1.0    1.0    0.73  1409.3  1.00   ---
   8   N   N   800   800   800    1.0    1.0    0.73  1409.3  1.00   YES
   9   N   N   900   900   900    1.0    1.0    1.03  1411.1  1.00   ---
   9   N   N   900   900   900    1.0    1.0    1.03  1411.1  1.00   YES
  10   N   N  1000  1000  1000    1.0    1.0    1.41  1418.5  1.00   ---
  10   N   N  1000  1000  1000    1.0    1.0    1.42  1408.5  0.99   YES
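The Mflop column can be checked by hand, and the same arithmetic puts the original 1m40s in perspective. The sketch assumes the conventional count of 2*M*N*K floating-point operations for a matrix product (one multiply and one add per inner-product term); the helper names gemm_mflops and expected_seconds are hypothetical. At the roughly 1410 MFLOP/s this table shows for large sizes, a 2000x2000 dot(a,a) needs 1.6e10 operations and should take about 11 s, so 100 s (about 160 MFLOP/s) suggests the slowdown is on the NumPy side rather than in ATLAS itself. The xatlbench double-precision figures of about 85% of the 1667 MHz clock appear consistent with this (0.85 * 1667 is roughly 1417), assuming that column reports MFLOP/s as a percentage of the clock rate.

```python
def gemm_mflops(m, n, k, seconds):
    """MFLOP/s for an (m x k) by (k x n) product, counting 2*m*n*k operations."""
    return 2.0 * m * n * k / seconds / 1e6


def expected_seconds(m, n, k, mflops):
    """Seconds the same product should take at a sustained rate of `mflops` MFLOP/s."""
    return 2.0 * m * n * k / (mflops * 1e6)


if __name__ == "__main__":
    # Test 10 of xdmmtst_big: M = N = K = 1000 in 1.41 s.
    print(round(gemm_mflops(1000, 1000, 1000, 1.41)))  # 1418
    # The reported dot(a,a) for a 2000x2000 matrix: about 1m40s.
    print(round(gemm_mflops(2000, 2000, 2000, 100.0)))  # 160
    # Time the same product should take at the ~1410 MFLOP/s this table shows.
    print(round(expected_seconds(2000, 2000, 2000, 1410), 1))  # 11.3
```

The small gap between 1418 and the table's 1418.5 comes from the displayed time being rounded to two decimals.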

