[Numpy-discussion] Fwd: GPU Numpy
Sturla Molden
sturla@molden...
Wed Sep 9 23:04:24 CDT 2009
George Dahl wrote:
> I know that for my work, I can get around an order of a 50-fold speedup over
> numpy using a python wrapper for a simple GPU matrix class. So I might be
> dealing with a lot of matrix products where I multiply a fixed 512 by 784 matrix
> by a 784 by 256 matrix that changes between each matrix product, although to
> really see the largest gains I use a 4096 by 2048 matrix times a bunch of 2048
> by 256 matrices.
Matrix multiplication is at the core of 3D graphics, and the raison
d'etre for GPUs. That is specifically what they are designed to do.
Matrix multiplication scales as O(n**3) in floating point operations but
only O(n**2) in memory access, so the arithmetic dominates as n grows.
That is, GPUs give fast 3D graphics (matrix multiplications) by speeding
up floating point operations.
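
To make the scaling argument concrete, here is a rough sketch that counts
floating point operations and matrix elements touched for the shapes George
mentions. The function name gemm_cost and the 2*m*k*n flop count (one
multiply plus one add per inner-product term) are my own illustrative
assumptions, and it ignores caching and transfer costs entirely.

```python
def gemm_cost(m, k, n):
    """Estimate the cost of an (m x k) times (k x n) matrix product.

    Returns (flops, elements, flops_per_element). Hypothetical helper for
    illustration only; it counts each input and output element once.
    """
    flops = 2 * m * k * n              # one multiply and one add per term
    elements = m * k + k * n + m * n   # A, B and the result C
    return flops, elements, flops / elements


# Shapes taken from the quoted message.
for shape in [(512, 784, 256), (4096, 2048, 256)]:
    flops, elements, ratio = gemm_cost(*shape)
    print(shape, "flops:", flops, "elements:", elements,
          "flops per element: %.1f" % ratio)
```

The ratio of flops to elements grows with the matrix size, which is
consistent with the larger product showing the bigger gain.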
GPUs make sense for certain level-3 BLAS calls, but that really belongs
in BLAS, not in NumPy's core. One could, e.g., consider linking with a
BLAS wrapper that directs these special cases to the GPU and the rest to
ATLAS / MKL / netlib BLAS.
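
A minimal sketch of what such a wrapper might look like, written in pure
Python so it runs anywhere. Everything here is hypothetical: GemmDispatcher,
naive_gemm and the min_flops threshold are names I made up for illustration,
not an existing BLAS or NumPy interface. A real wrapper would live at the
BLAS level and the cut-off would have to be measured, since it depends on
the cost of moving data to and from the GPU.

```python
def naive_gemm(a, b):
    """Reference CPU product of two list-of-lists matrices (stand-in
    for a call into ATLAS / MKL / netlib BLAS)."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]


class GemmDispatcher:
    """Route large matrix products to a GPU backend, the rest to the CPU.

    cpu_gemm and gpu_gemm are callables taking two matrices. min_flops is
    an assumed, arbitrary threshold that would need tuning in practice.
    """

    def __init__(self, cpu_gemm, gpu_gemm=None, min_flops=10**8):
        self.cpu_gemm = cpu_gemm
        self.gpu_gemm = gpu_gemm
        self.min_flops = min_flops
        self.calls = {"cpu": 0, "gpu": 0}   # bookkeeping for inspection

    def __call__(self, a, b):
        m, k, n = len(a), len(b), len(b[0])
        if len(a[0]) != k:
            raise ValueError("inner dimensions do not match")
        flops = 2 * m * k * n
        # Only use the GPU when one is available and the job is big enough.
        if self.gpu_gemm is not None and flops >= self.min_flops:
            self.calls["gpu"] += 1
            return self.gpu_gemm(a, b)
        self.calls["cpu"] += 1
        return self.cpu_gemm(a, b)


# Usage: with no GPU backend supplied, everything falls back to the CPU.
gemm = GemmDispatcher(naive_gemm)
print(gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The point of the design is that calling code never needs to know which
backend ran, which is why this belongs below NumPy rather than in it.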
Sturla Molden