[Numpy-discussion] Matrix dot product over an axis (for a 3d array/list of matrices)
> > If you need/want more speed than the solution Chuck proposed, you should
> check out Cython and Tokyo. Cython lets you write loops that execute at C
> speed, whereas Tokyo provides a Cython level wrapper for BLAS (no need to go
> through Python code to call NumPy). Tokyo was designed for exactly your use
> case: lots of matrix multiplies with relatively small matrices, where you
> start noticing the Python overhead.
For speed I'd go straight to C and avoid BLAS since the matrices are so
small. There might also be a cache advantage to copying the non-contiguous
columns of the rhs to the stack.
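
A rough sketch of what that could look like, just to make the idea concrete.
Everything specific here is an assumption: the fixed size N = 3, row-major
double data, a (count, N, N) layout for both stacks, and the function names
matmul_small and matmul_stack, which are made up for illustration. The output
must not overlap either input.

```c
#include <stddef.h>

#define N 3  /* assumed small, fixed matrix size (hypothetical) */

/* c = a * b for one pair of N x N row-major matrices; c must not alias a or b. */
static void matmul_small(const double *a, const double *b, double *c)
{
    double col[N];  /* stack copy of one column of the rhs */

    for (size_t j = 0; j < N; j++) {
        /* Column j of b is strided in memory, so gather it once. */
        for (size_t k = 0; k < N; k++)
            col[k] = b[k * N + j];

        /* Every row of a is contiguous, as is the copied column. */
        for (size_t i = 0; i < N; i++) {
            double sum = 0.0;
            for (size_t k = 0; k < N; k++)
                sum += a[i * N + k] * col[k];
            c[i * N + j] = sum;
        }
    }
}

/* Multiply matching matrices along the leading axis of two (count, N, N) stacks. */
void matmul_stack(const double *a, const double *b, double *c, size_t count)
{
    for (size_t m = 0; m < count; m++)
        matmul_small(a + m * N * N, b + m * N * N, c + m * N * N);
}
```

Whether the column copy actually pays off at this size is something to
measure; with N known at compile time the compiler may well unroll the loops
and keep everything in registers anyway.
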
Chuck
