[Numpy-discussion] expensive tensordot
Dag Sverre Seljebotn
Tue Jun 15 14:42:33 CDT 2010
Paul Northug wrote:
> I have a computation bounded by one step and I have always wondered
> how to make it fast enough to be usable. I suspect that I have to use
> an approximation, but I was hoping someone would spot a major
> inefficiency in my implementation.
> The calculation is a kind of outer product of two sets of time series:
> for n,p in np.ndindex(self.P, self.P):
>     self.A[:,n,:,p] += np.tensordot(a[:,:,n:n+self.T],
>                                     a[:,:,p:p+self.T],
>                                     ([0,2], [0,2]))
> The sizes for all indices are on the order of 100, with the exception
> that a.size == 10. One inefficiency is that A is symmetric in (n,p),
> but exploiting that only saves a factor of 2.
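For reference, here is a self-contained version of the quoted loop. The original shapes are not fully specified, so the sizes S, C, P, T below are made-up small stand-ins for the ~100-sized indices; the array and variable names beyond those in the quoted code are hypothetical.

```python
import numpy as np

# Hypothetical small sizes standing in for the original ~100-sized indices.
S, C, P, T = 4, 5, 6, 7
rng = np.random.default_rng(0)
a = rng.standard_normal((S, C, P + T - 1))   # last axis long enough for every window

# The quoted double loop: A[:, n, :, p] accumulates the contraction of
# axes 0 and 2 of two length-T windows of `a` taken along its last axis.
A = np.zeros((C, P, C, P))
for n, p in np.ndindex(P, P):
    A[:, n, :, p] += np.tensordot(a[:, :, n:n+T],
                                  a[:, :, p:p+T],
                                  ([0, 2], [0, 2]))

# The symmetry mentioned above: A[:, n, :, p] == A[:, p, :, n].T,
# i.e. swapping (n, p) and transposing the remaining axes is a no-op.
assert np.allclose(A, A.transpose(2, 3, 0, 1))
```

Exploiting that symmetry means computing only the p >= n half of the loop, which is the factor-of-2 saving the poster mentions.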
The memory access pattern appears to be quite inefficient. It is better
to vary the last indices fastest, i.e. you should avoid indexing or
slicing in the last dimension and leave those to tensordot if at all
possible. Use arr.T.copy() or arr.copy('F') before or after the loop.
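Concretely, one way to apply that advice is to move the windowed axis to the front and copy once before the loop, so each window is a contiguous block in memory. A sketch with hypothetical small sizes (the names b, ref, out are illustrative, not from the original post):

```python
import numpy as np

S, C, P, T = 4, 5, 6, 7
rng = np.random.default_rng(0)
a = rng.standard_normal((S, C, P + T - 1))

# Reference: slicing the last (fastest-varying) axis, as in the original loop.
ref = np.tensordot(a[:, :, 2:2+T], a[:, :, 3:3+T], ([0, 2], [0, 2]))

# Move the windowed axis to the front and copy once, so every window
# b[n:n+T] is a single contiguous chunk of memory.
b = np.ascontiguousarray(a.transpose(2, 0, 1))   # b[m, s, i] == a[s, i, m]

# The contracted axes move with the transpose: sum over the window axis
# (now axis 0) and the old axis 0 (now axis 1).
out = np.tensordot(b[2:2+T], b[3:3+T], ([0, 1], [0, 1]))

assert np.allclose(ref, out)
```

The one-time copy costs O(a.size), while the loop touches each window O(P) times, so the better locality inside the loop should dominate.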