#533: numpy.trace is slow for small dimensions
Comment(by pv):

 Another problem is that `trace` calls `diagonal` at all -- the trace can
 be computed without copying the array data, which `diagonal` does. IIRC,
 trace can be implemented by suitable striding. The same goes for
 `diagonal`, although I suppose its semantics shouldn't be changed here.
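
 A rough sketch of the striding idea, for the 2-D case only. The helper
 names `diagonal_view` and `trace_strided` are made up for illustration
 and are not part of numpy; this shows the technique, not how `trace`
 is or should be implemented internally.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def diagonal_view(a):
    """Hypothetical helper: main diagonal of a 2-D array as a view (no copy)."""
    a = np.asarray(a)
    if a.ndim != 2:
        raise ValueError("expected a 2-D array")
    n = min(a.shape)
    # Advancing one row and one column at once walks along the diagonal,
    # so the diagonal's stride is the sum of the two axis strides.
    step = a.strides[0] + a.strides[1]
    # Read-only, since as_strided views are easy to misuse when written to.
    return as_strided(a, shape=(n,), strides=(step,), writeable=False)


def trace_strided(a):
    """Hypothetical helper: trace computed by summing the diagonal view."""
    return diagonal_view(a).sum()
```

 Because the stride is computed from `a.strides`, this should also work
 for non-contiguous input such as a transposed array. Whether it is
 actually faster than `trace` for small arrays would need to be measured.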

