# [SciPy-User] Cholesky problem (I need dtrtrs, not dpotrs)

Sturla Molden sturla@molden...
Thu Jul 8 23:33:28 CDT 2010

```Sturla Molden skrev:
> Is there any way of getting access to Lapack function dtrtrs or BLAS
> function dtrsm from SciPy? cho_solve does not really do what I want (it
> calls dpotrs). Which by the way is extremely annoying, since 99% of use
> cases for Cholesky (at least in statistics) require solving U'X = Y, not
> U'UX = Y as cho_solve do. Goloub & van Loan does not even bother to
> mention the dpotrs algorithm.
>

Just to elaborate on this:

Say we want to calculate the Mahalanobis distance from somw points X to
a distribution N(m,S). With cho_factor and cho_solve, that would be

cx = X - m
sqmahal = (cx*cho_solve(cho_factor(S),cx.T).T).sum(axis=1)

whereas a similar routine "tri_solve" using dtrtrs would be

cx = X - m
sqmahal = (tri_solve(cho_factor(S),cx.T).T**2).sum(axis=1)

This looks almost the same in Python, but the solution with tri_solve (dtrtrs) requires only half as many flops as cho_solve (dpotrs) does.

In many statistical applications requiring substantial amount of computation (EM algorithms, MCMC simulation, and the like), computing Mahalanobis distances can be the biggest bottleneck.

So that is one thing I really miss in

http://svn.scipy.org/svn/scipy/trunk/scipy/linalg/decomp_cholesky.py

P.S. In Fortran or C we would rather use a tight loop on dtrsv instead, computing sqmahal point by point, as it is more friendly to cache than dtrtrs on the whole block. Computing mahalanobis distances efficiently is a so common use case for Cholesky that I (almost) suggest this be added to SciPy as well.
<http://svn.scipy.org/svn/scipy/trunk/scipy/linalg/decomp_cholesky.py>
P.P.S. Those transpositions are actually required to make it run fast; in Matlab they would slow things down terribly. NumPy and Matlab is very different in this respect. A transpose does not create a new array in NumPy, it just switches the order flag between C and Fortran. C order is NumPy's native, but we must have Fortran order before calling BLAS or LAPACK. If we don't, f2py will make a copy with a transpose. So we avoid a transpose by taking a transpose. It might seem a bit paradoxical.

Sturla

```