[Numpy-discussion] Question: LAPACK-LITE consequences for Numpy speed (win32 platfor m)

Raymond Beausoleil beausol at hpl.hp.com
Tue Jan 2 10:57:35 CST 2001

I use Windows NT/2000 (at gunpoint), and I've gotten much better 
performance than NumPy's Lite code by using the Intel MKL. I keep meaning 
to post these results to an external HP Labs web page, but we're currently 
engaged in an internal snit about what those pages are allowed to look like.

Essentially, with NumPy 17 and Python 1.5, I've been able to:
(1) Replace NumPy BLAS with Intel's MKL BLAS, after a number of routine 
renaming edits in the NumPy code. (Apparently "cblas" is an irresistible 
prefix.) This turns out to be pretty easy.
(2) Replace portions of LAPACK Lite with the corresponding routines from 
Intel's MKL. I've been hacking away at this on a piecemeal basis, because 
the MKL uses Fortran calling conventions and array-ordering. In most of the 
MKL routines, the usual LAPACK flags for transposition are available to 
(3) Add new LAPACK functionality using LAPACK routines from the MKL that 
are not included in LAPACK Lite.

I've been meaning to pick this project up again, since I want to use the 
Intel FFT routines in some new code that I'm writing. The chief benefit of 
the MKL libraries is the ability to easily use multiple processors and 
threads simply by setting a couple of environment variables. Until recently 
(when MATLAB finally started using LAPACK), I was able to gain 
significantly better performance than MATLAB using the Intel BLAS/LAPACK, 
even without multiple processors. Now the Intel libraries are only about 
10-25% faster, if my watch can be believed. I have _no idea_ how the Intel 
MKL performs relative to the native Linux BLAS/LAPACK distributions, but I 
know that (for example) the MKL is about 50% faster than the linear algebra 
routines shipped by NAG.

Although I haven't figured out the standard NumPy distribution process yet 
(I still use that pesky Visual Studio IDE), I'll elevate the development of 
a general-purpose integration of the Intel MKL into NumPy to an Official 
New Year's Resolution. I made my last ONYR about ten years ago, and was 
able to give up channel-surfing using the TV remote control. So, I have 
demonstrated an ability to deliver in this department; the realization that 
we've just started a new millennium only adds to the pressure. Presumably, 
when I get this finished (I'll target the end of the month), I'll be able 
to find a place to post it.

Ray Beausoleil
Hewlett-Packard Laboratories
mailto:beausol at hpl.hp.com
425-883-6648    Office
425-957-4951    Telnet
425-941-2566    Mobile

At 05:56 PM 1/2/2001 +1100, Shuetrim, Geoff wrote:
>Apologies for asking an overly vague question like this but:
>on an Intel/win32 platform (I only have a windows version of
>Gauss), I am comparing Numpy matrix inversion with that
>of Gauss (very much the same type of software as Matlab
>which at least some of you used to work with). As the size
>of the matrix to be inverted increases, the speed of Numpy
>appears to asymptote (on my machine) to about half that of
>Gauss. For small matrices, it is much worse than that
>because of the various overheads that are dealt with by Numpy.
>Would this speed differential be largely eliminated if I was not
>using LAPACK-LITE? If so I will try to figure my way through
>hooking into Intel's MKL - anyone got hints on doing this - I saw
>mention of it in the mailing list archives. Would I be better off,
>speed-wise, eschewing win32 altogether and using native LAPACK
>and BLAS libraries on my Linux box?
>This is relevant to me in the context of a multivariate Kalman filtering
>module that I am working up to replace one I have been using on
>the Gauss platform for years. The Numpy version of my filter has a
>very similar logic structure to that of Gauss but is drastically slower.
>I have only been working with Numpy for a month or so which may
>mean that my code is relatively innefficient. I have been assuming
>that Numpy - as an interpreted language is mainly slowed by
>looping structures.
>Thanks in advance,
>Geoff Shuetrim
>A simple version of the filter is given below:
>(Note that I have modified Matrix.py in my installation to include a
>transpose method for the Matrix class, T()).
># ***************************************************************
># kalman.py module by Geoff Shuetrim
># Please note - this code is thoroughly untested at this stage
># You may copy and use this module as you see fit with no
># guarantee implied provided you keep this notice in all copies.
># ***************************************************************
># Minimization routines
>Routines to implement Kalman filtering and smoothing
>for multivariate linear state space representations
>of time-series models.
>Notation follows that contained in Harvey, A.C. (1989)
>"Forecasting Structural Time Series Models and the Kalman Filter".
>Filter  ---      Filter - condition on data to date
># Import the necessary modules to use NumPy
>import math
>from Numeric import *
>from LinearAlgebra import *
>from RandomArray import *
>import MLab
>from Matrix import *
># Initialise module constants
>Num = Numeric
>max = MLab.max
>min = MLab.min
>abs = Num.absolute
># filtration constants
>_obs = 100
>_k = 1
>_n = 1
>_s = 1
># Filtration global arrays
>_y = Matrix(cumsum(standard_normal((_obs,1,_n))))
>_v = Matrix(zeros((_obs,1,_n),Float64))
>_Z = Matrix(ones((_obs,_k,_n),Float64)) + 1.0
>_d = Matrix(zeros((_obs,1,_n),Float64))
>_H = Matrix(zeros((_obs,_n,_n),Float64)) + 1.0
>_T = Matrix(zeros((_obs,_k,_k),Float64)) + 1.0
>_c = Matrix(zeros((_obs,1,_k),Float64))
>_R = Matrix(zeros((_obs,_k,_s),Float64)) + 1.0
>_Q = Matrix(zeros((_obs,_s,_s),Float64)) + 1.0
>_a  = Matrix(zeros((_obs,1,_k),Float64))
>_a0 = Matrix(zeros((_k,1),Float64))
>_ap = _a
>_as = _a
>_P  = Matrix(zeros((_obs,_k,_k),Float64))
>_P0 = Matrix(zeros((_k,_k),Float64))
>_Pp = _P
>_Ps = _P
>_LL = Matrix(zeros((_obs,1,1),Float64))
>def Filter():   # Kalman filtering routine
>     _ap[0] = _T[0] * _a0 + _c[0]
>     _Pp[0] = _T[0] * _P0 * _T[0].T() + _R[0] * _Q[0] * _R[0].T()
>     for t in range(1,_obs-1):
>         _ap[t] = _T[t] * _a[t-1] + _c[t]
>         _Pp[t] = _T[t] * _P0 * _T[t].T() + _R[t] * _Q[t] * _R[t].T()
>         Ft = _Z[t] * _Pp[t] * _Z[t].T() + _H[t]
>         Ft_inverse = inverse(Ft)
>         _v[t] = _y[t] - _Z[t] * _ap[t] - _d[t]
>         _a[t] = _ap[t] + _Pp[t] * _Z[t].T() * Ft_inverse * _v[t].T()
>         _P[t] = _Pp[t] - _Pp[t].T() * _Z[t].T() * Ft_inverse * _Z[t] *
>         _LL[t] = -0.5 * (log(2*pi) + log(determinant(Ft)) + _v[t] *
>Ft_inverse * _v[t].T())
>" This email is intended only for the use of the individual or entity
>named above and may contain information that is confidential and
>privileged. If you are not the intended recipient, you are hereby
>notified that any dissemination, distribution or copying of this
>Email is strictly prohibited. When addressed to our clients, any
>opinions or advice contained in this Email are subject to the
>terms and conditions expressed in the governing KPMG client
>engagement letter. If you have received this Email in error, please
>notify us immediately by return email or telephone +61 2 93357000
>and destroy the original message. Thank You. "
>Numpy-discussion mailing list
>Numpy-discussion at lists.sourceforge.net

Ray Beausoleil
Hewlett-Packard Laboratories
mailto:beausol at hpl.hp.com
425-883-6648    Office
425-957-4951    Telnet
425-941-2566    Mobile

More information about the Numpy-discussion mailing list