vectorized linalg

Tim Hochberg tim.hochberg at
Mon Oct 9 09:45:37 CDT 2006

David Goldsmith wrote:
> Tim Hochberg wrote:
>> I periodically need to perform various linalg operations on a large 
>> number of small matrices. The normal numpy approach of stacking up the 
>> data into an extra dimension and doing the operations in parallel 
>> doesn't work because the linalg functions are only designed to work on 
>> one matrix at a time. At first glance, this restriction seems harmless 
>> enough since the underlying LAPACK functions are also designed to work 
>> one matrix at a time and it seem that there would be little to be gained 
>> by parallelizing the Python level code. However, this turns out to be 
>> the case.
>> I have some code, originally written for numarray, but I recently ported 
>> it over to numpy, that rewrites the linalg function determinant, inverse 
>> and solve_linear_equations[1] to operate of 2 or 3D arrays, treating 3D 
>> arrays as stacks of 2D arrays. For operating on large numbers of small 
>> arrays (for example 1000 2x2 arrays), this approach is over an order of 
>> magnitude faster than the obvious map(inverse, threeDArray) approach.
>> So, some questions:
>>     1. Is there any interest in this code for numpy.linalg? I'd be 
>> willing to clean it up and work on the other functions in linalg so that 
>> they were all consistent. Since these changes should all be backwards 
>> compatible (unless your counting on catching an error from one of the 
>> linalg functions), I don't see this as a problem to put in after 1.0, so 
>> there's really no rush on this. I'm only bringing it up now since I just 
>> ported it over to numpy and it's fresh in my mind.
> Say "no" to someone offering to make a Python module more feature rich?  
> Blasphemy! :-)
> But I am very curious as to the source/basis of the performance 
> improvement - did you figure this out?
Yes and no. Here's what the begining of linalg.det looks lik:

    a = asarray(a)
    t, result_t = _commonType(a)
    a = _fastCopyAndTranspose(t, a)
    n = a.shape[0]
    if isComplexType(t):
        lapack_routine = lapack_lite.zgetrf
        lapack_routine = lapack_lite.dgetrf
    # ...

There's quite a few function calls here, since the actual call to dgetrf 
probably takes neglible time for a 2x2 array, the cost of computation 
here is dominated by function call overhead, creating the return arrays 
and such not. Which is the biggest culprit, I'm not sure; by vectorizing 
it, much of that overhead is only incurred once rather than hundreds or 
thousands of times, so it all becomes negligible at that point. I didn't 
bother to track down the worst offender.


Take Surveys. Earn Cash. Influence the Future of IT
Join's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash

More information about the Numpy-discussion mailing list