tim.hochberg at ieee.org
Mon Oct 9 09:45:37 CDT 2006
David Goldsmith wrote:
> Tim Hochberg wrote:
>> I periodically need to perform various linalg operations on a large
>> number of small matrices. The normal numpy approach of stacking up the
>> data into an extra dimension and doing the operations in parallel
>> doesn't work because the linalg functions are only designed to work on
>> one matrix at a time. At first glance, this restriction seems harmless
>> enough since the underlying LAPACK functions are also designed to work
>> one matrix at a time and it seem that there would be little to be gained
>> by parallelizing the Python level code. However, this turns out to be
>> the case.
>> I have some code, originally written for numarray, but I recently ported
>> it over to numpy, that rewrites the linalg function determinant, inverse
>> and solve_linear_equations to operate of 2 or 3D arrays, treating 3D
>> arrays as stacks of 2D arrays. For operating on large numbers of small
>> arrays (for example 1000 2x2 arrays), this approach is over an order of
>> magnitude faster than the obvious map(inverse, threeDArray) approach.
>> So, some questions:
>> 1. Is there any interest in this code for numpy.linalg? I'd be
>> willing to clean it up and work on the other functions in linalg so that
>> they were all consistent. Since these changes should all be backwards
>> compatible (unless your counting on catching an error from one of the
>> linalg functions), I don't see this as a problem to put in after 1.0, so
>> there's really no rush on this. I'm only bringing it up now since I just
>> ported it over to numpy and it's fresh in my mind.
> Say "no" to someone offering to make a Python module more feature rich?
> Blasphemy! :-)
> But I am very curious as to the source/basis of the performance
> improvement - did you figure this out?
Yes and no. Here's what the begining of linalg.det looks lik:
a = asarray(a)
t, result_t = _commonType(a)
a = _fastCopyAndTranspose(t, a)
n = a.shape
lapack_routine = lapack_lite.zgetrf
lapack_routine = lapack_lite.dgetrf
There's quite a few function calls here, since the actual call to dgetrf
probably takes neglible time for a 2x2 array, the cost of computation
here is dominated by function call overhead, creating the return arrays
and such not. Which is the biggest culprit, I'm not sure; by vectorizing
it, much of that overhead is only incurred once rather than hundreds or
thousands of times, so it all becomes negligible at that point. I didn't
bother to track down the worst offender.
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
More information about the Numpy-discussion