[Numpy-discussion] Why is weave.inline()/blitz++ code 3 times slower than innerproduct()?
fperez at colorado.edu
Thu Aug 14 16:38:01 CDT 2003
I think one of the strongest points in favor of python for scientific
computing is the ability to write low-level code, when necessary, which can
perform on par with hand-rolled Fortran. In the past, I've been very pleased
using weave's inline() tool, which relies on blitz for manipulating Numpy
arrays with a very clean and convenient syntax.
This is important, because manipulating multidimensional Numeric arrays in C
is rather messy, and the resulting code isn't exactly an example of
readability. Blitz arrays end up looking just like regular arrays, using
(i,j,k) instead of [i][j][k] for indexing.
Recently, I needed to do an operation which turned out to be pretty much what
Numpy's innerproduct() does. I'd forgotten about innerproduct(), so I just
wrote my own using inline(). Later I saw innerproduct(), and decided to
compare the results. I'm a little worried by what I found, and I'd like to
hear some input from the experts on this problem.
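For readers who want the operation spelled out, here is a minimal
pure-Python sketch of an inner product over the last axis of two 2-D
arrays. It assumes the semantics that modern NumPy exposes as
numpy.inner (out[i][j] = sum over k of a[i][k] * b[j][k]); the function
name inner_last_axis and the sample data are hypothetical, and this is
not the code from the attachment.

```python
def inner_last_axis(a, b):
    """Reference inner product: out[i][j] = sum_k a[i][k] * b[j][k].

    `a` and `b` are lists of rows; every row must have the same length.
    (Hypothetical helper for illustration only.)
    """
    width = len(a[0])
    if any(len(row) != width for row in a + b):
        raise ValueError("all rows must share the same last-axis length")
    # One dot product per (row of a, row of b) pair.
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
result = inner_last_axis(a, b)
```

A hand-written C or blitz version has the same triple loop; only the
indexing syntax and the compiler's freedom to optimize it differ.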
I've attached all the necessary code to run my tests, in case someone is
willing to do it and take a look.
In summary, I found some things which concern me (a README is included in the
.tgz with more info):
- the blitz code, whether via inline() or a purely hand-written extension, is
~2.5 to 3 times slower than innerproduct(). Considering that this code is
specialized to a few sizes and data types, this comes as a big surprise. If
the only way to get maximum performance with Numpy arrays is to write by hand
to the full low-level API, I know that many people will shy away from python
for a certain class of projects. I truly hope I'm missing something here.
- There is a significant numerical discrepancy between the two approaches
(blitz vs numpy). In an innerproduct operation over 7000 entries, the
discrepancy is O(1e-10) (in l2 norm). This is more than I'm comfortable with,
but perhaps I'm being naive or optimistic.
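One possible source of small discrepancies like the second item is
accumulation order: floating-point addition is not associative, so two
correct implementations that sum the same products in a different order
can disagree in the last digits. The sketch below is an assumption about
a plausible cause, not a diagnosis of the attached code. It uses only
the standard library, and the data, the seed, and the names dot_forward
and dot_unrolled are all hypothetical.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 7000                # same length as the inner product discussed above
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
y = [rng.uniform(-1.0, 1.0) for _ in range(n)]
products = [a * b for a, b in zip(x, y)]


def dot_forward(p):
    """Accumulate left to right in a single running total."""
    total = 0.0
    for v in p:
        total += v
    return total


def dot_unrolled(p, lanes=4):
    """Accumulate into `lanes` partial sums, as an unrolled loop might."""
    partial = [0.0] * lanes
    for i, v in enumerate(p):
        partial[i % lanes] += v
    return dot_forward(partial)


# math.fsum tracks exact partial sums, so it serves as the reference
# for summing these (already rounded) products.
reference = math.fsum(products)
err_forward = abs(dot_forward(products) - reference)
err_unrolled = abs(dot_unrolled(products) - reference)
print("forward error: ", err_forward)
print("unrolled error:", err_unrolled)
```

Comparing each implementation against a reference sum like this, rather
than only against each other, helps show which one is further off and
whether the gap is within ordinary rounding for that many terms.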
I view the ability to get blitzed code which performs on par with Fortran as a
very important aspect of python's suitability for large-scale projects where
every last bit of performance matters, but where one still wants to have the
ability to work with a reasonably clean syntax. I hope I'm just misusing some
tools and not faced with a fundamental limitation.
By the way, I'll come to Scipy'03 with many more questions/concerns along
these lines, and I think it would be great to have some discussions on these
issues there with the experts.
Thanks in advance.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 6059 bytes
Desc: not available
Url : http://projects.scipy.org/pipermail/numpy-discussion/attachments/20030814/8fc5c36d/attachment-0001.bin