# [SciPy-dev] Re: [SciPy-user] Low-level code integration discussion at scipy'03?

eric jones eric at enthought.com
Wed Sep 3 13:16:11 CDT 2003

```Hey Guys,

I'm interested in the conversation also.  We can set up a BOF session on
Thursday night for this.  Fernando, would you be willing to chair it?
There will be a lot of others available to chime in I'm sure.  I don't
have an "official" time yet, but my guess would be that it will start
around 8:00 to 8:30 in the evening and run until people are tired of
talking.

Michael:
Will it be any problem to get a couple of meeting rooms that evening?

thanks,
eric

Fernando Perez wrote:

> Prabhu Ramachandran wrote:
>
>>  1. I guess that weave code will usually be used to optimize the
>>     slowest loops.  Typically eliminating 'for/while' loops.  The
>>     trivial (and ancient) Laplacian examples on the SciPy pages
>>     demonstrate one such case.  My guess is most other uses would be
>>     similar.  In such cases merely having slices working is not good
>>     enough.  You might also need to do something non-trivial to the
>>     array.  At that point I'm not sure how much the niceties of blitz
>>     would be used (efficiently)?  Could you provide a more
>>     illustrative example of what blitz does that could not be done by
>>     a simple C++ loop or something?
>
>
> It's not a matter of being impossible in loops, just that the blitz
> syntax is incredibly compact.  Consider computing Numeric's
> innerproduct() operation for a rank-2 array M and a rank-6 one T,
> storing the result in U.  In blitz, the code reads:
>
> // 6d version
> void mat_ten_inner(Array<double,2>& M,
>         Array<double,6>& T, Array<double,6>& U ) {
>     firstIndex i1;
>     secondIndex i2;
>     thirdIndex i3;
>     fourthIndex i4;
>     fifthIndex i5;
>     sixthIndex i6;
>     seventhIndex j;
>
>     U = sum(M(i1,j)*T(i2,i3,i4,i5,i6,j),j);
> }
>
> That's it.  ONE line of actual code!  A version of this done with hand
> loops requires obviously 7 nested for loops.  Since I needed similar
> code for dimensions 1..6, I wrote preprocessor macros to generate all
> versions from  a 'template' (meaning, another macro, not a C++
> template<>).  This is, of course, ugly.
>
> Having said that, the above one-liner performs about 3 times slower
> than the loop macro version for d=6.  For d=1 they are about even, the
> blitz one-liner gets worse as d increases.  I posted to the blitz list
>
> So, in the end, I'm writing my own loops too :)  This is just to point
> that I'm ok with writing loops too, even though I'd love blitz to
> handle things like the above better.
>
> What blitz DOES buy you is A(i,j,k,l...) type indexing of high-rank
> arrays, which is not trivial with the C Numeric API.  And that is
> important.  Writing loops isn't bad, but all the manual pointer
> arithmetic currently necessary with the existing APIs is rather annoying.
>
> It's a matter of balancing abstraction with performance.  Blitz
> provides a lot of abstraction, but performance isn't very good.  The
> subset of blitz which gives just indexing (you do the rest yourself)
> seems to maintain pretty good performance in many cases (though I need
> to do more precise benchmarks against fortran).
>
>>  3. If plain old inlined C code does indeed perform better than the
>>     blitz equivalents, it sure would be useful to have the cleaner API
>>     anyway.  Afterall performance is a key issue here.  Having a
>>     cleaner API (without fancy expression templates but without having
>>     to convert between C++ and Python data) would really be useful in
>>     any case.
>
>
> This, we agree 100% on.  In fact, if the C API offered a way to index
> high rank arrays without manually computing stride offsets on every
> access, that would probably satisfy a significant fraction of usage
> cases.  I consider one-liners like the above icing on the cake, but
> A(i,j,k) indexing is really a significant change which makes
> development far more productive.
>
>> Well, I'm sure I would not mind writing this kind of loop (taken from
>> here http://www.scipy.org/site_content/weave/python_performance.html):
>>
>>         double tmp, err, diff;
>>         err = 0.0;
>>         for (int i=1; i<nx-1; ++i) {
>>             for (int j=1; j<ny-1; ++j) {
>>                 tmp = u(i,j);
>>                 u(i,j) = ((u(i-1,j) + u(i+1,j))*dy2 +
>>                           (u(i,j-1) + u(i,j+1))*dx2)*dnr_inv;
>>                 diff = u(i,j) - tmp;
>>                 err += diff*diff;
>>             }
>>         }
>
>
> I don't either.  As I said, the main consideration for me is clean
> indexing of rank2+ arrays.  Everything else I take as extra credit,
> and I can live without.  For example, blitz has 'stencils' to do the
> above, and in 3d cases they do simplify writing the code a lot.  But I
> haven't tested their performance, and after seeing my innerproduct()
> example above, I'd be at least a bit skeptical.  So loops are fine
> with me.
>
> Best,
>
> f.
>
> _______________________________________________
> Scipy-dev mailing list
> Scipy-dev at scipy.net
> http://www.scipy.net/mailman/listinfo/scipy-dev

```