[Numpy-discussion] Cython numerical syntax revisited
Thu Mar 5 03:24:51 CST 2009
On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
> > At first sight, having a kind of Numexpr kernel inside Cython would
> > be great, but given that you can already call Numexpr from both
> > Python and Cython, I wonder what the advantage of doing so would
> > be. As I see it, it would be better to have:
> > c = numexpr.evaluate("a + b")
> > in the middle of Cython code than just:
> > c = a + b
> > in the sense that the former would allow the programmer to see
> > whether Numexpr is called explicitly or not.
> The former would need to invoke the parser etc., which one would
> *not* need to do when one has the Cython compilation step.
Ah, yes. That's a good point.
> When I
> mention numexpr it is simply because work has already gone into it
> to optimize these things; that experience could hopefully be kept,
> while discarding the parser and opcode system.
> I know too little about these things, but look:
> Cython can relatively easily transform things like
> cdef int[:,:] a = ..., b = ...
> c = a + b * b
> into a double for-loop with c[i,j] = a[i,j] + b[i,j] * b[i,j] at its
> core. A little more work could have it iterate the smallest dimension
> innermost dynamically (in strided mode).
> If a and b are declared as contiguous arrays and "restrict", I
> suppose the C compiler could do the most efficient thing in a lot of
> cases? (I.e. "cdef restrict int[:,:,"c"]" or similar)
> However if one has a strided array, numexpr could still give an
> advantage over such a loop. Or?
Well, I suppose that, provided that Cython could perform the for-loop
transformation, adding support for strided arrays would be relatively
trivial, and the performance would be similar to that of numexpr in this
case.
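To make the loop transformation concrete, here is a minimal sketch in plain Python of what such a generated double loop might look like for c = a + b * b over strided operands. It is only an illustration, not what Cython actually emits: the function name strided_add_mul is hypothetical, the operands are flat lists rather than typed buffers, and the strides are assumed to be counted in elements instead of bytes.

```python
def strided_add_mul(a, b, shape, a_strides, b_strides):
    """Compute c[i, j] = a[i, j] + b[i, j] * b[i, j] over flat buffers.

    Hypothetical helper: strides are given in elements (not bytes), and
    the result is returned as a contiguous, C-ordered flat list.
    """
    rows, cols = shape
    c = [0] * (rows * cols)
    for i in range(rows):
        # Offsets of the start of row i in each operand.
        a_row = i * a_strides[0]
        b_row = i * b_strides[0]
        for j in range(cols):
            x = a[a_row + j * a_strides[1]]
            y = b[b_row + j * b_strides[1]]
            c[i * cols + j] = x + y * y
    return c


# a is stored in C order, b holds the same logical 2x3 values in
# Fortran order, so only the strides differ between the two operands.
a = [1, 2, 3, 4, 5, 6]   # strides (3, 1)
b = [1, 4, 2, 5, 3, 6]   # strides (1, 2)
c = strided_add_mul(a, b, (2, 3), (3, 1), (1, 2))
```

The inner loop here always runs over the last dimension; choosing the smallest dimension as the innermost one at run time, as suggested above, would be an extra step on top of this.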
The case for unaligned arrays would be a bit different, as numexpr uses
the following trick: whenever an unaligned array is detected, a new
'copy' opcode is issued so that, for each data block, a copy is made in
order to align the data. As the block sizes are chosen to fit easily in
the CPU's level-1 cache, this copy operation is very fast and has rather
little impact on performance.
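The blockwise copy can be sketched roughly as follows. This is not numexpr's actual code, just an illustration of the idea in plain Python: the function name add_unaligned and the BLOCK size are assumptions, the unaligned operands are simulated as doubles stored at an arbitrary byte offset inside a bytes object, and each block is copied into a fresh array('d') before the arithmetic is done.

```python
from array import array

BLOCK = 256  # elements per block; assumed small enough to stay in L1 cache


def add_unaligned(raw_a, off_a, raw_b, off_b, n):
    """Add n doubles stored at byte offsets off_a / off_b of two buffers.

    Hypothetical helper: each block is first copied into a freshly
    allocated array('d'), and only then are the elements added.
    """
    out = array('d')
    itemsize = out.itemsize
    for start in range(0, n, BLOCK):
        count = min(BLOCK, n - start)
        nbytes = count * itemsize
        pos_a = off_a + start * itemsize
        pos_b = off_b + start * itemsize
        # The 'copy' step: one small block per operand.
        blk_a = array('d')
        blk_a.frombytes(raw_a[pos_a:pos_a + nbytes])
        blk_b = array('d')
        blk_b.frombytes(raw_b[pos_b:pos_b + nbytes])
        # The compute step works only on the copied blocks.
        out.extend(x + y for x, y in zip(blk_a, blk_b))
    return out


# Simulate misalignment by prepending 1 and 3 padding bytes.
n = 1000
raw_a = b"\x00" + array('d', [float(i) for i in range(n)]).tobytes()
raw_b = b"\x00" * 3 + array('d', [10.0 * i for i in range(n)]).tobytes()
result = add_unaligned(raw_a, 1, raw_b, 3, n)
```

The point of working block by block is that the temporary copies stay small, so the extra memory traffic remains in cache instead of requiring a full aligned copy of each operand.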
As I see it, this would be the only situation that would be more
complicated to implement natively in Cython, because it requires
non-trivial code for both blocking and handling opcodes. However, my
guess is that unaligned array operands do not appear in most
situations, so perhaps the unaligned-case optimization would not be so
important when implementing this in Cython.
> But anyway, this is easily one year ahead of us, unless more
> numerical Cython developers show up.