[Numpy-discussion] Cython numerical syntax revisited

Francesc Alted faltet@pytables....
Thu Mar 5 03:24:51 CST 2009

On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
> > At first sight, having a kind of Numexpr kernel inside Cython would
> > be great, but given that you can already call Numexpr from both
> > Python and Cython, I wonder what the advantage of doing so would be.
> > As I see it, it would be better to have:
> >
> > c = numexpr.evaluate("a + b")
> >
> > in the middle of Cython code than just:
> >
> > c = a + b
> >
> > in the sense that the former would allow the programmer to see
> > whether Numexpr is called explicitly or not.
> The former would need to invoke the parser etc., which one would
> *not* need to do when one has the Cython compilation step.

Ah, yes.  That's a good point.

> When I
> mention numexpr it is simply because work has already gone into it
> to optimize these things; that experience could hopefully be kept,
> while discarding the parser and opcode system.
> I know too little about these things, but look:
> Cython can relatively easily transform things like
> cdef int[:,:] a = ..., b = ...
> c = a + b * b
> into a double for-loop with c[i,j] = a[i,j] + b[i,j] * b[i,j] at its
> core. A little more work could have it iterate the smallest dimension
> innermost dynamically (in strided mode).
> If a and b are declared as contiguous arrays and "restrict", I
> suppose the C compiler could do the most efficient thing in a lot of
> cases? (I.e. "cdef restrict int[:,:,"c"]" or similar)
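
To make the proposed transformation concrete, here is a minimal Python/NumPy sketch of the loop Cython might generate for `c = a + b * b` on 2-D arrays. The function name `fused_a_plus_b_squared` is hypothetical, and the code only mirrors the structure of the generated C loop; as pure Python it would of course be far slower than either NumPy or numexpr.

```python
import numpy as np


def fused_a_plus_b_squared(a, b):
    """Explicit double loop equivalent to the vectorized ``c = a + b * b``.

    Hypothetical illustration of the C loop Cython could emit: every element
    is computed in one pass, so no temporary array for ``b * b`` is created.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape or a.ndim != 2:
        raise ValueError("sketch only handles two 2-D arrays of equal shape")
    rows, cols = a.shape
    c = np.empty((rows, cols), dtype=np.result_type(a, b))
    for i in range(rows):
        for j in range(cols):
            c[i, j] = a[i, j] + b[i, j] * b[i, j]
    return c


a = np.arange(6, dtype=np.int32).reshape(2, 3)
b = np.full((2, 3), 2, dtype=np.int32)
print(fused_a_plus_b_squared(a, b))  # same values as a + b * b
```

The point of the fused loop is that the whole expression is evaluated per element, instead of materializing `b * b` as a full intermediate array first.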


> However if one has a strided array, numexpr could still give an
> advantage over such a loop. Or am I wrong?

Well, I suppose that, provided that Cython could perform the for-loop
transformation, supporting strided arrays would be relatively
trivial, and the performance would be similar to that of numexpr in this case.
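
A rough sketch of the "iterate the smallest dimension innermost, dynamically" idea for strided arrays, again in Python/NumPy for illustration only. It assumes that "smallest dimension" means the axis with the smallest byte stride, and it chooses the loop order from the strides of `a` alone; `innermost_axis` and `strided_a_plus_b_squared` are hypothetical names.

```python
import numpy as np


def innermost_axis(arr):
    """Return the axis of a 2-D array with the smaller absolute byte stride.

    Iterating that axis innermost walks memory with the shortest steps,
    which is what a cache-friendly strided loop wants.
    """
    s0, s1 = (abs(s) for s in arr.strides)
    return 0 if s0 < s1 else 1


def strided_a_plus_b_squared(a, b):
    """Compute a + b * b element by element, choosing the loop order at run time."""
    if a.shape != b.shape or a.ndim != 2:
        raise ValueError("sketch only handles two 2-D arrays of equal shape")
    c = np.empty(a.shape, dtype=np.result_type(a, b))
    inner = innermost_axis(a)
    outer = 1 - inner
    for i in range(a.shape[outer]):
        for j in range(a.shape[inner]):
            # Map (outer index, inner index) back to (row, column).
            idx = (i, j) if outer == 0 else (j, i)
            c[idx] = a[idx] + b[idx] * b[idx]
    return c


base = np.arange(12.0).reshape(3, 4)   # C order: strides (32, 8)
view = base.T                          # transposed view: strides (8, 32)
print(innermost_axis(base), innermost_axis(view))  # 1 0
```

Because the order is decided at run time from the strides, the same compiled loop would handle C-ordered, Fortran-ordered and transposed inputs reasonably well.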

The case of unaligned arrays would be a bit different, as numexpr uses
the following trick: whenever an unaligned array is detected, a new
'copy' opcode is issued so that, for each data block, a copy is made
in order to make the data aligned.  As the block sizes are chosen to
fit easily in the CPU's level-1 cache, this copy operation is very fast
and has little impact on performance.
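
A minimal NumPy sketch of the blocking-plus-copy idea described above. It is not numexpr's implementation: `blocked_add` and `BLOCK_SIZE` are hypothetical, the block size is not tuned to any particular L1 cache, and NumPy's `flags.aligned` stands in for numexpr's own alignment check.

```python
import numpy as np

# Hypothetical block length (in elements); numexpr's real block size may differ.
BLOCK_SIZE = 1024


def blocked_add(a, b, block_size=BLOCK_SIZE):
    """Compute a + b for 1-D arrays block by block.

    Any operand block that is not aligned is first copied into an aligned
    scratch buffer, mimicking the effect of numexpr's 'copy' opcode.
    """
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError("sketch only handles two 1-D arrays of equal shape")
    out = np.empty(a.shape, dtype=np.result_type(a, b))
    # Freshly allocated NumPy buffers are aligned; they are reused for every block.
    scratch = [np.empty(block_size, dtype=x.dtype) for x in (a, b)]
    for start in range(0, a.shape[0], block_size):
        stop = min(start + block_size, a.shape[0])
        n = stop - start
        blocks = []
        for x, buf in zip((a, b), scratch):
            blk = x[start:stop]
            if not blk.flags.aligned:
                buf[:n] = blk          # the 'copy' step: aligned copy of this block
                blk = buf[:n]
            blocks.append(blk)
        np.add(blocks[0], blocks[1], out=out[start:stop])
    return out


# Build an unaligned float64 array by viewing a byte buffer at offset 1.
raw = np.zeros(8 * 10 + 1, dtype=np.uint8)
x = raw[1:].view(np.float64)
x[:] = np.arange(10.0)
y = np.ones(10)
print(x.flags.aligned)                  # False on typical platforms
print(blocked_add(x, y, block_size=4))  # same values as x + y
```

Since the scratch buffers are only one block long, the extra copy touches data that is about to be used anyway and stays in cache, which is why its cost is small.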

As I see it, this would be the only situation that would be more
complicated to implement natively in Cython, because it requires
non-trivial code for both blocking and opcode handling.  However, my
guess is that in most situations unaligned array operands do not
appear, so perhaps the unaligned-case optimization would not be so
important to implement in Cython.

> But anyway, this is easily one year ahead of us, unless more
> numerical Cython developers show up.


Francesc Alted
