[Numpy-discussion] numexpr thoughts

Tim Hochberg tim.hochberg at cox.net
Mon Mar 6 19:24:01 CST 2006

David M. Cooke wrote:

>> 1. David already mentioned opcodes to copy from an integer array into
>> a float register. Another idea would be to have an opcode to copy from
>> a strided array into a register. This would save a copy for
>> noncontiguous matrices and opens up the door to doing broadcasting. I
>> think we'd have to tweak the main interpreter loop a bit, but I think
>> it's doable. Depending how crunched we feel for opcode space, the
>> casting array and this array could overlap (OP_CAST_AND_COPY for
>> instance).
>Hadn't thought about doing strided arrays like that; it should work.
This opens another interesting can of worms: two-stage compilation. If 
we aren't copying the arrays on the way in, it might make sense to have 
an abstract opcode, OP_COPY_ANY, that copies data from any allowable 
source. This would be produced during the compilation stage. When 
numexpr was called, before the loop was executed, OP_COPY_ANY would be 
replaced depending on the input value. For example:

OP_COPY_ANY destreg input

would translate to:

OP_COPY_INT16ARR destreg input  # input is an int array
OP_COPY_FLT32SCL destreg input  # input is an FP scalar

The data would then be copied from the array into a register with 
appropriate striding and casting. If 'input' was already a contiguous 
double array, then it could translate into simply setting a pointer as 
is done now (OP_COPY_SIMPLE perhaps).
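A sketch of how that call-time specialization might look, using the 
opcode names from above (the helper functions and the specialization 
rules are hypothetical illustrations, not numexpr's actual code):

```python
import numpy as np

# Hypothetical abstract opcode from the discussion; the concrete
# opcodes it specializes into are also just illustrative names.
OP_COPY_ANY = "OP_COPY_ANY"

def specialize_copy(value):
    """Pick a concrete copy opcode for OP_COPY_ANY at call time."""
    if np.isscalar(value):
        return "OP_COPY_FLT32SCL"    # FP scalar: broadcast into a register
    arr = np.asarray(value)
    if arr.dtype == np.int16:
        return "OP_COPY_INT16ARR"    # int array: cast while copying
    if arr.dtype == np.float64 and arr.flags["C_CONTIGUOUS"]:
        return "OP_COPY_SIMPLE"      # contiguous double: just set a pointer
    return "OP_COPY_STRIDED"         # general case: strided copy

def specialize_program(program, inputs):
    """Rewrite abstract copy opcodes before the interpreter loop runs."""
    out = []
    for op, dest, src in program:
        if op == OP_COPY_ANY:
            op = specialize_copy(inputs[src])
        out.append((op, dest, src))
    return out
```

The point is that the expensive type dispatch happens once per call, 
not once per opcode execution inside the loop.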


>Right now, here's my thoughts on where to go:
>1. I'm rewriting the compiler to be easier to play with. Top things
>   on the list are better register allocation, and common
>   subexpression elimination.

>2. Replace individual constants with "constant arrays": repetitions of
>   the constant in the vector registers.
I'm already using this trick to a certain extent to reduce the number of 
func_xy opcodes. I copy the constants into arrays using OP_COPY_C. This 
sounds like essentially the same thing as what you are doing.
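The constant-array trick might be sketched like this (the block size 
and the function name are assumptions, not numexpr's actual values):

```python
import numpy as np

BLOCK_SIZE = 128  # assumed: the VM processes data in fixed-size blocks

def copy_constant(register, c):
    """OP_COPY_C analogue: replicate a scalar constant across a
    vector register so later opcodes can treat it like any array
    operand, instead of needing separate scalar variants."""
    register[:] = c
    return register

reg = np.empty(BLOCK_SIZE)
copy_constant(reg, 3.0)
# An opcode like OP_MUL can now operate on two array registers
# uniformly, whether its operand came from an array or a constant.
```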

>3. Reduction. I figure this could be done at the end of the program in
>   each loop: sum or multiply the output register. Downcasting the
>   output could be done here too.
Do you mean that sum() and friends could only occur as the outermost 
function? That is:
   "sum(a+3*b)"
would work, but:
   "where(a, sum(a+3*b), c)"
would not? Or am I misunderstanding you here? I don't think I like that 
limitation if that's the case. I don't think it should be necessary either.
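My reading of point 3 is that the reduction just folds the output 
register into an accumulator at the end of each pass through the block 
loop. A minimal sketch of that scheme for sum(a+3*b) (the block size 
and looping structure are assumptions):

```python
import numpy as np

BLOCK_SIZE = 128  # assumed block size for the VM's inner loop

def run_sum(a, b):
    """Evaluate sum(a + 3*b) block by block: the elementwise program
    fills the output register, then the reduction step folds it into
    an accumulator at the end of each pass through the loop."""
    acc = 0.0
    out = np.empty(BLOCK_SIZE)      # the output vector register
    for start in range(0, len(a), BLOCK_SIZE):
        ablk = a[start:start + BLOCK_SIZE]
        bblk = b[start:start + BLOCK_SIZE]
        n = len(ablk)               # last block may be short
        np.add(ablk, 3.0 * bblk, out=out[:n])  # body of the program
        acc += out[:n].sum()        # reduction at end of each loop
    return acc
```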

>4. Add complex numbers. If it doesn't look really messy, we could add
>   them to the current machine. Otherwise, we could make a machine that
>   works on complex numbers only.
It doesn't seem like it should be bad at all. The slickest approach would 
be to use two adjacent float buffers for a complex buffer. That would 
make the compiler a bit more complex, but it would keep the interpreter 
simpler, as there would only be a single buffer type. All that needs to 
be supported are the basic operations (+, -, *, /, // and **); comparisons 
don't work for complex numbers anyway, and all of the functions can go 
through the function pointers, since they're slow anyway. The one 
exception is where, which would mix complex and float operands and should 
be straightforward to handle as a special case.
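If a complex register really were two adjacent float buffers, the basic 
operations reduce to arithmetic over the split parts. A sketch of 
complex multiply under that assumption (the function name and buffer 
layout are hypothetical, not numexpr's code):

```python
import numpy as np

def complex_mul(re1, im1, re2, im2, re_out, im_out):
    """Complex multiply over split real/imaginary float buffers,
    as it might look if a complex register were two adjacent float
    registers. Assumes the output buffers are distinct from the
    inputs (the second line reads re1/im1 again)."""
    # (a+bi)(c+di) = (ac - bd) + (ad + bc)i
    re_out[:] = re1 * re2 - im1 * im2
    im_out[:] = re1 * im2 + im1 * re2
```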

The other question is integers. There would be some advantages to 
supporting mixed integer and floating point operations. This adds a 
larger crop of operators, though: the basic ones as above, plus 
comparisons, plus the bitwise operators.

>5. Currently, we use a big switch statement. There are ways (taken
>   from Forth) that are better: indirect and direct threading.
>   Unfortunately, it looks like the easy way to do these uses GCC's
>   capability to take the address of local labels. I'll add that if I
>   can refactor the machine enough so that both variants can be
>   produced. Have a look at
>   http://www.complang.tuwien.ac.at/anton/vmgen/
>   which is the virtual machine generator used for gforth (but
>   applicable to other things). I may use this.
Better as in lower overhead? Or better as in simpler to write? The 
current code is already competitive with weave, which would seem to 
indicate that we're dominated by the overhead of the FP math, not the 
big switch statement. I'd only rewrite things if they're going to be 
simpler to work with.
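For what it's worth, the dispatch-table idea behind indirect threading 
can be sketched in Python, though the real win in C comes from computed 
goto, which this can't show (opcode numbers and handlers below are 
purely illustrative):

```python
# Illustrative opcode numbers; not numexpr's actual opcode set.
OP_ADD, OP_MUL, OP_NEG = 0, 1, 2

def op_add(regs, d, s1, s2): regs[d] = regs[s1] + regs[s2]
def op_mul(regs, d, s1, s2): regs[d] = regs[s1] * regs[s2]
def op_neg(regs, d, s1, s2): regs[d] = -regs[s1]   # s2 unused

# Indirect-threading analogue: each instruction indexes straight
# into a table of handlers instead of going through a big switch.
DISPATCH = [op_add, op_mul, op_neg]

def run(program, regs):
    for opcode, d, s1, s2 in program:
        DISPATCH[opcode](regs, d, s1, s2)
```

Whether this beats a switch in C depends on how well the compiler and 
branch predictor handle the switch, which is exactly the question above.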

>6. Replace register numbers in the program with actual pointers to the
>   correct register. That should remove a layer of indirection.
This would be done at function calling time as well? So this is more 
two-stage compilation stuff?
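A sketch of what that pre-resolution might look like, assuming a 
program of (opcode, register-number, ...) tuples (all names here are 
hypothetical):

```python
import numpy as np

def resolve(program, registers):
    """Replace register numbers with references to the register
    buffers themselves, removing one indirection per operand.
    Done once per call, before the interpreter loop runs."""
    return [(op, registers[d], registers[s1], registers[s2])
            for op, d, s1, s2 in program]

def run_resolved(program):
    """Interpreter loop over a resolved program: operands are the
    buffers directly, with no register-file lookup per opcode."""
    for op, dest, src1, src2 in program:
        if op == "add":
            dest[:] = src1 + src2
        elif op == "mul":
            dest[:] = src1 * src2
```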

