[Numpy-discussion] Back to numexpr

Tim Hochberg tim.hochberg at cox.net
Tue Jun 13 11:56:37 CDT 2006

I've finally got around to looking at numexpr again. Specifically, I'm 
looking at Francesc Altet's numexpr-0.2, with the idea of harmonizing 
the two versions. Let me go through his list of enhancements and comment 
(my comments are dedented):

    - Addition of a boolean type. This allows better array copying times
    for large arrays (lightweight computations are typically bounded by
    memory bandwidth).

Adding this to numexpr looks like a no-brainer. Booleans behave 
differently from integers, so in addition to being more memory 
efficient, this enables boolean &, |, ~, etc. to work properly.
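
To make the difference concrete, here is a minimal sketch in plain NumPy 
(not numexpr code; the array names are made up for illustration). It 
shows ~ acting as a logical NOT on booleans but as a bitwise NOT on 
integers, and the smaller memory footprint of a boolean array:

```python
import numpy as np

flags = np.array([True, False, True])
ints = flags.astype(np.int32)          # same values stored as 1/0 integers

not_flags = ~flags                     # logical NOT: [False, True, False]
not_ints = ~ints                       # bitwise NOT: [-2, -1, -2]

other = np.array([True, True, False])
both = flags & other                   # element-wise AND on booleans
either = flags | other                 # element-wise OR on booleans

# One byte per element for booleans versus four for int32.
big_bool = np.ones(1_000_000, dtype=np.bool_)
big_int = np.ones(1_000_000, dtype=np.int32)
print(not_flags, not_ints)
print(big_bool.nbytes, big_int.nbytes)
```

With integers standing in for booleans, ~ does not give a usable mask, 
which is why a real boolean type is needed for these operators.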

    - Enhanced performance for strided and unaligned data, especially for
    lightweight computations (e.g. 'a>10'). With this and the addition of
    the boolean type, we can get up to 2x better times than previous
    versions. Also, most of the supported computations go faster than
    with numpy or numarray, even the simplest ones.

Francesc, if you're out there, can you briefly describe what this 
support consists of? It's been long enough since I was messing with this 
that it's going to take me a while to untangle NumExpr_run, where I 
expect it's lurking, so any hints would be appreciated.
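
For anyone following along, here is a small sketch of what "strided" 
and "unaligned" inputs look like, using plain NumPy rather than 
anything from numexpr-0.2. The packed record layout is just one 
convenient way to get an unaligned float64 view, and the names are 
hypothetical:

```python
import numpy as np

# Strided: every second element of a contiguous float64 buffer.
base = np.arange(10, dtype=np.float64)
strided = base[::2]                    # step of 16 bytes instead of 8

# Unaligned: a float64 field packed right after a 1-byte field, so each
# record is 9 bytes and the doubles cannot all sit on 8-byte boundaries.
packed = np.zeros(5, dtype=[("tag", "i1"), ("value", "f8")])
unaligned = packed["value"]
unaligned[:] = np.arange(5) * 5.0      # 0, 5, 10, 15, 20

print(strided.strides, strided.flags.c_contiguous)
print(unaligned.strides, unaligned.flags.aligned)

# The lightweight comparisons from the list above work on both views.
mask_strided = strided > 4
mask_unaligned = unaligned > 10
```

An interpreter that wants to be fast on such inputs has to either copy 
them into aligned, contiguous blocks or walk them with explicit strides.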

    - Addition of ~, & and | operators (a la numarray.where)

Sounds good.

    - Support for both numpy and numarray (use the flag --force-numarray
    in setup.py).

At first glance this looks like it doesn't make things too messy, so I'm 
in favor of incorporating this.

    - Added a new benchmark for testing boolean expressions and
    strided/unaligned arrays: boolean_timing.py

Benchmarks are always good.

    Things that I want to address in the future:

    - Add tests on strided and unaligned data (currently only tested

Yep! Tests are good.

    - Add types for int16, int64 (in 32-bit platforms), float32,
      complex64 (simple prec.)

I have some specific ideas about how this should be accomplished. 
Basically, I don't think we want to support every type in the same way, 
since this is going to make the case statement blow up to an enormous 
size. This may slow things down and at a minimum it will make things 
less comprehensible. My thinking is that we only add casts for the extra 
types and do the computations at high precision. Thus adding two int16 
numbers compiles to two OP_CAST_Ffs followed by an OP_ADD_FFF, and then 
an OP_CAST_fF. The details are left as an exercise to the reader ;-). 
So, adding int16, float32, complex64 should only require the addition of 
6 casting opcodes plus appropriate modifications to the compiler.

For large arrays, this should have most of the benefits of giving each 
type its own opcode, since the memory bandwidth required is still small, 
while keeping the interpreter relatively simple.
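
As a rough illustration of the cast-based scheme, here is a toy 
interpreter in Python. It is only a sketch: the opcode names come from 
the text above, but the register layout, the run() helper and the 
choice of float64 as the wide type are assumptions for the example, not 
how NumExpr_run is actually written:

```python
import numpy as np

WIDE = np.float64     # assumed high-precision working type ("F")
NARROW = np.int16     # the extra type being added ("f" in this sketch)

# Hypothetical opcode table: one up-cast, one down-cast, one wide add.
OPS = {
    "OP_CAST_Ff": lambda src: src.astype(WIDE),
    "OP_CAST_fF": lambda src: src.astype(NARROW),
    "OP_ADD_FFF": lambda a, b: a + b,
}

def run(program, registers):
    """Execute (opcode, dest, *sources) tuples against named registers."""
    for opcode, dest, *sources in program:
        registers[dest] = OPS[opcode](*(registers[s] for s in sources))
    return registers

# 'a + b' on two int16 inputs: two up-casts, one wide add, one down-cast.
program = [
    ("OP_CAST_Ff", "t0", "a"),
    ("OP_CAST_Ff", "t1", "b"),
    ("OP_ADD_FFF", "t2", "t0", "t1"),
    ("OP_CAST_fF", "out", "t2"),
]

regs = run(program, {
    "a": np.array([1, 2, 3], dtype=NARROW),
    "b": np.array([10, 20, 30], dtype=NARROW),
})
print(regs["out"], regs["out"].dtype)
```

Only the two cast opcodes are specific to int16; the add is the existing 
wide one. Repeating that pair for float32 and complex64 presumably gives 
the six casting opcodes mentioned above.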

Unfortunately, int64 doesn't fit under this scheme; is it used enough to 
matter? I'd hate to pile a whole bunch of new opcodes on for something 
that's rarely used.
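
One way to see the problem, assuming the wide working type is a 64-bit 
float (an assumption for this sketch): int16 values survive a round 
trip through float64 exactly, but int64 values above 2**53 do not, so 
there is no existing type to cast up to.

```python
import numpy as np

# int16 fits comfortably inside float64's 53-bit mantissa.
small = np.array([-32768, 32767], dtype=np.int16)
small_back = small.astype(np.float64).astype(np.int16)

# 2**53 + 1 is the first positive integer float64 cannot represent.
big = np.array([2**53 + 1], dtype=np.int64)
big_back = big.astype(np.float64).astype(np.int64)

print(np.array_equal(small, small_back))   # True: lossless round trip
print(np.array_equal(big, big_back))       # False: the low bit is lost
```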


