[Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM
Tue Mar 20 13:24:26 CDT 2012
On Mar 20, 2012, at 12:49 PM, mark florisson wrote:
>> Cython and Numba certainly overlap. However, Cython requires:
>> 1) learning another language
>> 2) creating an extension module --- loading bit-code files and dynamically executing (even on a different machine from the one that initially created them) can be a powerful alternative for run-time compilation and distribution of code.
>> These aren't show-stoppers obviously. But, I think some users would prefer an even simpler approach to getting fast code than Cython (which currently doesn't do enough type-inference and requires building a dlopen extension module).
> Dag and I have been discussing this at PyCon, and here is my take on
> it (at this moment :).
> Definitely, if you can avoid Cython then that is easier and more
> desirable in many ways. So perhaps we can create a third project
> called X (I'm not very creative, maybe ArrayExprOpt), that takes an
> abstract syntax tree in a rather simple form, performs code
> optimizations such as rewriting loops with array accesses to vector
> expressions, fusing vector expressions and loops, etc, and spits out a
> transformed AST containing these optimizations. If runtime information
> is given such as actual shape and stride information the
> transformations could figure out there and then whether to do things
> like collapsing, axes swapping, blocking (as in, introducing more axes
> or loops to retain discontiguous blocks in the cache), blocked memory
> copies to contiguous chunks, etc. The AST could then also say whether
> the final expressions are vectorizable. Part of this functionality is
> already in numpy's nditer, except that this would be implicit and do
> more (and hopefully with minimal overhead).
> So numba, Cython and maybe numexpr could use the functionality, simply
> by building the AST from Python and converting back (if necessary) to
> its own AST. As such, the AST optimizer would be only part of any
> (runtime) compiler's pipeline, and it should be very flexible to
> retain any information (metadata regarding actual types, control flow
> information, etc) provided by the original AST. It would not do
> control flow analysis, type inference or promotion, etc, but only deal
> with abstract types like integers, reals and arrays (C, Fortran or
> partly contiguous or strided). It would not deal with objects, but
> would allow inserting nodes like UnreorderableNode and SideEffectNode
> wrapping parts of the original AST. In short, it should be as easy as
> possible to convert from an original AST to this project's AST and
> back again afterwards.
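To make the runtime shape/stride idea above a bit more concrete, here is a minimal sketch of two of the decisions mentioned (axis swapping and collapsing), written in plain Python. The function names `loop_order` and `collapse_axes` are made up for illustration and are not part of numpy, numba or the proposed project; strides are assumed to be given in bytes, as numpy reports them.

```python
def loop_order(shape, strides):
    """Order axes outermost-to-innermost, largest absolute stride first.

    Axes of extent 1 are dropped because they need no loop.
    (Hypothetical helper, for illustration only.)
    """
    axes = [ax for ax in range(len(shape)) if shape[ax] != 1]
    return sorted(axes, key=lambda ax: abs(strides[ax]), reverse=True)


def collapse_axes(shape, strides):
    """Return a list of (extent, stride) loops after swapping and collapsing.

    Two neighbouring loops are merged when one step of the outer loop
    covers exactly the full span of the inner loop.
    """
    merged = []
    for ax in loop_order(shape, strides):
        extent, stride = shape[ax], strides[ax]
        if merged and merged[-1][1] == extent * stride:
            # The outer stride equals the inner span: one loop covers both.
            outer_extent, _ = merged.pop()
            merged.append((outer_extent * extent, stride))
        else:
            merged.append((extent, stride))
    return merged


# C-contiguous 3x4 array of 8-byte items: a single loop of 12 elements.
print(collapse_axes((3, 4), (32, 8)))
# Fortran-ordered 3x4 array: axes are swapped first, then collapsed.
print(collapse_axes((3, 4), (8, 24)))
# A view that skips the last column cannot be collapsed.
print(collapse_axes((3, 3), (32, 8)))
```

A real optimizer would have to make this decision jointly for all operands of an expression; the sketch only looks at one array at a time.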
I think this is a very interesting project, and certainly projects like numba can benefit from it. So, just so we have an idea of what you are after, can we assume that your project (call it X) would be a kind of compiler optimizer, and that the optimized code it produces could then be fed into numba for optimized LLVM code generation (which, in turn, could run on CPUs, GPUs or a combination of both)? Is that correct?
Given that my interpretation above is correct, it is a bit more difficult for me to see how your X project could benefit numexpr. In fact, I actually see this the other way round: once the optimizer has discovered the vectorizable parts, it could go one step further and generate code that uses numexpr automatically (call this vectorization through numexpr). Is this what you mean, or am I missing something?
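As a rough illustration of what "vectorization through numexpr" could look like, the sketch below uses only the standard `ast` module (Python 3.9 or later, for `ast.unparse`) to recognize one very simple loop shape and emit the equivalent vector expression as a string. The names `loop_to_vector_expr` and `_IndexStripper` are hypothetical, and the sketch does not check that `n` matches the array lengths; it is not how numexpr or project X actually work.

```python
import ast


class _IndexStripper(ast.NodeTransformer):
    """Replace every `name[i]` with `name`; reject any other subscript."""

    def __init__(self, index_name):
        self.index_name = index_name

    def visit_Subscript(self, node):
        if (isinstance(node.value, ast.Name)
                and isinstance(node.slice, ast.Name)
                and node.slice.id == self.index_name):
            return ast.copy_location(
                ast.Name(id=node.value.id, ctx=ast.Load()), node)
        raise ValueError("not a plain elementwise access")


def loop_to_vector_expr(source):
    """Return (target, expression) for a loop of the form
    `for i in range(n): out[i] = <expression over x[i]>`,
    or None when the loop does not have that shape."""
    tree = ast.parse(source)
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.For):
        return None
    loop = tree.body[0]
    it = loop.iter
    if not (isinstance(loop.target, ast.Name) and not loop.orelse
            and len(loop.body) == 1
            and isinstance(it, ast.Call) and isinstance(it.func, ast.Name)
            and it.func.id == "range" and len(it.args) == 1):
        return None
    stmt = loop.body[0]
    if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Subscript)):
        return None
    index = loop.target.id
    stripper = _IndexStripper(index)
    try:
        target = stripper.visit(stmt.targets[0])
        expr = stripper.visit(stmt.value)
    except ValueError:
        return None
    # The loop index must not survive outside of the stripped subscripts.
    if any(isinstance(n, ast.Name) and n.id == index for n in ast.walk(expr)):
        return None
    return target.id, ast.unparse(expr)


loop = "for i in range(n):\n    out[i] = a[i] * b[i] + c[i]\n"
print(loop_to_vector_expr(loop))
```

An expression string of this kind is the sort of input that `numexpr.evaluate` takes, so a generated call to it could stand in for the original loop.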
> As the project matures many optimizations may be added that deal with
> all sorts of loop restructuring and ways to efficiently utilize the
> cache as well as enable vectorization and possibly parallelism.
> Perhaps it could even generate a different AST depending on whether
> execution targets the CPU or the GPU (with optionally available
> information such as cache sizes, GPU shared/local memory sizes, etc).
> Seeing that this would be a part of my master's dissertation, my
> supervisor would require me to write the code, so at least until
> August I think I would have to write (at least the bulk of) this.
> Otherwise I can also make other parts of my dissertation's project
> more prominent to make up for it. Anyway, my question is, is there
> interest from at least the numba and numexpr projects (if code can be
> transformed into vector operations, it makes sense to use numexpr for
> that, I'm not sure what numba's interest is in that)?
I'm definitely interested in the numexpr part. It is just that I'm still struggling to see the big picture here. But the general idea is really appealing.
-- Francesc Alted