[Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM
Tue Mar 13 12:18:19 CDT 2012
>> (Mark F., how does the above match how you feel about this?)
> I would like collaboration, but from a technical perspective I think
> this would be much more involved than just dumping the AST to an IR
> and generating some code from there. For vector expressions I think
> sharing code would be more feasible than arbitrary (parallel) loops,
> etc. Cython as a compiler can make many decisions that a Python
> (bytecode) compiler can't make (at least without annotations and a
> well-defined subset of the language (not so much the syntax as the
> semantics)). I think in numba, if parallelism is to be supported, you
> will want a prange-like construct, as proving independence between
> iterations can be very hard to near impossible for a compiler.
I completely agree that you have to define some kind of syntax to get parallelism, and a prange-like construct would certainly not be out of the question.
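To make the point concrete, here is a minimal pure-Python sketch (illustrative only, not Numba or Cython code) contrasting a loop with a carried dependence against one whose iterations are independent -- the latter is exactly what a prange-style annotation would let the user assert is safe to parallelize:

```python
def cumulative(a):
    # Iteration i reads the result of iteration i-1: a carried
    # dependence, so a compiler cannot legally parallelize this
    # loop without changing its meaning.
    out = [0.0] * len(a)
    acc = 0.0
    for i in range(len(a)):
        acc += a[i]
        out[i] = acc
    return out

def scale(a, c):
    # Each iteration writes only out[i] and reads only a[i]:
    # iterations are independent, so a prange-style construct
    # could run them in any order or in parallel.
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = c * a[i]
    return out
```

Proving automatically that `scale` is in the second category and `cumulative` is in the first is exactly the hard analysis a prange annotation sidesteps.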
> As for code generation, I'm not sure how llvm would do things like
> slicing arrays, reshaping, resizing etc (for vector expressions you
> can first evaluate all slicing and indexing operations and then
> compile the remaining vector expression), but for loops and array
> reassignment within loops this would have to invoke the actual slicing
> code from the llvm code (I presume).
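That two-phase approach can be sketched with plain NumPy (the function names here are illustrative, not a proposed API): resolve all slicing and indexing eagerly with NumPy itself, then hand the resulting flat operands to the compiled kernel -- stubbed out below in Python where LLVM-generated code would go:

```python
import numpy as np

def eval_indexing(operands):
    # Phase 1: evaluate all slicing/indexing up front, so the
    # compiled kernel only ever sees plain contiguous arrays.
    return [np.ascontiguousarray(op) for op in operands]

def kernel(a, b):
    # Phase 2: the remaining vector expression, which is what
    # would actually be lowered to LLVM IR.
    return 2.0 * a + b

x = np.arange(10.0)
y = np.arange(20.0)
a, b = eval_indexing([x[1:6], y[::4]])   # slicing handled before compilation
result = kernel(a, b)
```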
There could be some analysis on the byte-code, prior to emitting the llvm code, in order to handle lots of things. Basically, you have to "play" the byte-code on a simple machine anyway in order to emit the correct code. The big thing about Cython is that you have to declare types for too many things that are really quite knowable from the code. If Cython could improve its type inference, then it would be a more suitable target.
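The standard-library dis module already exposes the byte-code stream that such a simple machine would "play"; a rough sketch (a real analyzer would also propagate types through a virtual value stack):

```python
import dis

def axpy(a, x, y):
    return a * x + y

# Walk the byte-code the way a trivial abstract machine would,
# recording the operation stream; the multiply and add show up as
# BINARY_* instructions whose operand types are knowable from the
# callers without any explicit declarations.
ops = [ins.opname for ins in dis.get_instructions(axpy)]
```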
> There are many other things, like
> bounds checking, wraparound, etc, that are all supported in both numpy
> and Cython, but going through an llvm layer would as far as I can see,
> require re-implementing those, at least if you want top-notch
> performance. Personally, I think for non-trivial performance-critical
> code (for loops with indexing, slicing, function calls, etc) Cython is
> a better target.
With libclang it is really quite possible to imagine a Cython -> C target that itself compiles to LLVM, so that you can do everything at that intermediate layer. However, LLVM is a much better layer for optimization than C, now that there are a lot of people collaborating on that layer. Actually, I think it would be great if Cython targeted LLVM instead of C.
> Finally, as for non-vector-expression code, I really believe Cython is
> a better target. cython.inline can have high overhead (at least the
> first time it has to compile), but with better (numpy-aware) type
> inference or profile guided optimizations (see recent threads on the
> cython-dev mailing list), in addition to things like prange, I
> personally believe Cython targets most of the use cases where numba
> would be able to generate performing code.
Cython and Numba certainly overlap. However, Cython requires:
1) learning another language
2) creating an extension module. Loading bit-code files and dynamically executing them (even on a different machine from the one that initially created them) can be a powerful alternative for run-time compilation and distribution of code.
These aren't show-stoppers, obviously. But I think some users would prefer an even simpler approach to getting fast code than Cython (which currently doesn't do enough type inference and requires building a dlopen'd extension module).
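As a rough standard-library analogy for that workflow (shipping a compiled artifact and executing it later, instead of building an extension module), Python's own compile/marshal round-trip shows the shape of it -- though unlike LLVM bit-code, marshal blobs are only portable between identical Python versions:

```python
import marshal

# Compile source once, serialize the code object to bytes that could
# be written to disk or sent over the wire.
src = "def saxpy(a, x, y):\n    return [a * xi + yi for xi, yi in zip(x, y)]\n"
blob = marshal.dumps(compile(src, "<shipped>", "exec"))

# Later (or elsewhere): load the blob and dynamically execute it,
# with no compiler or build step on the receiving side.
ns = {}
exec(marshal.loads(blob), ns)
result = ns["saxpy"](2.0, [1, 2], [3, 4])
```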