[Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

Olivier Delalleau shish@keba...
Tue Mar 20 14:49:18 CDT 2012


I doubt Theano is already as smart as you'd want it to be right now.
However, the core mechanisms for performing graph optimizations and moving
computations to the GPU are there. It may save time to start from there
instead of starting from scratch. I'm not sure, but it looks like it would
be worth considering at least.
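
To make "graph optimization" concrete, here is a minimal sketch of a bottom-up rewrite pass over a tiny expression graph. It is not Theano's API: the node classes (Var, Const, Op) and the simplify function are hypothetical names invented for illustration, and the only rewrites shown are constant folding and removal of x*1 and x+0.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Op:
    kind: str  # "add" or "mul" (hypothetical, illustration only)
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Const, Op]


def simplify(node: Expr) -> Expr:
    """Bottom-up rewrite: fold constants and drop x*1 and x+0."""
    if not isinstance(node, Op):
        return node
    # Simplify the children first so rewrites can cascade upwards.
    left, right = simplify(node.left), simplify(node.right)
    if isinstance(left, Const) and isinstance(right, Const):
        if node.kind == "add":
            return Const(left.value + right.value)
        return Const(left.value * right.value)
    identity = Const(0.0 if node.kind == "add" else 1.0)
    if left == identity:
        return right
    if right == identity:
        return left
    return Op(node.kind, left, right)
```

For example, simplify applied to (x * 1) + (2 + -2) reduces the whole graph to the single node Var("x"). A real optimizer would carry many more rewrites plus type and shape information; the point is only that such passes are plain tree transformations.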

-=- Olivier

On 20 March 2012 15:40, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no>
wrote:

> We talked some about Theano. There are some differences in project
> goals which mean that it makes sense to make this a separate project:
> Cython wants to use this to generate C code up front from the Cython AST at
> compilation time; numba also has a different frontend (parsing of Python
> bytecode) and a different backend (LLVM).
>
> However, it may very well be possible that Theano could be refactored so
> that the more essential algorithms working on the syntax tree could be
> pulled out and shared with cython and numba. Then the question is whether
> the core of Theano is smart enough to compete with Fortran compilers and
> support arbitrarily strided inputs optimally. Otherwise one might as well
> start from scratch. I'll leave that for Mark to figure out...
>
> Dag
> --
> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>
>
> Olivier Delalleau <shish@keba.be> wrote:
>>
>> This sounds a lot like Theano, did you look into it?
>>
>> -=- Olivier
>>
>> On 20 March 2012 13:49, mark florisson <markflorisson88@gmail.com>
>> wrote:
>>
>>> On 13 March 2012 18:18, Travis Oliphant <travis@continuum.io> wrote:
>>> >>>
>>> >>> (Mark F., how does the above match how you feel about this?)
>>> >>
>>> >> I would like collaboration, but from a technical perspective I think
>>> >> this would be much more involved than just dumping the AST to an IR
>>> >> and generating some code from there. For vector expressions I think
>>> >> sharing code would be more feasible than arbitrary (parallel) loops,
>>> >> etc. Cython as a compiler can make many decisions that a Python
>>> >> (bytecode) compiler can't make (at least without annotations and a
>>> >> well-defined subset of the language (not so much the syntax as the
>>> >> semantics)). I think in numba, if parallelism is to be supported, you
>>> >> will want a prange-like construct, as proving independence between
>>> >> iterations can be very hard to near impossible for a compiler.
>>> >
>>> > I completely agree that you have to define some kind of syntax to get
>>> > parallelism. But a prange construct would not be out of the question, of
>>> > course.
>>> >
>>> >>
>>> >> As for code generation, I'm not sure how llvm would do things like
>>> >> slicing arrays, reshaping, resizing etc (for vector expressions you
>>> >> can first evaluate all slicing and indexing operations and then
>>> >> compile the remaining vector expression), but for loops and array
>>> >> reassignment within loops this would have to invoke the actual slicing
>>> >> code from the llvm code (I presume).
>>> >
>>> > There could be some analysis on the byte-code, prior to emitting the
>>> > llvm code, in order to handle lots of things. Basically, you have to
>>> > "play" the byte-code on a simple machine anyway in order to emit the
>>> > correct code. The big thing about Cython is you have to typedef too many
>>> > things that are really quite knowable from the code. If Cython could
>>> > improve its type inference, then it would be a more suitable target.
>>> >
>>> >> There are many other things, like
>>> >> bounds checking, wraparound, etc, that are all supported in both numpy
>>> >> and Cython, but going through an llvm layer would as far as I can see,
>>> >> require re-implementing those, at least if you want top-notch
>>> >> performance. Personally, I think for non-trivial performance-critical
>>> >> code (for loops with indexing, slicing, function calls, etc) Cython is
>>> >> a better target.
>>> >
>>> > With libclang it is really quite possible to imagine a cython -> C
>>> > target that itself compiles to llvm so that you can do everything at that
>>> > intermediate layer. However, LLVM is a much better layer for
>>> > optimization than C now that there are a lot of people collaborating on
>>> > that layer. I think it would be great if Cython actually targeted LLVM
>>> > instead of C.
>>> >
>>> >>
>>> >> Finally, as for non-vector-expression code, I really believe Cython is
>>> >> a better target. cython.inline can have high overhead (at least the
>>> >> first time it has to compile), but with better (numpy-aware) type
>>> >> inference or profile guided optimizations (see recent threads on the
>>> >> cython-dev mailing list), in addition to things like prange, I
>>> >> personally believe Cython targets most of the use cases where numba
>>> >> would be able to generate well-performing code.
>>> >
>>> > Cython and Numba certainly overlap.  However, Cython requires:
>>> >
>>> >        1) learning another language
>>> >        2) creating an extension module --- loading bit-code files and
>>> >           dynamically executing (even on a different machine from the one
>>> >           that initially created them) can be a powerful alternative for
>>> >           run-time compilation and distribution of code.
>>> >
>>> > These aren't show-stoppers, obviously. But I think some users would
>>> > prefer an even simpler approach to getting fast code than Cython (which
>>> > currently doesn't do enough type inference and requires building a dlopen
>>> > extension module).
>>>
>>> Dag and I have been discussing this at PyCon, and here is my take on
>>> it (at this moment :).
>>>
>>> Definitely, if you can avoid Cython then that is easier and more
>>> desirable in many ways. So perhaps we can create a third project
>>> called X (I'm not very creative, maybe ArrayExprOpt), that takes an
>>> abstract syntax tree in a rather simple form, performs code
>>> optimizations such as rewriting loops with array accesses to vector
>>> expressions, fusing vector expressions and loops, etc, and spits out a
>>> transformed AST containing these optimizations. If runtime information
>>> such as actual shape and stride information is given, the
>>> transformations could figure out there and then whether to do things
>>> like collapsing, axes swapping, blocking (as in, introducing more axes
>>> or loops to retain discontiguous blocks in the cache), blocked memory
>>> copies to contiguous chunks, etc. The AST could then also say whether
>>> the final expressions are vectorizable. Part of this functionality is
>>> already in numpy's nditer, except that this would be implicit and do
>>> more (and hopefully with minimal overhead).
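
As an illustration of the "collapsing" step described above, here is a minimal sketch of how runtime shape and stride information could be used to merge adjacent axes into fewer, longer loops. The function name collapse_axes is hypothetical, strides are assumed to be given in elements rather than bytes, and the axes are assumed to be already ordered from outermost to innermost.

```python
def collapse_axes(shape, strides):
    """Merge adjacent axes that together form one evenly strided run.

    Strides are in elements (an assumption of this sketch). An outer axis
    can absorb the next inner axis when one step along the outer axis
    equals `extent` steps along the inner one.
    """
    out_shape, out_strides = [], []
    for extent, stride in zip(shape, strides):
        if extent == 1:
            continue  # length-1 axes never affect addressing
        if out_shape and out_strides[-1] == extent * stride:
            # The previous axis and this one address one regular run.
            out_shape[-1] *= extent
            out_strides[-1] = stride
        else:
            out_shape.append(extent)
            out_strides.append(stride)
    return tuple(out_shape), tuple(out_strides)
```

With these assumptions, a C-contiguous array of shape (4, 5, 6) with strides (30, 6, 1) collapses to a single loop of 120 elements with stride 1, while a view that skips every other row, shape (4, 3, 6) with strides (30, 12, 1), cannot be collapsed at all. A fuller version would first reorder axes by stride (the "axes swapping" mentioned above) so that Fortran-ordered inputs collapse as well.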
>>>
>>> So numba, Cython and maybe numexpr could use the functionality, simply
>>> by building the AST from Python and converting back (if necessary) to
>>> its own AST. As such, the AST optimizer would be only part of any
>>> (runtime) compiler's pipeline, and it should be very flexible to
>>> retain any information (metadata regarding actual types, control flow
>>> information, etc) provided by the original AST. It would not do
>>> control flow analysis, type inference or promotion, etc, but only deal
>>> with abstract types like integers, reals and arrays (C, Fortran or
>>> partly contiguous or strided). It would not deal with objects, but
>>> would allow inserting nodes like UnreorderableNode and SideEffectNode
>>> wrapping parts of the original AST. In short, it should be as easy as
>>> possible to convert from an original AST to this project's AST and
>>> back again afterwards.
>>>
>>> As the project matures many optimizations may be added that deal with
>>> all sorts of loop restructuring and ways to efficiently utilize the
>>> cache as well as enable vectorization and possibly parallelism.
>>> Perhaps it could even generate a different AST depending on whether
>>> execution targets the CPU or the GPU (with optionally available
>>> information such as cache sizes, GPU shared/local memory sizes, etc).
>>>
>>> Seeing that this would be a part of my master's dissertation, my
>>> supervisor would require me to write the code, so at least until
>>> August I think I would have to write (at least the bulk of) this.
>>> Otherwise I can also make other parts of my dissertation's project
>>> more prominent to make up for it. Anyway, my question is: is there
>>> interest from at least the numba and numexpr projects? (If code can be
>>> transformed into vector operations, it makes sense to use numexpr for
>>> that; I'm not sure what numba's interest is in that.)
>>>
>>> > -Travis
>>> >
>>> >
>>> >
>>> >
>>> >>
>>> >>> Dag
>>> >>> _______________________________________________
>>> >>> NumPy-Discussion mailing list
>>> >>> NumPy-Discussion@scipy.org
>>> >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>