[Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM
Tue Mar 13 03:19:47 CDT 2012
On Mar 13, 2012, at 12:58 AM, Dag Sverre Seljebotn wrote:
> On 03/10/2012 10:35 PM, Travis Oliphant wrote:
>> Hey all,
>> I gave a lightning talk this morning on numba, which is the start of a
>> Python compiler to machine code through the LLVM tool-chain. It is at the
>> proof-of-concept stage only (use it only if you are interested
>> in helping develop the code at this point). The only thing that works is
>> a fast-vectorize capability on a few functions (without for-loops). But
>> it shows how to create functions in Python that can be used by the NumPy
>> runtime in various ways. Several NEPs that will be discussed in the
>> coming months will use this concept.
>> Right now there is very little design documentation, but I will be
>> adding some in the days ahead, especially if I get people who are
>> interested in collaborating on the project. I did talk to Fijal and Alex
>> of the PyPy project at PyCon and they both graciously suggested that I
>> look at some of the PyPy code which walks the byte-code and does
>> translation to their intermediate representation for inspiration.
>> Again, the code is not ready for use, it is only proof of concept, but I
>> would like to get feedback and help especially from people who might
>> have written compilers before. The code lives at:
> Hi Travis,
> Mark F. and I have been talking today about whether some of numba and
> Cython development could overlap -- not right away, but in the sense
> that if Cython gets some features for optimization of numerical code,
> we could make it easy for numba to reuse that functionality.
That would be very, very interesting.
> This may be sort of off-topic re: the above-- but part of the goal of
> this post is to figure out numba's intended scope. If there isn't an
> overlap, that's good to know in itself.
> Question 1: Did you look at Clyther and/or Copperhead? Though similar,
> they target GPUs...but at first glance they look as though they may be
> parsing Python bytecode to get their ASTs... (didn't check though)
I have looked at both projects, although at Clyther in more depth. Clyther parses bytecode to get the AST (through a sub-project by the same author called Meta: http://srossross.github.com/Meta/html/index.html).
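
As a rough illustration of what "walking the bytecode" starts from, here is a minimal sketch using only the standard-library `dis` module. It is not taken from numba, Clyther, or Meta; the function `f` and the helper `opcode_names` are hypothetical names chosen for the example, and the exact opcode names depend on the Python version.

```python
import dis


def f(x):
    # A small numerical kernel of the kind a compiler might translate.
    return x * x + 1


def opcode_names(func):
    """Return the opcode names of func's bytecode, in execution order."""
    return [ins.opname for ins in dis.get_instructions(func)]


names = opcode_names(f)
print(names)  # the listing varies between CPython versions
```

A bytecode-to-AST translator would simulate the evaluation stack over this instruction stream to rebuild the expression tree; the listing above is only the first step.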
> Question 2: What kind of performance are you targeting -- in the short
> term, and in the long term? Is competing with "Fortran-level"
> performance a goal at all?
In the short term, I'm targeting C-equivalent performance (like weave). In the long term, I'm targeting optimized high-level expressions (i.e. Fortran-level) with GPU and multi-core.
> E.g., for ufunc computations with different iteration orders such
> as "a + b.T" (a and b in C-order), one must do blocking to get good
> performance. And when dealing with strided arrays, copying small chunks
> at a time will sometimes help performance (and sometimes not).
> These are optimization strategies which (as I understand it) are quite
> beyond what NumPy iterators etc. can provide.
> And the LLVM level could
> be too low -- one has quite a lot of information when generating the
> ufunc/reduction/etc. that would be thrown away when generating LLVM
It doesn't need to be thrown away at all. It could be used to generate appropriate code for the arrays being used. The long-term idea is to actually be aware of NumPy arrays and encourage expression of high-level constructs which generate optimized code using chunking, blocking, AVX instructions, multiple threads, etc.
To do this, it may make more sense to actually emit OpenMP (unless LLVM grows standard threading intrinsics). This is not out of the question.
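
To make the blocking idea for "a + b.T" concrete, here is a minimal sketch of the loop structure a code generator might emit. It is an assumption-laden illustration, not numba or NumPy code: the matrices are square and stored as flat row-major lists so that the access order is explicit, and the name `add_transpose_blocked` and the `block` parameter are hypothetical. In pure Python this shows only the iteration order, not a speedup.

```python
def add_transpose_blocked(a, b, n, block=32):
    """Return out = a + b.T for n x n matrices stored as flat row-major lists.

    The iteration space is split into block x block tiles so that the
    strided reads of b (column-wise) stay within a small region of memory.
    """
    out = [0.0] * (n * n)
    for i0 in range(0, n, block):          # tile row start
        for j0 in range(0, n, block):      # tile column start
            i_end = min(i0 + block, n)     # clip tiles at the matrix edge
            j_end = min(j0 + block, n)
            for i in range(i0, i_end):
                for j in range(j0, j_end):
                    # a is read contiguously; b is read transposed.
                    out[i * n + j] = a[i * n + j] + b[j * n + i]
    return out


n = 5
a = [float(k) for k in range(n * n)]
b = [float(10 * k) for k in range(n * n)]
result = add_transpose_blocked(a, b, n, block=2)

# Reference result using the straightforward (unblocked) loop order.
expected = [a[i * n + j] + b[j * n + i] for i in range(n) for j in range(n)]
assert result == expected
```

The tile size would normally be chosen from the cache size and element width, which is exactly the kind of high-level information available before lowering to LLVM.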
> Vectorizing compilers do their best to reconstruct this
> information; I know nothing about what actually exists here for
> LLVM. They are certainly a lot more complicated to implement and work
> with than making use of the higher-level information available before
> code generation.
> The idea we've been playing with is for Cython to define a limited
> subset of its syntax tree (essentially the "GIL-less" subset) separate
> from the rest of Cython, with a more well-defined API for optimization
> passes etc., and targeted for a numerical optimization pipeline.
> This subset would actually be pretty close to what numba needs to
> compile, even if the overlap isn't perfect. So such a pipeline could
> possibly be shared between Cython and numba, even if Cython would use
> it at compile-time and numba at runtime, and even if the code
> generation backend is different (the code generation backend is
> probably not the hard part...). To be concrete, the idea is:
> (Cython|numba) -> high-level numerical compiler and
> loop-structure/blocking optimizer (by us on a shared parse tree
> representation) -> (LLVM/C/OpenCL) -> low-level optimization (by the
> respective compilers)
> Some algorithms that could be shareable are iteration strategies
> (already in NumPy though), blocking strategies, etc.
> Even if this may be beyond numba's (and perhaps Cython's) current
> ambition, it may be worth thinking about, if nothing else then just
> for how Cython's code should be structured.
This kind of collaboration would be very nice. I agree, there might be some kind of intermediate representation that would be good for both projects.
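
As a toy sketch of the "shared tree, shared passes, separate backends" shape discussed above: the node classes, the pass, and both backends below are hypothetical and are not Cython's or numba's actual representation. Constant folding stands in for the loop-structure and blocking passes, purely to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class BinOp:
    op: str        # "+" or "*"
    left: object
    right: object


def fold_constants(node):
    """Shared pass: evaluate operations whose operands are both constants."""
    if isinstance(node, BinOp):
        left = fold_constants(node.left)
        right = fold_constants(node.right)
        if isinstance(left, Const) and isinstance(right, Const):
            if node.op == "+":
                return Const(left.value + right.value)
            return Const(left.value * right.value)
        return BinOp(node.op, left, right)
    return node


def emit_c(node):
    """Backend 1: render the tree as a C expression string."""
    if isinstance(node, Var):
        return node.name
    if isinstance(node, Const):
        return repr(node.value)
    return f"({emit_c(node.left)} {node.op} {emit_c(node.right)})"


def evaluate(node, env):
    """Backend 2: interpret the tree directly, given variable values."""
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Const):
        return node.value
    left, right = evaluate(node.left, env), evaluate(node.right, env)
    return left + right if node.op == "+" else left * right


tree = BinOp("+", Var("x"), BinOp("*", Const(2.0), Const(3.0)))
optimized = fold_constants(tree)
print(emit_c(optimized))                # prints "(x + 6.0)"
print(evaluate(optimized, {"x": 1.0}))  # prints 7.0
```

The point of the sketch is that `fold_constants` knows nothing about either backend, so a compile-time user and a runtime user could both call it on the same tree.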
> (Mark F., how does the above match how you feel about this?)