[Numpy-discussion] New project : Spyke python-to-C compiler

Rahul Garg garg1@ualberta...
Sun Apr 6 19:48:50 CDT 2008

Note: this message has been posted to both numpy-discussion and
python-dev. Sorry for the cross-posting, but I thought both Python devs
and NumPy users would be interested. If you believe your list should not
receive this email, let me know. I also wanted to introduce myself,
since I may ask questions about Python and NumPy internals from time to
time :)


I am a student at the University of Alberta doing my master's in
computing science, and I am writing a Python-to-C compiler as one part
of my thesis. The compiler, named Spyke, will be made available in a
couple of weeks. It is geared towards scientific applications and will
therefore focus mostly on the needs of scientific app developers.

What is Spyke?
In many performance-critical projects, it is often necessary to
rewrite parts of the application in C. However, writing C wrappers can
be time-consuming. Spyke offers an alternative approach. You add
annotations to your Python code as strings. These strings are discarded
by the Python interpreter, but the Spyke compiler interprets them as
type declarations and uses them to convert the code to C.

Example :

"int -> int"
def f(x): return 2*x

In this case the Spyke compiler will treat the string "int -> int"
as a declaration that the function accepts an int as its parameter and
returns an int. Spyke will then generate a C function and a wrapper
function. This idea is directly copied from the PLW (Python Language
Wrapper) project. Once Python3k arrives, many of these declarations
will be moved to function annotations and class decorators.

This way you can do all your development and debugging interactively
using the standard Python interpreter. When you need to compile to C,
you just add type annotations to the places that you want to convert
and invoke spyke on the annotated module. This is different from Pyrex
because Pyrex does not accept Python code. With Spyke, your code is
100% pure Python.
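
To illustrate why an annotated module stays plain Python, here is a
minimal sketch. It is not Spyke's implementation: the helper
find_type_strings is a hypothetical name, and pairing a bare string
with the function defined directly below it is only an assumption
about how such a tool might read the declarations. The standard
interpreter runs the same source and simply ignores the string.

```python
import ast

# The example module from above, plus an unannotated function.
SOURCE = '''
"int -> int"
def f(x): return 2*x

def g(x): return x + 1
'''


def find_type_strings(source):
    """Map function names to the bare string statement directly above them.

    Hypothetical helper for illustration; not part of Spyke.
    """
    body = ast.parse(source).body
    found = {}
    for prev, node in zip(body, body[1:]):
        is_string_stmt = (
            isinstance(prev, ast.Expr)
            and isinstance(prev.value, ast.Constant)
            and isinstance(prev.value.value, str)
        )
        if is_string_stmt and isinstance(node, ast.FunctionDef):
            found[node.name] = prev.value.value
    return found


# The standard interpreter executes the module; the string has no effect
# on how f behaves.
namespace = {}
exec(SOURCE, namespace)

# A separate tool can still recover the declaration from the source text.
declarations = find_type_strings(SOURCE)
print(namespace["f"](21), declarations)
```

Only f gets a declaration here; g has no string above it, so a tool
working this way would leave it as ordinary Python.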

Spyke has basic support for functions and classes. Spyke can do very
basic type inference for local variables in function bodies. Spyke
also has partial support for homogeneous lists and dictionaries and
fixed-length tuples.
One big advantage of Spyke is that it understands at least part of
numpy. Numpy arrays are treated as fundamental types, and Spyke knows
what C code to generate for slicing and indexing of numpy arrays. This
should help a lot in scientific applications. Note that Spyke can
handle only a subset of Python. Exceptions, iterators, generators and
runtime code generation of any kind are not handled. Nested functions
will be added soon. I will definitely add some of these missing
features based on what is actually required for real-world Python
code. Currently, if Spyke does not understand a function, it just
leaves it as Python code. Classes can be handled, but special methods
are not currently supported. The support for classes is a little
brittle because I am trying to resolve some issues between old-style
and new-style classes.

Where is Spyke?
Spyke will be available as a binary-only release in a couple of weeks.
I intend to make it open source after a few months. Right now it's
undergoing very rapid development and has negligible amounts of
documentation, so the source code would be pretty useless to anyone
else anyway.
Spyke is written in Python and Java and should be platform independent.

I need help:
However, I need a bit of help. I am having a couple of problems:
a) I am finding it hard to get pure Python+NumPy test code. I need
more code to test the compiler. Developing a compiler without a
test suite is kind of useless. If you have some pure Python code
that needs better performance, please contact me. I guarantee that
your code will not be released to the public without your permission,
but it might be referenced in academic publications. I can also make
the compiler available to you, hopefully after the 10th of April. It's
kind of unstable currently. I will also need your help in annotating
the test code you provide, since I probably won't know what your
application is doing.

b) Libraries which interface with C/C++: Much of the code in SciPy, for
instance, is mixed-language, with part of it written in C/C++. Spyke
only knows how to handle annotated Python code. For C/C++ libraries
wrapped into Python modules, Spyke will therefore need to know at
least 2 things:
i) The mapping of a C function name, struct, etc. to Python
ii) The type information of the said C function.

There are many, many ways that people interact with C code. Some
people write wrappers manually; some use autogenerated wrappers from
SWIG, SIP, Boost.Python, etc.; some use Pyrex or Cython; and some
use ctypes. I don't have the time or resources to support this
multitude of methods. I considered trying to parse the C code
implementing the wrappers, but it's "non-trivial", to put it mildly.
Parsing only SWIG-generated code is another possibility, but it's
still hard. Another approach that I am seriously considering is to
support a subset of ctypes (with additional restrictions) instead. But
my question is: Is ctypes good enough for most of you? ctypes cannot
interface with C++ code, but it's pure Python. However, I have not
seen too many instances of people using ctypes.
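
For readers who have not used ctypes, here is a small sketch of why it
fits the two requirements listed under (b): the wrapper itself states
both the name mapping and the C types, in pure Python. It assumes a
POSIX system where the C math library can be located; the fallback
library name and the Python-side name c_cos are assumptions for
illustration, and nothing here reflects how Spyke would actually
consume such declarations.

```python
import ctypes
import ctypes.util

# Locate the C math library. The fallback file name is an assumption
# that holds on typical Linux systems only.
libm_path = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(libm_path)

# (i) Mapping: the Python name c_cos refers to the C symbol "cos".
c_cos = libm.cos

# (ii) Type information: the C prototype is double cos(double).
c_cos.argtypes = [ctypes.c_double]
c_cos.restype = ctypes.c_double

result = c_cos(0.0)
print(result)
```

Because argtypes and restype are ordinary Python attributes, a tool
could in principle read them without parsing any C source.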

c) Strings as type declarations: Do you think I should use decorators
instead, at least for function type declarations?
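
To make the decorator alternative concrete, here is one way it might
look. This is only a sketch: the decorator name spyke_type and the
attribute spyke_signature are hypothetical, not part of Spyke. The
decorator stores the same type string on the function and returns the
function unchanged, so the module still runs under the standard
interpreter.

```python
def spyke_type(signature):
    """Hypothetical decorator: record a type string on the function."""
    def attach(func):
        # Store the declaration where a compiler could look it up later.
        func.spyke_signature = signature
        return func  # the function itself is not wrapped or modified
    return attach


@spyke_type("int -> int")
def f(x):
    return 2 * x


print(f(4), f.spyke_signature)
```

Compared with a bare string, the declaration is attached explicitly to
one function and remains inspectable at runtime, at the cost of the
module needing the decorator to be defined or imported.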

thanks for patiently reading this,
comments and inquiries sought.
