[Numpy-discussion] Numpy-discussion Digest, Vol 19, Issue 24

Rahul Garg garg1@ualberta...
Mon Apr 7 01:19:03 CDT 2008


Quoting numpy-discussion-request@scipy.org:

> What will be the licensing of this project?  Do you know yet?

I am thinking GPL for the compiler and LGPL for any runtime components,
which should be similar in spirit to GCC. Whether to use version 2 or
version 3 of the license is undecided. I will also check with the
university to see if they have any problems (they shouldn't). I also
need to check with the university about hosting; I believe I will need
to host on university servers.

> I have a couple of comments because I've been thinking along these lines.
>> What is Spyke?
>> In many performance critical projects, it is often necessary to
>> rewrite parts of the application in C. However writing C wrappers can
>> be time consuming. Spyke offers an alternative approach. You add
>> annotations to your  Python code as strings. These strings are
>> discarded by the Python
>> interpreter but these are interpreted as types by Spyke compiler to
>> convert to C.
>>
>> Example :
>>
>> "int -> int"
>> def f(x): return 2*x
>>
>> In this case the Spyke compiler will consider the string "int -> int"
>> as a declaration that the function accepts int as parameter and
>> returns int. Spyke will then generate a C function and a wrapper
>> function.
> What about the use of decorators in this case?

I can certainly use decorators. Will implement this change soon.
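
A rough sketch of what the decorator form could look like, assuming a
hypothetical decorator named `typed` and a hypothetical attribute
`__spyke_signature__` (neither exists in Spyke yet). It only records
the signature string so that a compiler could find it later, and leaves
the function itself untouched:

```python
def typed(signature):
    """Hypothetical decorator: attach a type signature string to a function."""
    def decorate(func):
        # Store the declaration where a compiler pass could read it later.
        func.__spyke_signature__ = signature
        return func  # the function still runs unchanged under the interpreter
    return decorate


@typed("int -> int")
def f(x):
    return 2 * x
```

Unlike a bare string, the declaration stays attached to the function
object at runtime, which is what would make a load-time compile step
possible.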

> Also, it would be great to be able to create ufuncs (and general numpy
> funcs) using this approach.   A decorator would work well here as well.
>> Where is Spyke?
>> Spyke will be available as a binary only release in a couple of weeks.
>> I intend to make it open source after a few months.
>>
> I'd like to encourage you to make it available as open source as early
> as possible.    I think you are likely to get help in ways you didn't
> expect.  People are used to reading code, so even an alpha project can
> get help early.     In fact given that you are looking for help.  I
> think this may be the best way to get it.

Ok, I will release the source along with the binary. I need to sort
some stuff out, so it might take a couple of weeks. Note that much of
the compiler is (for better or worse) written in Java. The codebase
isn't very OOP (it is full of static methods and looks more like
garbage-collected C) but it is not too complex either. I use CPython's
"compiler" module to dump the AST into an intermediate file, which is
then parsed by the compiler in Java. The compiler uses the AST
representation throughout. The compiler also depends upon the ANTLR
Java runtime.

For hosting, I will probably get some space on the university servers.
I will try to get Trac installed.

I will release once the following works as described (this is the
planned scope, including known limitations):
a) Basic support for functions and classes.

b) Keyword parameters not supported.

c) Special methods not supported except __init__.

d) __init__ is treated as constructor. Custom __new__ not supported.

e) Nested functions may be broken.

f) Functions will be divided into two types: static and dynamic.
Static functions should not be redefined at runtime. Dynamic functions
can be redefined at runtime but will be more costly to call, since I
need to look up the binding on each call. Also, even though a dynamic
function can be redefined, its type signature should not change. If a
static function calls another static function, then the compiler will
try to insert a call to the C function instead of the wrapped function,
thus bypassing the interpreter if possible.

g) Compiled classes should not redefine methods at runtime. There will
be an option to annotate a class as "final", meaning the user shouldn't
subclass it. For such classes, it's easier to generate efficient code
for attribute access. Also, compiled classes shouldn't dynamically
add/delete attributes.

h) Users shouldn't subclass numpy array.

i) For method calls on objects, the generated code will mostly just end
up making a call to the interpreter, so performance in this case will
not be particularly good currently. For ints, floats etc. the
equivalent C code will be generated, so for these types the code should
be fast enough.

j) For indexing of numpy arrays, unsafe code is generated. I directly  
access the array without any index checking.

k) Loops: This is the weakest point currently. I only allow for-loops
over range() or xrange(), which allows easy conversion to C. You cannot
loop over the elements of lists, numpy arrays etc.

l) Exec, eval, metaclasses, dynamic class creation, dynamic
adding/deleting of attributes etc. are not allowed inside typed code.

m) A module cannot currently mix typed and untyped code. A module has
to be completely typed/annotated, or it should be left alone and not
compiled. Also, a typed module cannot have arbitrary executable code
and should only consist of single-statement variable declarations and
function and class definitions. Of course, the rest of your application
can be left untyped. In the future I will try to allow mixing typed and
untyped code in a module.

n) Importing of other typed modules is also mostly supported.

o) Builtin functions: range and len mostly work, but I cannot guarantee
anything else.

p) Lists, tuples and dictionaries can be used but need to be
homogeneous. Not all methods are supported yet. Moreover, the code
generated for these types mostly just makes function calls to the
Python interpreter, so this doesn't speed things up (yet). I am not
very sure how to handle subclasses of these and other builtin types.

q) For function parameters of user-defined class types, you can
declare the parameter as "final". For example, the type can be declared
as "final SomeClass", meaning that you will only pass SomeClass and not
a subclass. This allows the compiler (in the future) to generate better
code for attribute access.
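
To make the restrictions above a bit more concrete, here is a small
sketch of a function that stays inside them: a string annotation in the
"int -> int" form from the example quoted earlier, no keyword
parameters, and a for-loop over range() only. The function name
`sum_of_squares` is just for illustration, and whether the types of the
locals `total` and `i` would be inferred or need their own declarations
is an assumption here:

```python
"int -> int"
def sum_of_squares(n):
    total = 0
    # Looping over range() is the only loop form listed as supported (item k),
    # since it maps directly onto a C for-loop over an int counter.
    for i in range(n):
        total += i * i
    return total
```

The same file still runs unmodified under the plain interpreter, since
the string is just an expression statement that Python discards.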

Expected release date: Soon. Hopefully by 15th to 20th April.

> If you need help getting set up in terms of hosting it somewhere, I can
> help you do that.
>> Spyke is written in Python and Java and should be platform independent.
>> I do intend to make the source open in a few months. Right now it's
>> undergoing very rapid development and has negligible amounts of
>> documentation so the source code right now is pretty useless to anyone
>> else anyway.
>>
>
>> c) Strings as type declarations : Do you think I should use decorators
>> instead at least for function type declarations?
>>
> I think you should use decorators.   That way you can work towards
> having the compiler "embedded" in the decorator and happen seamlessly
> without invoking a separate "program" (it just happens when the module is
> loaded -- a la weave).

Well, that can be done provided certain restrictions are met. One major
problem is that it will make user applications dependent on the
presence of a JVM, since the compiler is in Java.

Secondly, seeing as much code as possible at compile time helps the
compiler. For example, if you have a function G called inside function
F, then the compiler needs to know the type of G, which may not have
been encountered yet.
Also, I am trying to work my way towards a whole-program compiler, since
some of the optimizations that I want to research for my thesis are
dependent on seeing the whole program. Those haven't been implemented
yet but will be in the future.
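
As a rough illustration of the F/G problem (not how Spyke is actually
implemented), the sketch below makes two passes over a module: the
first collects every string declaration, and the second can then look
up the signature of a callee even when it is defined later in the
file. It uses Python 3's `ast` module instead of the old "compiler"
module, and the names `collect_signatures` and `callee_signatures` are
made up for this example:

```python
import ast

SOURCE = '''
"int -> int"
def f(x): return g(x) + 1

"int -> int"
def g(x): return 2 * x
'''


def collect_signatures(source):
    """Pass 1: map each function name to the string literal right above it."""
    signatures = {}
    pending = None
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            if pending is not None:
                signatures[node.name] = pending
            pending = None
        elif (isinstance(node, ast.Expr)
              and isinstance(node.value, ast.Constant)
              and isinstance(node.value.value, str)):
            pending = node.value.value  # remember a bare string statement
        else:
            pending = None
    return signatures


def callee_signatures(source, signatures):
    """Pass 2: for each function, look up the signatures of the names it calls."""
    result = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            callees = {}
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    callees[sub.func.id] = signatures.get(sub.func.id)
            result[node.name] = callees
    return result


signatures = collect_signatures(SOURCE)
calls = callee_signatures(SOURCE, signatures)
```

Here `f` calls `g` before `g` appears in the file, yet the second pass
still finds its declaration. A compiler invoked on `f` alone would not
have that information.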

Basically, if we call the compiler on one function at a time, the
applicability of the compiler is reduced somewhat. So I will try to
provide an option to invoke it at runtime too, but it will have fewer
features and in most cases lower performance.

Also, any thoughts on interfacing with existing C code?

thanks,
rahul

