"Cola" - A compiler for the Parrot/Perl6 VM V0.0.6.0 I've started this compiler (I call it Cola) to simultaneously teach myself how to write a compiler (ok I admit I squeaked through Compiler Design in school because I was too busy playing online MUD and basketball), as well as help out a project that I love, Perl6/Parrot! To really start having fun with the Parrot runtime I wanted a language similar to C/C++/C#/Java. The parser is LALR, developed with flex/bison, with an intermediate code compiler written in Perl. The parser generates a simple intermediate code (more on that later) which is easier to read than Parrot assembly, and also includes a few directives that Parrot currently lacks. The parser leaves register allocation/spill control, optimization and a few other specifics to the intermediate compiler (int2pasm.pl). Where to Get the Latest Compiler Currently Cola is included in the Parrot distribution, see http://www.parrotcode.org or http://dev.perl.org/perl6 At some point the distributions may diverge for some reason so you can grab the latest tarball from CPAN. http://cpan.org/authors/id/M/ME/MELVIN The Syntax Compatible (eventually) C# syntax. The easiest way to see what it currently looks like is to read the examples. I aim to eventually achieve C# source level compliance, targetted to the Parrot runtime. Whether or not we do on the fly bytecode interfacing with .NET later is a whole different rhinoceros. Supported Constructs For some quick samples, see the cola/examples/ subdirectory. Statements: FOR, WHILE, IF, ELSE, BREAK, CONTINUE, RETURN. All of these are limited in their current version but follow typical C-ish rules (break breaks out of current loop, continue shortcuts to the next iteration). return may return a value from anywhere inside a method. 0.0.4 adds logical operators. Conditional expressions may now be compound and use logical operators (&& and ||). CAUTION: Assignments inside conditionals such as... if((i = 15) == 15) are still not supported. This sort of expression will be fixed in the next update. Types: int, float, string (classes and arrays coming soon) Currently these map directly to the Parrot primitive types. Very simple type coercement and checking is supported. Variable Declarations: For now declare all your variables at the top of each method. Block scoping will be done soon. You may initialize your variables in the declaration. Expressions: Parens, +, -, *, /, %, ++ and -- are supported in any combination or complexity. Post and pre-increment are also supported. You may use the + operator on strings for concatenation. The following works: puts("Hello Mr " + name + "\n"); There is no support for concatenation with non-strings yet. Need to whip up a few conversion routines. Comparisons: <, >, <=, >=, !=, == Bitwise operators: <<, >>, |, &, ^, and ~ are now supported in 0.0.4 Conditional expressions: Commonly known as the ternary conditional... As I understand the Java and C# language spec, conditional expressions must be on the right hand side of an assignment or anywhere that uses the return value, however you cannot use ternaries standalone as a statement. So in Cola you can say: max = i > j ? i : j; or: max = i > j ? foo() : bar(); But it is not proper (although C allows you) to say: i > j ? printf("max=i\n") : printf("max=j\n"); If you think I have misread the C# grammar spec, please email me. Boolean: The boolean ! operator isn't yet supported. Objects and Methods: Classic C++/Java style, recursion supported. Return types supported. Will be adding variable argument support per the C# spec. Member variables or "fields" aren't yet supported. You can use const definitions, but member variables are incomplete pending a little more work on Parrot. Its possible to do them now with PerlHash or PerlArray, but I'm working on something faster for strictly typed, non-dynamic languages. Currently instance methods are the same as class methods. So you could do: Console.WriteLine(""); or Console c = new Console(); c.WriteLine(""); The compiler doesn't yet differentiate between static or class methods and instance methods, and whether you are calling them as such. OOP is really just faked for now, enough for people to write in a high level language for Parrot. As everything, it is a work in progress. All objects are simply PerlString references for now. Current Builtin Subroutines Since I have yet to implement class importing, the system routines are just plain wrappers around the Parrot ops. Currently they are: strlen, substr, strchop, ord, puts, puti, putf, gets, sleep See gen.c and main() for the current kludgy way to patch in more wrappers. In calc.cola I also did a sample implementation of a string to int conversion called StrToInt(). Arrays: Parrot now has a substr with replace op so we can emulate arrays on top of it. Its still a hack but it works. What you currently CANNOT do even though the Parser may eat it... Statement lists: i = j, j = k; Nested assigns: i = (j = 0); Fancy For loops: for(i = 0; j++, j < 4; i++) Empty For headers: for(;;) // use while(1) Using the Compiler You will need Flex and Bison installed to build the compiler. These are the GNU versions of the classic lex and yacc, but are more modern. The grammars should work with standard lex/yacc but I've not tested this lately. If for some odd reason you have Perl and Parrot but no Flex/Bison send me an email and I might take pity on you and send you the generated parser C files. Build colac by typing: make Usage: colac examples/mandelbrot.cola Then copy "a.pasm" to your Parrot directory, assemble it and run as usual. In case you need help there to, try: assemble.pl a.pasm > a.pbc parrot a.pbc Currently colac is a short Perl pre-processor that includes class for any import statements (using System;) If you have trouble with colac you can just use colacc which is the raw compiler which ignores 'using' directives. Also, if you look in "a.imc" you will see an intermediate language, delimited by tags (....) with some debug info outside of those tags. Debugging the compiler is easier by looking at the intermediate code. This language can be piped through int2pasm.pl to re-generate the .pasm file. Currently the compiler is very limited with a few warnings and a few simple type coercions. If you try to do something fancy, the grammar might accept it but the code will probably come out wrong or simply crash the compiler. Read all the samples before trying anything. Intermediate Code NOTE: The translator is very simple, mostly it just pre-processes code that it understands and leaves the bulk of the work to the Parrot assembler. The intermediate code output supports 3 primary types of statements. Assignment statements, Branch statements, and Stack manipulation It additionally supports: Labels (Parrot style), Directives (.foo), Line comments (# ...), and Here documents (for emitting code that skips through the intermediate translator). The compiler uses the here document to emit a few support routines (puts, substr). I'm working on moving this stuff into System.cola. Limited type checking. Each assignment has a type associated with it, in the same syntax as a C-style cast. Example: (i32)$R12 = $N7 * 0.1 A floating point expression coerced to a integer. The compiler will check for bad assignments such as int to string, etc. For a nice sample of intermediate code, compile mandelbrot.cola or calc.cola which actually does a limited form of parsing with Parrot! Register Allocation The register tracker took about 30 minutes to hack out and is good enough for now. Temporaries from the compiler are either named $LN or $RN (where N is an integer indicating the temporary). The translator will assign registers from a register stack in "first come first serve" fashion, and upon each line translation, it will return any registers to the stack that were used for rvalues on the right hand side of an assignment. Rvalues are indicated by $RN. Named variables and $L (lvalues) will currently hold their register for the scope of the method. This is quick and dirty. I plan to implement a flavor of the graph coloring algorithm for the final translator. For now, if the translator runs out of registers, it will simply exit. Parrot has 32 registers each of string, int, float and pmc so it takes quite a number of local variables and/or quite a complex expression to hold 32 live registers at any one point. Mandelbrot currently uses 9 of 32 float registers and 6 of 32 int registers to get its work done. Optimization Very limited. No explicit optimization phase yet. You might see dumb code generated in the form of: set I1, I0 set I2, I1 or.. branch LABEL34 LABEL34: ... generated between basic blocks or in situations where the generator is currently dumb. I'll at a most add in a simple peephole optimizer to skip useless assignments, temporaries and branches. Coming Soon Some form of a printf() Full array support Gaping Holes Arrays Strings can be treated as arrays, thats it for now. Parrot still needs keyed aggregates finished to do this. Class/Struct instantiation Lotta work involved here, I won't go there for now. Currently you can define multiple Cola classes per file, but cannot instantiate any of them. Parrot currently does not have an adequate bytecode "class format" in which to write symbol table information. This should change soon; at least thats what Dan says. :) Object Methods and Field references Again, a lot of work, but on the list.