

1 Introduction to GNU lightning

Dynamic code generation is the generation of machine code at run time. It is typically used to strip a layer of interpretation by allowing compilation to occur at run time. Perhaps the best-known application of dynamic code generation is in interpreters that compile source code to an intermediate bytecode form, which is then recompiled to machine code at run time: this approach effectively combines the portability of bytecode representations with the speed of machine code. Another common application of dynamic code generation is in the field of hardware simulators and binary emulators, which can use the same techniques to translate simulated instructions to the instructions of the underlying machine.

Yet other applications come to mind: for example, windowing bitblt operations, matrix manipulations, and network packet filters. Albeit very powerful and relatively well known within the compiler community, dynamic code generation techniques are rarely exploited to their full potential and, with the exception of the two applications described above, have remained curiosities because of portability and functionality barriers: binary instructions are generated, so programs using dynamic code generation must be retargeted for each machine; in addition, coding a run-time code generator is more tedious and error-prone than it is difficult.

This manual describes the GNU lightning dynamic code generation library. GNU lightning provides a portable, fast and easily retargetable dynamic code generation system.

To be fast, GNU lightning emits machine code without first creating intermediate data structures such as the RTL representations traditionally used by optimizing compilers (see RTL representation). GNU lightning translates code directly from a machine-independent interface to that of the underlying architecture. This makes code generation more efficient, since no intermediate data structures have to be constructed and consumed. A collateral benefit is that GNU lightning consumes little space: other than the memory needed to store generated instructions and data structures such as parse trees, the only data structure the client will usually need is an array of pointers to labels and unresolved jumps, which you can often allocate directly on the system stack.

To be portable, GNU lightning abstracts over current architectures' quirks and unorthogonalities. The interface that it exposes is that of a standardized RISC architecture loosely based on the SPARC and MIPS chips. There are a few general-purpose registers (six, not including those used to receive and pass parameters between subroutines), and arithmetic operations involve three operands—either three registers or two registers and an arbitrarily sized immediate value.
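
For instance, using the same mnemonic style as the subr_i example later in this chapter (the registers and the immediate value below are chosen purely for illustration), the three-operand form covers both cases:

         addr_i  R0, R1, R2       ! R0 = R1 + R2    (three registers)
         addi_i  R0, R1, 10000    ! R0 = R1 + 10000 (register and immediate)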

On one hand, this architecture is general enough that it is possible to generate pretty efficient code even on CISC architectures such as the Intel x86 or the Motorola 68k families. On the other hand, it matches real architectures closely enough that, most of the time, the compiler's constant folding pass ends up generating code which assembles machine instructions without further tests.

1.1 Drawbacks

GNU lightning has been useful in practice; however, it does have at least four drawbacks: a limited number of registers, no peephole optimizer, no instruction scheduler and no symbolic debugger. Of these, the last is the most critical even though it does not affect the quality of generated code: the only way to debug code generated at run time is to step through it at the level of host-specific machine code. A decent knowledge of the underlying instruction set is thus needed to make sense of the debugger's output.

The low number of available registers (six) is also an important limitation. Consider, however, the primary application of dynamic code generation, that is, bytecode translators. The underlying virtual machines tend to have very few general-purpose registers (usually 0 to 2), and the translators seldom rely on sophisticated graph-coloring algorithms to allocate registers to temporary variables. Rather, these translators usually obtain their performance increase because a) they remove indirect jumps, which are usually poorly predicted and thus often form a bottleneck, and b) they parameterize the generated code and go through the process of decoding the bytecodes just once. Their usage of registers is therefore rather sparse—in practice, six registers were found to be enough for most purposes.

The lack of a peephole optimizer is most important on machines where a single instruction can map to multiple native instructions. For instance, the Intel chips' division instruction hard-codes the dividend to be in EAX and the quotient and remainder to be output, respectively, in EAX and EDX: on such chips, GNU lightning does lots of pushing and popping of EAX and EDX to preserve their contents, even when those registers are not actually in use. Such unnecessary stack operations could be removed by checking whether the preserved registers are overwritten soon afterwards. Unfortunately, the current implementation of GNU lightning is so fast precisely because it only knows about the single instruction that is being generated; performing these optimizations would require a flow analysis pass that would probably hinder GNU lightning's speed.
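
As a sketch of the kind of expansion involved (the register mapping below follows the example at the end of this section, with R0 in %eax, R1 in %ebx and R2 in %edx; %ecx is assumed to be available as a scratch register, and the exact sequence GNU lightning emits may differ), a single division such as

         divr_i  R0, R1, R2       ! Compute R0 = R1 / R2

might expand to

         pushl   %edx             ! preserve %edx (R2), clobbered by idivl
         movl    %edx, %ecx       ! copy the divisor out of %edx
         movl    %ebx, %eax       ! move the dividend (R1) into %eax
         cltd                     ! sign-extend %eax into %edx:%eax
         idivl   %ecx             ! quotient in %eax (R0), remainder in %edx
         popl    %edx             ! restore %edx (R2)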

The lack of an instruction scheduler is not very important—pretty good instruction scheduling can actually be obtained by separating register writes from register reads. The only architectures on which a scheduler would be useful are those on which arithmetic instructions have two operands; an example is, again, the x86, on which the single instruction

         subr_i  R0, R1, R2       ! Compute R0 = R1 - R2

is translated to two instructions, of which the second depends on the result of the first:

         movl    %ebx, %eax       ! Move R1 into R0
         subl    %edx, %eax       ! Subtract R2 from R0
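
This is also why separating writes from reads in the client's instruction stream pays off: any independent work placed between the instruction that writes R0 and the first one that reads it keeps the two dependent x86 instructions apart. A minimal sketch, assuming that V0, V1 and V2 name three of the other general-purpose registers mentioned above (they are not introduced in this excerpt):

         subr_i  R0, R1, R2       ! R0 = R1 - R2 (writes R0)
         addr_i  V0, V1, V2       ! independent work, does not touch R0
         addi_i  R0, R0, 1        ! first read of R0, now further from the write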