The instruction set - Using and porting GNU lightning


2.2 GNU lightning's instruction set

GNU lightning's instruction set was designed by deriving instructions that closely match those of most existing RISC architectures, or that can be easily synthesized if absent. Each instruction is composed of:

an operation, like sub or mul
sometimes, a register/immediate flag (r or i)
a type identifier or, occasionally, two

The second and third fields are separated by an underscore; thus, examples of legal mnemonics are addr_i (integer add, with three register operands) and muli_l (long integer multiply, with two register operands and an immediate operand). Each instruction takes two or three operands; in most cases, one of them can be an immediate value instead of a register.
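The naming rule can be sketched as a small string-building helper in C. This is only an illustration of how a mnemonic is put together; make_mnemonic is a hypothetical name and is not part of the library.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): join an operation, an
 * optional r/i flag and a type identifier into a mnemonic such as
 * "addr_i". Pass flag = '\0' when the instruction has no flag.
 * Returns the length written, or -1 if the buffer is too small. */
int make_mnemonic(char *buf, size_t size,
                  const char *op, char flag, const char *type)
{
    int n;

    if (flag != '\0')
        n = snprintf(buf, size, "%s%c_%s", op, flag, type);
    else
        n = snprintf(buf, size, "%s_%s", op, type);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

For instance, make_mnemonic(buf, sizeof buf, "mul", 'i', "l") fills buf with muli_l.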

GNU lightning supports a full range of integer types: operands can be 1, 2 or 4 bytes long (64-bit architectures might support 8-byte operands), either signed or unsigned. The types are listed in the following table together with the C types they represent:

          c          signed char
          uc         unsigned char
          s          short
          us         unsigned short
          i          int
          ui         unsigned int
          l          long
          ul         unsigned long
          f          float
          d          double
          p          void *

Some of these types may not be distinct: for example, l is equivalent to i on 32-bit machines, and p is substantially equivalent to ul.

There are at least seven integer registers, of which six are general-purpose, while the last is used to contain the stack pointer (SP). The stack pointer can be used to allocate and access local variables on the stack (which is supposed to grow downwards in memory on all architectures).

Of the general-purpose registers, at least three are guaranteed to be preserved across function calls (V0, V1 and V2) and at least three are not (R0, R1 and R2). Six registers are not very many, but this restriction was forced by the need to target CISC architectures which, like the x86, have few registers; in any case, backends can specify the actual number of available caller- and callee-save registers.

In addition, there is a special RET register which contains the return value. You should always remember, however, that writing this register could overwrite either a general-purpose register or an incoming parameter, depending on the architecture.

There are at least six floating-point registers, named FPR0 to FPR5. These are separate from the integer registers on all the supported architectures; on Intel architectures, the register stack is mapped to a flat register file.

The complete instruction set follows; as you can see, most non-memory operations only take integers, long integers (either signed or unsigned) and pointers as operands; this was done in order to reduce the instruction set, and because most architectures only provide word and long word operations on registers. There are instructions that allow operands to be extended to fit a larger data type, both in a signed and in an unsigned way.

Binary ALU operations

These accept three operands; the last one can be an immediate value for integer operands, or a register for all operand types. addx operations must directly follow addc, and subx must follow subc; otherwise, results are undefined.

          addr     i  ui  l  ul  p  f  d  O1 = O2 + O3
          addi     i  ui  l  ul  p        O1 = O2 + O3
          addxr    i  ui  l  ul           O1 = O2 + (O3 + carry)
          addxi    i  ui  l  ul           O1 = O2 + (O3 + carry)
          addcr    i  ui  l  ul           O1 = O2 + O3, set carry
          addci    i  ui  l  ul           O1 = O2 + O3, set carry
          subr     i  ui  l  ul  p  f  d  O1 = O2 - O3
          subi     i  ui  l  ul  p        O1 = O2 - O3
          subxr    i  ui  l  ul           O1 = O2 - (O3 + carry)
          subxi    i  ui  l  ul           O1 = O2 - (O3 + carry)
          subcr    i  ui  l  ul           O1 = O2 - O3, set carry
          subci    i  ui  l  ul           O1 = O2 - O3, set carry
          rsbr     i  ui  l  ul  p  f  d  O1 = O3 - O2
          rsbi     i  ui  l  ul  p        O1 = O3 - O2
          mulr     i  ui  l  ul     f  d  O1 = O2 * O3
          muli     i  ui  l  ul           O1 = O2 * O3
          hmulr    i  ui  l  ul           O1 = high bits of O2 * O3
          hmuli    i  ui  l  ul           O1 = high bits of O2 * O3
          divr     i  ui  l  ul     f  d  O1 = O2 / O3
          divi     i  ui  l  ul           O1 = O2 / O3
          modr     i  ui  l  ul           O1 = O2 % O3
          modi     i  ui  l  ul           O1 = O2 % O3
          andr     i  ui  l  ul           O1 = O2 & O3
          andi     i  ui  l  ul           O1 = O2 & O3
          orr      i  ui  l  ul           O1 = O2 | O3
          ori      i  ui  l  ul           O1 = O2 | O3
          xorr     i  ui  l  ul           O1 = O2 ^ O3
          xori     i  ui  l  ul           O1 = O2 ^ O3
          lshr     i  ui  l  ul           O1 = O2 << O3
          lshi     i  ui  l  ul           O1 = O2 << O3
          rshr     i  ui  l  ul           O1 = O2 >> O3¹
          rshi     i  ui  l  ul           O1 = O2 >> O3²
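The carry and high-multiply rows may be easier to follow with their semantics written out in plain C. The sketch below assumes that ui is 32 bits wide; the function names are made up for illustration, and the code models what the instructions compute rather than how the library emits them.

```c
#include <stdint.h>

/* Models an addc immediately followed by an addx on 32-bit unsigned
 * operands: a 64-bit sum built from two 32-bit halves. */
void add64_via_carry(uint32_t a_lo, uint32_t a_hi,
                     uint32_t b_lo, uint32_t b_hi,
                     uint32_t *r_lo, uint32_t *r_hi)
{
    uint32_t lo = a_lo + b_lo;       /* addcr_ui: O1 = O2 + O3, set carry */
    uint32_t carry = lo < a_lo;      /* 1 if the low word wrapped around  */

    *r_lo = lo;
    *r_hi = a_hi + (b_hi + carry);   /* addxr_ui: O1 = O2 + (O3 + carry)  */
}

/* Models hmulr_ui: the upper 32 bits of the full 64-bit product. */
uint32_t hmul_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}
```

The carry only exists between the two additions, which is why addx must directly follow addc.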

Unary ALU operations

These accept two operands, both of which must be registers.

          negr     i      l          f  d  O1 = -O2
          notr     i  ui  l  ul            O1 = ~O2

Compare instructions

These accept three operands; again, the last can be an immediate value for integer data types. The last two operands are compared, and the first operand is set to 1 if the given condition is met and to 0 otherwise.

The conditions given below are for the standard behavior of C, where the “unordered” comparison result is mapped to false.

          ltr      i  ui  l  ul  p  f  d  O1 = (O2 <  O3)
          lti      i  ui  l  ul  p        O1 = (O2 <  O3)
          ler      i  ui  l  ul  p  f  d  O1 = (O2 <= O3)
          lei      i  ui  l  ul  p        O1 = (O2 <= O3)
          gtr      i  ui  l  ul  p  f  d  O1 = (O2 >  O3)
          gti      i  ui  l  ul  p        O1 = (O2 >  O3)
          ger      i  ui  l  ul  p  f  d  O1 = (O2 >= O3)
          gei      i  ui  l  ul  p        O1 = (O2 >= O3)
          eqr      i  ui  l  ul  p  f  d  O1 = (O2 == O3)
          eqi      i  ui  l  ul  p        O1 = (O2 == O3)
          ner      i  ui  l  ul  p  f  d  O1 = (O2 != O3)
          nei      i  ui  l  ul  p        O1 = (O2 != O3)
          unltr                     f  d  O1 = !(O2 >= O3)
          unler                     f  d  O1 = !(O2 >  O3)
          ungtr                     f  d  O1 = !(O2 <= O3)
          unger                     f  d  O1 = !(O2 <  O3)
          uneqr                     f  d  O1 = !(O2 <  O3) && !(O2 >  O3)
          ltgtr                     f  d  O1 = !(O2 >= O3) || !(O2 <= O3)
          ordr                      f  d  O1 =  (O2 == O2) &&  (O3 == O3)
          unordr                    f  d  O1 =  (O2 != O2) ||  (O3 != O3)
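The difference between the ordered and the unordered comparisons only shows up when an operand is a NaN. As a rough illustration, some of the formulas from the table are transcribed below into plain C on doubles; the function names are hypothetical, and the code assumes IEEE 754 comparison semantics (that is, no fast-math style compiler options).

```c
#include <math.h>   /* NAN, used when trying the functions out */

/* Direct transcriptions of the table entries for the d type. */
int lt_d(double a, double b)    { return a < b; }                /* ltr_d    */
int unlt_d(double a, double b)  { return !(a >= b); }            /* unltr_d  */
int uneq_d(double a, double b)  { return !(a < b) && !(a > b); } /* uneqr_d  */
int ord_d(double a, double b)   { return (a == a) && (b == b); } /* ordr_d   */
int unord_d(double a, double b) { return (a != a) || (b != b); } /* unordr_d */
```

With ordinary numbers lt_d and unlt_d agree; with a NaN operand, as in lt_d(NAN, 2.0) and unlt_d(NAN, 2.0), the first yields 0 and the second yields 1.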

Transfer operations

These accept two operands; for ext both of them must be registers, while mov accepts an immediate value as the second operand.

Unlike movr and movi, the other instructions are applied between operands of different data types, and they need two data type specifications. You can use extr to convert between integer data types, in which case the first must be smaller in size than the second; for example extr_c_ui is correct while extr_ul_us is not. You can also use extr to convert an integer to a floating point value: the only available possibilities are extr_i_f and extr_i_d. The other instructions convert a floating point value to an integer, so the possible suffixes are _f_i and _d_i.

          movr                      i  ui  l  ul  p  f  d  O1 = O2
          movi                      i  ui  l  ul  p  f  d  O1 = O2
          extr        c  uc  s  us  i  ui  l  ul     f  d  O1 = O2
          roundr                    i                f  d  O1 = round(O2)
          truncr                    i                f  d  O1 = trunc(O2)
          floorr                    i                f  d  O1 = floor(O2)
          ceilr                     i                f  d  O1 = ceil(O2)

Note that the order of the arguments is destination first, source second, as for all other GNU lightning instructions, but the order of the types is always reversed with respect to that of the arguments: the shorter (source) type comes first and the longer (destination) type second. This happens for historical reasons.
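The conversions can be approximated in plain C as follows. This sketch assumes that the source type decides whether an integer is sign- or zero-extended, and it leaves out roundr because the text does not say how ties are broken. All function names are hypothetical.

```c
/* extr_c_ui style: widen a signed char; the sign bit is replicated. */
unsigned int ext_c_ui(signed char v)    { return (unsigned int)(int)v; }

/* extr_uc_ui style: widen an unsigned char; the upper bits are zero. */
unsigned int ext_uc_ui(unsigned char v) { return (unsigned int)v; }

/* truncr_d_i style: drop the fractional part (round toward zero). */
int trunc_d_i(double v) { return (int)v; }

/* floorr_d_i style: largest integer not greater than v. */
int floor_d_i(double v)
{
    int t = (int)v;
    return (v < 0 && (double)t != v) ? t - 1 : t;
}

/* ceilr_d_i style: smallest integer not less than v. */
int ceil_d_i(double v)
{
    int t = (int)v;
    return (v > 0 && (double)t != v) ? t + 1 : t;
}
```

The sketch only handles values that fit in an int; what the generated code does for larger values is not covered here.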

Network extensions

These accept two operands, both of which must be registers; these two instructions actually perform the same task, yet they are assigned to two mnemonics for the sake of convenience and completeness. As usual, the first operand is the destination and the second is the source.

          hton       us ui          Host-to-network (big endian) order
          ntoh       us ui          Network-to-host order
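Why the two mnemonics perform the same task can be seen from a plain C model: on a big-endian host the conversion is a no-op, and on a little-endian host it is a byte swap, which is its own inverse. The sketch below assumes 16-bit us and 32-bit ui operands; the function names are invented for the example.

```c
#include <stdint.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* hton_us / ntoh_us style conversion for 16-bit values. */
uint16_t hton_us(uint16_t v)
{
    if (!host_is_little_endian())
        return v;
    return (uint16_t)((v << 8) | (v >> 8));
}

/* hton_ui / ntoh_ui style conversion for 32-bit values. */
uint32_t hton_ui(uint32_t v)
{
    if (!host_is_little_endian())
        return v;
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

Applying either function twice returns the original value, so the same routine serves for both directions.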

Load operations

ld accepts two operands while ldx accepts three; in both cases, the last can be either a register or an immediate value. Values are extended (with or without sign, according to the data type specification) to fit a whole register.

          ldr     c  uc  s  us  i  ui  l  ul  p  f  d  O1 = *O2
          ldi     c  uc  s  us  i  ui  l  ul  p  f  d  O1 = *O2
          ldxr    c  uc  s  us  i  ui  l  ul  p  f  d  O1 = *(O2+O3)
          ldxi    c  uc  s  us  i  ui  l  ul  p  f  d  O1 = *(O2+O3)
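The effect of the type suffix on a load can be modeled in C. The sketch below mimics ldxi_c and ldxi_uc with a byte offset, using long as a stand-in for a whole register; the names are hypothetical and the code describes the intended result, not the generated machine code.

```c
#include <string.h>

/* ldxi_c style: load one byte at base + offset and sign-extend it. */
long ldx_c(const void *base, long offset)
{
    signed char v;
    memcpy(&v, (const char *)base + offset, sizeof v);
    return (long)v;
}

/* ldxi_uc style: load one byte at base + offset and zero-extend it. */
long ldx_uc(const void *base, long offset)
{
    unsigned char v;
    memcpy(&v, (const char *)base + offset, sizeof v);
    return (long)v;
}
```

The same byte in memory therefore produces different register contents depending on whether the c or the uc variant is used.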

Store operations

st accepts two operands while stx accepts three; in both cases, the first can be either a register or an immediate value. Values are sign-extended to fit a whole register.

          str     c  uc  s  us  i  ui  l  ul  p  f  d  *O1 = O2
          sti     c  uc  s  us  i  ui  l  ul  p  f  d  *O1 = O2
          stxr    c  uc  s  us  i  ui  l  ul  p  f  d  *(O1+O2) = O3
          stxi    c  uc  s  us  i  ui  l  ul  p  f  d  *(O1+O2) = O3

Stack management

These accept a single register parameter. These operations are not guaranteed to be efficient on all architectures.

          pushr                     i  ui  l  ul  p   push O1 on the stack
          popr                      i  ui  l  ul  p   pop O1 off the stack

Argument management

These are:

          prepare                   i                f  d
          pusharg     c  uc  s  us  i  ui  l  ul  p  f  d
          getarg      c  uc  s  us  i  ui  l  ul  p  f  d
          arg         c  uc  s  us  i  ui  l  ul  p  f  d

Of these, the first two are used by the caller, while the last two are used by the callee. A code snippet that wants to call another procedure and has to pass registers must, in order: use the prepare instruction, giving the number of arguments to be passed to the procedure (once for each data type); use pusharg to push the arguments in reverse order; and use calli or finish (explained below) to perform the actual call.

arg and getarg are used by the callee. arg is different from other instructions in that it does not actually generate any code: instead, it is a function which returns a value to be passed to getarg.³ You should call arg as soon as possible, before any function call or, more easily, right after the prolog or leaf instructions (which are treated later).

getarg accepts a register argument and a value returned by arg, and will move that argument to the register, extending it (with or without sign, according to the data type specification) to fit a whole register. These instructions are more intimately related to the usage of the GNU lightning instruction set in code that generates other code, so they will be treated more specifically in Generating code at run-time.

You should observe a few rules when using these macros. First of all, it is not allowed to call functions with more than six arguments; this was done to simplify and speed up the implementation on architectures that use registers for parameter passing.

You should not nest calls to prepare, nor call zero-argument functions (which do not need a call to prepare) inside a prepare/calli or prepare/finish block. Doing this might corrupt already pushed arguments.

You cannot pass parameters between subroutines using the six general-purpose registers. This might work only when targeting particular architectures.

On the other hand, it is possible to assume that caller-saved registers (R0 through R2) are not clobbered by another dynamically generated function which does not use them as operands in its code and which does not return a value.

Branch instructions

Like arg, these also return a value which, in this case, is to be used to compile forward branches as explained in Fibonacci numbers. They accept a pointer to the destination of the branch and two operands to be compared; of these, the last can be either a register or an immediate. They are:

          bltr      i  ui  l  ul  p  f  d  if (O2 <  O3) goto O1
          blti      i  ui  l  ul  p        if (O2 <  O3) goto O1
          bler      i  ui  l  ul  p  f  d  if (O2 <= O3) goto O1
          blei      i  ui  l  ul  p        if (O2 <= O3) goto O1
          bgtr      i  ui  l  ul  p  f  d  if (O2 >  O3) goto O1
          bgti      i  ui  l  ul  p        if (O2 >  O3) goto O1
          bger      i  ui  l  ul  p  f  d  if (O2 >= O3) goto O1
          bgei      i  ui  l  ul  p        if (O2 >= O3) goto O1
          beqr      i  ui  l  ul  p  f  d  if (O2 == O3) goto O1
          beqi      i  ui  l  ul  p        if (O2 == O3) goto O1
          bner      i  ui  l  ul  p  f  d  if (O2 != O3) goto O1
          bnei      i  ui  l  ul  p        if (O2 != O3) goto O1
          
          bunltr                     f  d  if !(O2 >= O3) goto O1
          bunler                     f  d  if !(O2 >  O3) goto O1
          bungtr                     f  d  if !(O2 <= O3) goto O1
          bunger                     f  d  if !(O2 <  O3) goto O1
          buneqr                     f  d  if !(O2 <  O3) && !(O2 >  O3) goto O1
          bltgtr                     f  d  if !(O2 >= O3) || !(O2 <= O3) goto O1
          bordr                      f  d  if  (O2 == O2) &&  (O3 == O3) goto O1
          bunordr                    f  d  if  (O2 != O2) ||  (O3 != O3) goto O1
          
          bmsr      i  ui  l  ul           if O2 &  O3 goto O1
          bmsi      i  ui  l  ul           if O2 &  O3 goto O1
          bmcr      i  ui  l  ul           if !(O2 & O3) goto O1
          bmci      i  ui  l  ul           if !(O2 & O3) goto O1⁴
          boaddr    i  ui  l  ul           O2 += O3, goto O1 on overflow
          boaddi    i  ui  l  ul           O2 += O3, goto O1 on overflow
          bosubr    i  ui  l  ul           O2 -= O3, goto O1 on overflow
          bosubi    i  ui  l  ul           O2 -= O3, goto O1 on overflow
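The mask and overflow branches combine a test (or an arithmetic operation) with a jump. The C sketch below models only the branch decision, returning 1 where the branch would be taken. It assumes that "overflow" for the i type means signed overflow, and the function names are made up for the example.

```c
#include <limits.h>

/* bmsr/bmsi: branch if any bit of the mask O3 is set in O2. */
int bms_taken(unsigned long o2, unsigned long o3) { return (o2 & o3) != 0; }

/* bmcr/bmci: branch if all bits of the mask O3 are clear in O2. */
int bmc_taken(unsigned long o2, unsigned long o3) { return (o2 & o3) == 0; }

/* boaddr_i style: O2 += O3, and report whether the addition overflowed. */
int boadd_i(int *o2, int o3)
{
    int overflow = (o3 > 0 && *o2 > INT_MAX - o3) ||
                   (o3 < 0 && *o2 < INT_MIN - o3);

    /* Add in unsigned arithmetic so the sum wraps instead of being
     * undefined; converting back to int is implementation-defined when
     * the value does not fit, and wraps on common two's complement
     * targets. */
    *o2 = (int)((unsigned int)*o2 + (unsigned int)o3);
    return overflow;
}
```

Note that the overflow forms modify O2 whether or not the branch is taken.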

Jump and return operations

These accept one argument except ret which has none; the difference between finish and calli is that the latter does not clean the stack from pushed parameters (if any) and the former must always follow a prepare instruction. Results are undefined when using function calls in a leaf function.

          calli     (not specified)                  function call to O1
          callr     (not specified)                  function call to a register
          finish    (not specified)                  function call to O1
          finishr   (not specified)                  function call to a register
          jmpi/jmpr (not specified)                  unconditional jump to O1
          prolog    (not specified)                  function prolog for O1 args
          leaf      (not specified)                  the same for leaf functions
          ret       (not specified)                  return from subroutine
          retval    c  uc s  us i  ui l  ul p  f  d  move return value
                                                     to register

Like the branch instructions, jmpi also returns a value which is to be used to compile forward branches. See Fibonacci numbers.
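The idea of a returned value that is later used to resolve a forward branch is the usual backpatching technique. The toy model below shows it on an array of integers rather than on machine code; every name in it is hypothetical, and it is not the library's actual mechanism or API.

```c
#include <stddef.h>

enum { OP_JMP = 1, OP_NOP = 2 };

/* A toy instruction stream; real code would check for overflow. */
typedef struct {
    int    code[32];
    size_t pc;        /* index of the next free slot */
} toy_buf;

/* Emit a jump whose target is not known yet and return the index of
 * the operand slot, so the caller can patch it later. */
size_t toy_jmp_forward(toy_buf *b)
{
    b->code[b->pc++] = OP_JMP;
    b->code[b->pc] = -1;          /* placeholder target */
    return b->pc++;
}

void toy_nop(toy_buf *b) { b->code[b->pc++] = OP_NOP; }

/* Resolve a forward jump so that it lands on the current position. */
void toy_patch_here(toy_buf *b, size_t ref)
{
    b->code[ref] = (int)b->pc;
}
```

The caller keeps the returned reference, emits the code to be skipped, and then patches the jump once the destination is known.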

As an appetizer, here is a small function that adds 1 to the input parameter (an int). I'm using an assembly-like syntax here which is a bit different from the one used when writing real subroutines with GNU lightning; the real syntax will be introduced in Generating code at run-time.

     incr:
          leaf      1
     in = arg_i                   ! We have an integer argument
          getarg_i  R0, in        ! Move it to R0
          addi_i    RET, R0, 1    ! Add 1, put result in return value
          ret                     ! And return the result

And here is another function which uses the printf function from the standard C library to write a number in hexadecimal notation:

     printhex:
          prolog    1
     in = arg_i                    ! Same as above
          getarg_i  R0, in
          prepare   2              ! Begin call sequence for printf
          pusharg_i R0             ! Push second argument
          pusharg_p "%x"           ! Push format string
          finish    printf         ! Call printf
          ret                      ! Return to caller
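For comparison, the two routines above behave roughly like the following C functions. This is only a description of their observable behavior; the cast to unsigned int is added here to match the %x conversion strictly and is not part of the original listing.

```c
#include <stdio.h>

/* Same result as the incr routine: return the argument plus one. */
int incr(int in)
{
    return in + 1;
}

/* Same effect as the printhex routine: print the argument in hex. */
void printhex(int in)
{
    printf("%x", (unsigned int)in);
}
```

Calling printhex(255) writes ff to standard output.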

Footnotes

[1] The sign bit is propagated for signed types.

[2] The sign bit is propagated for signed types.

[3] “Return a value” means that GNU lightning macros that compile these instructions return a value when expanded.

[4] These mnemonics mean, respectively, branch if mask set and branch if mask cleared.