Issues in the Design of a Code Generator

1. Input to the Code Generator
2. The Target Program
3. Instruction Selection
4. Register Allocation
5. Evaluation Order

While the details are dependent on the specifics of the intermediate representation, the target language, and the run-time system, tasks such as instruction selection, register allocation and assignment, and instruction ordering are encountered in the design of almost all code generators.

The most important criterion for a code generator is that it produce correct code. Correctness takes on special significance because of the number of special cases that a code generator might face. Given the premium on correctness, designing a code generator so it can be easily implemented, tested, and maintained is an important design goal.

 1. Input to the Code Generator

 The input to the code generator is the intermediate representation of the source program produced by the front end, along with information in the symbol table that is used to determine the run-time addresses of the data objects denoted by the names in the IR.

The many choices for the IR include three-address representations such as quadruples, triples, and indirect triples; virtual machine representations such as bytecodes and stack-machine code; linear representations such as postfix notation; and graphical representations such as syntax trees and DAG's. Many of the algorithms in this chapter are couched in terms of the representations considered in Chapter 6: three-address code, trees, and DAG's. The techniques we discuss can be applied, however, to the other intermediate representations as well.
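As a small illustrative sketch (the names x, a, b, c and the temporaries t1 and t2 are chosen only for this example), a front end might translate the assignment x = (a + b) * c into the three-address code

    t1 = a + b
    t2 = t1 * c
    x  = t2

In quadruple form, each of these statements would be recorded as an operator with its two arguments and a result field, for instance (+, a, b, t1).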

In this chapter, we assume that the front end has scanned, parsed, and translated the source program into a relatively low-level IR, so that the values of the names appearing in the IR can be represented by quantities that the target machine can directly manipulate, such as integers and floating-point numbers. We also assume that all syntactic and static semantic errors have been detected, that the necessary type checking has taken place, and that type-conversion operators have been inserted wherever necessary. The code generator can therefore proceed on the assumption that its input is free of these kinds of errors.

 2. The Target Program 

The instruction-set architecture of the target machine has a significant impact on the difficulty of constructing a good code generator that produces high-quality machine code. The most common target-machine architectures are RISC (reduced instruction set computer), CISC (complex instruction set computer), and stack based.

A RISC machine typically has many registers, three-address instructions, simple addressing modes, and a relatively simple instruction-set architecture. In contrast, a CISC machine typically has few registers, two-address instructions, a variety of addressing modes, several register classes, variable-length instructions, and instructions with side effects.

 

In a stack-based machine, operations are done by pushing operands onto a stack and then performing the operations on the operands at the top of the stack. To achieve high performance, the top of the stack is typically kept in registers. Stack-based machines almost disappeared because it was felt that the stack organization was too limiting and required too many swap and copy operations.

However, stack-based architectures were revived with the introduction of the Java Virtual Machine (JVM). The JVM is a software interpreter for Java bytecodes, an intermediate language produced by Java compilers. The interpreter provides software compatibility across multiple platforms, a major factor in the success of Java.

 To overcome the high performance penalty of interpretation, which can be on the order of a factor of 10, just-in-time (JIT) Java compilers have been created. These JIT compilers translate bytecodes during run time to the native hardware instruction set of the target machine. Another approach to improving Java performance is to build a compiler that compiles directly into the machine instructions of the target machine, bypassing the Java bytecodes entirely.

Producing an absolute machine-language program as output has the advantage that it can be placed in a fixed location in memory and immediately executed. Programs can be compiled and executed quickly.

 

Producing a relocatable machine-language program (often called an object module) as output allows subprograms to be compiled separately. A set of relocatable object modules can be linked together and loaded for execution by a linking loader. Although we must pay the added expense of linking and loading if we produce relocatable object modules, we gain a great deal of flexibility in being able to compile subroutines separately and to call other previously compiled programs from an object module. If the target machine does not handle relocation automatically, the compiler must provide explicit relocation information to the loader to link the separately compiled program modules.

 

Producing an assembly-language program as output makes the process of code generation somewhat easier. We can generate symbolic instructions and use the macro facilities of the assembler to help generate code. The price paid is the assembly step after code generation.

In this chapter, we shall use a very simple RISC-like computer as our target machine. We add to it some CISC-like addressing modes so that we can also discuss code-generation techniques for CISC machines. For readability, we use assembly code as the target language. As long as addresses can be calculated from offsets and other information stored in the symbol table, the code generator can produce relocatable or absolute addresses for names just as easily as symbolic addresses.

 3. Instruction Selection

The code generator must map the IR program into a code sequence that can be executed by the target machine. The complexity of performing this mapping is determined by factors such as

•  the level of the IR,

•  the nature of the instruction-set architecture, and

•  the desired quality of the generated code.

If the IR is high level, the code generator may translate each IR statement into a sequence of machine instructions using code templates. Such statement-by-statement code generation, however, often produces poor code that needs further optimization. If the IR reflects some of the low-level details of the underlying machine, then the code generator can use this information to generate more efficient code sequences.

The nature of the instruction set of the target machine has a strong effect on the difficulty of instruction selection. For example, the uniformity and completeness of the instruction set are important factors. If the target machine does not support each data type in a uniform manner, then each exception to the general rule requires special handling. On some machines, for example, floating-point operations are done using separate registers.

 

Instruction speeds and machine idioms are other important factors. If we do not care about the efficiency of the target program, instruction selection is straightforward. For each type of three-address statement, we can design a code skeleton that defines the target code to be generated for that construct. For example, every three-address statement of the form x = y + z, where x, y, and z are statically allocated, can be translated into the code sequence
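sketched below, where LD, ADD, and ST stand for load, add, and store, and R0 is a general-purpose register:

    LD  R0, y       // R0 = y
    ADD R0, R0, z   // R0 = R0 + z
    ST  x, R0       // x = R0

Applied statement by statement, such a skeleton often produces redundant loads and stores. For example, translating a = b + c followed by d = a + e in this fashion gives

    LD  R0, b       // R0 = b
    ADD R0, R0, c   // R0 = R0 + c
    ST  a, R0       // a = R0
    LD  R0, a       // R0 = a
    ADD R0, R0, e   // R0 = R0 + e
    ST  d, R0       // d = R0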


Here, the fourth statement is redundant since it loads a value that has just been stored, and so is the third if a is not subsequently used.

 The quality of the generated code is usually determined by its speed and size. On most machines, a given IR program can be implemented by many different code sequences, with significant cost differences between the different implementations. A naive translation of the intermediate code may therefore lead to correct but unacceptably inefficient target code.

 

For example, if the target machine has an "increment" instruction (INC), then the three-address statement a = a + 1 may be implemented more efficiently by the single instruction INC a, rather than by a more obvious sequence that loads a into a register, adds one to the register, and then stores the result back into a:
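Using the same assumed mnemonics as in the earlier sketch, with #1 denoting the immediate constant 1, that longer sequence would look roughly like this:

    LD  R0, a        // R0 = a
    ADD R0, R0, #1   // R0 = R0 + 1
    ST  a, R0        // a = R0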


We need to know instruction costs in order to design good code sequences but, unfortunately, accurate cost information is often difficult to obtain. Deciding which machine-code sequence is best for a given three-address construct may also require knowledge about the context in which that construct appears.

 

In Section 8.9 we shall see that instruction selection can be modeled as a tree-pattern matching process in which we represent the IR and the machine instructions as trees. We then attempt to "tile" an IR tree with a set of subtrees that correspond to machine instructions. If we associate a cost with each machine-instruction subtree, we can use dynamic programming to generate optimal code sequences. Dynamic programming is discussed in Section 8.11.

 4. Register Allocation

 A key problem in code generation is deciding what values to hold in what registers. Registers are the fastest computational unit on the target machine, but we usually do not have enough of them to hold all values. Values not held in registers need to reside in memory. Instructions involving register operands are invariably shorter and faster than those involving operands in memory, so efficient utilization of registers is particularly important.

 The use of registers is often subdivided into two subproblems:

•  Register allocation, during which we select the set of variables that will reside in registers at each point in the program.

•  Register assignment, during which we pick the specific register that a variable will reside in.

 Finding an optimal assignment of registers to variables is difficult, even with single-register machines. Mathematically, the problem is NP-complete. The problem is further complicated because the hardware and/or the operating system of the target machine may require that certain register-usage conventions be observed.

Example 8.1: Certain machines require register-pairs (an even and next odd-numbered register) for some operands and results. For example, on some machines, integer multiplication and integer division involve register pairs. The multiplication instruction is of the form

 M x,  y

 where x, the multiplicand, is the even register of an even/odd register pair and y, the multiplier, is the odd register. The product occupies the entire even/odd register pair. The division instruction is of the form

D x,  y

where the dividend occupies an even/odd register pair whose even register is x; the divisor is y. After division, the even register holds the remainder and the odd register the quotient.

Now, consider the two three-address code sequences in Fig. 8.2, in which the only difference between (a) and (b) is the operator in the second statement. The shortest assembly-code sequences for (a) and (b) are given in Fig. 8.3.

Ri stands for register i. SRDA stands for Shift-Right-Double-Arithmetic, and SRDA R0, 32 shifts the dividend into R1 and clears R0 so all its bits equal the sign bit. L, ST, and A stand for load, store, and add, respectively. Note that the optimal choice for the register into which a is to be loaded depends on what will ultimately happen to t.
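For instance, for a three-address sequence along the lines of t = a + b, t = t + c, t = t / d, a code sequence respecting these register-pair conventions might look like the following sketch (it is only illustrative; Fig. 8.3 gives the shortest sequences for the sequences of Fig. 8.2):

    L    R0, a      // load a into the even register R0
    A    R0, b      // R0 = R0 + b
    A    R0, c      // R0 = R0 + c
    SRDA R0, 32     // shift the dividend into R1; R0 is filled with the sign bit
    D    R0, d      // divide the pair (R0, R1) by d; quotient in R1, remainder in R0
    ST   R1, t      // store the quotient into t

Here a must be loaded into the even register R0 so that, after the shift, the dividend occupies the even/odd pair that D requires.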

  Strategies for register allocation and assignment are discussed in Section 8.8. Section 8.10 shows that for certain classes of machines we can construct code sequences that evaluate expressions using as few registers as possible.

 

5. Evaluation Order

 The order in which computations are performed can affect the efficiency of the target code. As we shall see, some computation orders require fewer registers to hold intermediate results than others. However, picking a best order in the general case is a difficult NP-complete problem. Initially, we shall avoid the problem by generating code for the three-address statements in the order in which they have been produced by the intermediate code generator. In Chapter 10, we shall study code scheduling for pipelined machines that can execute several operations in a single clock cycle.
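As a small illustration, consider the three-address statements below; the code sketches use the same assumed LD/ADD/ST style as the earlier examples, with SUB as an assumed subtract instruction.

    t1 = a + b
    t2 = c + d
    t3 = e - t2
    t4 = t1 - t3

Generating code in this order forces t1 to be held in a register while t2 and t3 are computed, so three registers are needed:

    LD  R0, a       // R0 = a
    ADD R0, R0, b   // R0 = t1
    LD  R1, c       // R1 = c
    ADD R1, R1, d   // R1 = t2
    LD  R2, e       // R2 = e
    SUB R2, R2, R1  // R2 = t3
    SUB R0, R0, R2  // R0 = t4

If instead we compute t2 and t3 first and t1 last, two registers suffice:

    LD  R0, c       // R0 = c
    ADD R0, R0, d   // R0 = t2
    LD  R1, e       // R1 = e
    SUB R1, R1, R0  // R1 = t3; R0 is now free
    LD  R0, a       // R0 = a
    ADD R0, R0, b   // R0 = t1
    SUB R0, R0, R1  // R0 = t4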