Issues in Code Generation
1. Input to the Code Generator
The input to the code generator is the intermediate representation of the source program produced by the front end of the compiler with the symbol table information. There are various choices for intermediate representations:
- Linear representation: Such as postfix representation or three-address code representation.
- Virtual machine representation: Such as stack machine code.
- Graphical representation: Such as syntax tree and DAGs.
It is assumed that before code generation, the front end of the compiler has taken care of the following issues:
- The source program is scanned, parsed, and translated to a detailed intermediate representation.
- Necessary type checking is done.
- The input is free of errors.
2. Output of Code Generator
The output of the code generator is the target code, which may take on a variety of forms:
- Absolute machine language: Can be placed in a fixed location in memory and immediately executed.
- Relocatable machine language: Allows sub-programs to be compiled separately. A set of relocatable object modules can be linked and loaded for execution by a linking loader.
- Assembly language: Easier code generation process. Symbolic instructions and the macro facility of the assembler can help in code generation.
3. Memory Management
The front end of the compiler and the code generator collaboratively map names in the source program to addresses of data objects in run-time memory.
- During intermediate code generation, a name in a three-address statement refers to a symbol table entry for the name.
- The type in the declaration determines the width, i.e., the amount of storage needed for the declared name.
- From the symbol table entry, a relative address can be determined for the name in a data area for the procedure.
The intermediate representation can be converted to addresses in the target code using the implementations of static and stack allocation of data areas.
4. Instruction Selection
The set of instructions for the target machine determines the difficulty level in the selection of instructions. The uniformity and completeness of the instruction set are important factors.
- If the target machine does not uniformly support each data type, each exception requires special handling.
- Instruction speeds and machine idioms are important factors.
- Efficiency considerations make instruction selection challenging.
For each three-address statement, we can design a code skeleton outlining the target code to be generated. However, statement-by-statement code generation often produces poor code.
5. Register Allocation
Efficient utilization of registers is crucial because instructions using register operands are shorter and faster than those using operands in memory.
Sub-Tasks in Register Utilization
- Register Allocation: Deciding which variables will reside in the registers at a given point in the program.
- Register Assignment: Deciding the specific register where the variable will reside.
An optimal assignment of registers to variables is difficult even with single-register values.
Instruction Abbreviation | Meaning |
---|---|
L | Load |
A | Add |
ST | Store |
M | Multiplication |
D | Division |
SRDA | Shift the dividend into R1 and clear R0 |
6. Choice of Evaluation Order
The order of computations affects the efficiency of the code generator. Deciding the best order of evaluation is a difficult problem and is considered an NP-complete problem.