Compiler Design

What are the Issues in Code Generation

Issues that are usually encountered in code generation are given below.

1. Input to the Code Generator

  • The input to the code generator is the intermediate representation of the source program produced by the front end of the compiler with the symbol table information.
  • There are various choices for intermediate representations.

(i) Linear representation, such as postfix representation or three-address code representation.

(ii) Virtual machine representation, such as stack machine code.

(iii) Graphical representation, such as syntax tree and DAGs. It is assumed that before code generation, the front end of the compiler has taken care of the following issues:

(i) The source program is scanned, parsed, and translated to a detailed intermediate representation.

(i) Necessary type checking is done.

(iii) The input is free of errors.

2. Output of Code Generator

The output of the code generator is the target code. This target code may take on a variety of forms.

(i) Absolute machine language

(ii) Relocatable machine language

(iii) Assembly language

The advantage of producing absolute machine language is that it can be placed in a fixed location in the memory and immediately executed.

The advantage of producing relocatable machine language is that it allows sub-programs to be compiled separately. A set of relocatable object modules can be linked and loaded for execution by a linking loader. If the target machine does not handle relocation automatically, the compiler must provide explicit relocation information to the loader, to link the separately compiled program segments.

The advantage of producing assembly language is that the process of code generation becomes easier. We can generate symbolic instructions and use the macro facility of the assembler to help in code generation.

3. Memory Management

The front end of the compiler and the code generator do the mapping of names in the source program to addresses of the data objects in run-time memory, cooperatively.

In the intermediate code generation, it is assumed that a name in a three-address statement refers to a symbol table entry for the name. These symbol table entries are created as the declaration of the name is examined.

The type in the declaration determines the width i.e. the amount of storage needed for the declared name. From the symbol table entry, a relative address can be determined for the name in a data area for the procedure.

The intermediate representation can be converted to the addresses in the target code by using the implementations of static and stack allocation of data areas.

4. Instruction Selection

The set of instructions for the target machine determines the level of difficulty in the selection of instructions. The uniformity and completeness of the instruction set are important factors in determining the instruction set.

If the target machine does not uniformly support each data type, then each exception to the general rule requires special handling. Instruction speeds and machine idioms are also important factors. If the efficiency of the target program is not considered, then the instruction selection is easy.

For each three-address statement, we can design a code skeleton that outlines the target code to be generated for that construct. A three-address statement like x := y + z, code generated is:

But this kind of statement-by-statement code generation often produces poor code.

5. Register Allocation

  • Efficient utilization of registers is important in code generation because the instructions utilizing register operands are shorter and faster than those using operands in memory.
  • the use of registers is divided into two sun-tasks

(i) Register Allocation: During register allocation, a set of variables that will reside in the registers at a given point in the program is decided

(ii) Register assignment: During register assignment a specific register in which the variable will reside is decided

An optimal assignment of registers to variables is difficult even with single-register values. Consider the given three-address code statements in the optimal machine code sequences with the following representations.

L: Load

A: Add

ST: Store

M: Multiplication

D: Division

SRDA: shift the dividend into R1 and clear Ro so that all bits are equal to the sign bit.

6. Choice of Evaluation Order

The order in which the computations are performed affects the efficiency of the code generator. Deciding the best order of evaluation is a difficult problem. It can be said that deciding the best order of evaluation is an NPcom

Leave a Reply

Your email address will not be published. Required fields are marked *