Phases of a Compiler
A compiler operates in phases, each transforming the source program from one representation to another. The phases of a compiler are as follows:
1. Lexical Analysis
Lexical analysis, also known as linear analysis or scanning, involves reading the stream of characters from the source program and grouping them into tokens that have collective meaning.
For example, the assignment statement position = initial + rate * 60
will be grouped into the following tokens:
- Identifiers:
{ position, initial, rate }
- Operators:
{ =, +, * }
- Numbers:
{ 60 }
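The grouping above can be sketched with a small regular-expression scanner. This is a minimal illustration, not a production lexer: the token category names and the TOKEN_SPEC table are assumptions chosen for this example, and a real language would define many more token kinds.

```python
import re

# Hypothetical token categories for this sketch; real languages define many more.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"[=+\-*/]"),
    ("SKIP", r"\s+"),        # whitespace separates tokens but is not a token itself
    ("MISMATCH", r"."),      # any other character is a lexical error
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Group the characters of `source` into (kind, lexeme) pairs."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r} at offset {match.start()}")
        tokens.append((kind, lexeme))
    return tokens


tokens = tokenize("position = initial + rate * 60")
for kind, lexeme in tokens:
    print(kind, lexeme)
```

Running the scanner on the example statement yields three identifiers, three operators, and one number, in source order.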
2. Syntax Analysis
Syntax analysis, also known as hierarchical analysis or parsing, groups the tokens of the source program into grammatical phrases that the compiler uses to synthesize the output. These phrases are usually represented by a parse tree.
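As an illustration, a small recursive-descent parser can group the tokens of position = initial + rate * 60 into a tree. This is a sketch under assumptions: the grammar (assignment, "+", "*", with "*" binding tighter), the nested-tuple tree shape, and the function name parse_assignment are all chosen for this example only.

```python
OPERATORS = {"=", "+", "*"}


def parse_assignment(tokens):
    """Parse `identifier = expression` from a list of lexemes into a nested tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        # A factor is a single operand: an identifier or a number.
        tok = peek()
        if tok is None or tok in OPERATORS:
            raise SyntaxError(f"expected an operand, found {tok!r}")
        return advance()

    def term():
        # term -> factor { "*" factor }
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expression():
        # expression -> term { "+" term }
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if peek() != "=":
        raise SyntaxError(f"expected '=', found {peek()!r}")
    advance()
    tree = ("=", target, expression())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tree = parse_assignment(["position", "=", "initial", "+", "rate", "*", "60"])
print(tree)
```

The multiplication ends up nested inside the addition, which reflects the hierarchical structure that a flat token stream does not show.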
3. Semantic Analysis
The semantic analysis phase checks the source program for semantic errors and gathers type information for the subsequent code generation phase. It uses the hierarchical structure determined by the syntax analysis phase to identify the operators and operands of the expressions. An important part of the semantic analyzer is type checking.
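A type checker can be sketched as a walk over the tree produced by the parser. The example below assumes a nested-tuple tree of the form (operator, left, right), a hypothetical table of declared types, and deliberately simple typing rules (int and float only, with int widening to float); real languages have far richer rules.

```python
NUMERIC = {"int", "float"}

# Hypothetical declared types; a real compiler would read these from the symbol table.
DECLARED = {"position": "float", "initial": "float", "rate": "float"}


def type_of(node, declared):
    """Return the type of a (op, left, right) tuple tree, raising on a semantic error."""
    if isinstance(node, tuple):
        op, left, right = node
        left_type = type_of(left, declared)
        right_type = type_of(right, declared)
        if op == "=":
            # Assumed rule: types must match, except that an int may widen to float.
            if left_type != right_type and (left_type, right_type) != ("float", "int"):
                raise TypeError(f"cannot assign {right_type} to {left_type}")
            return left_type
        if left_type not in NUMERIC or right_type not in NUMERIC:
            raise TypeError(f"operator {op!r} cannot combine {left_type} and {right_type}")
        return "float" if "float" in (left_type, right_type) else "int"
    if node.isdigit():
        return "int"
    if node not in declared:
        raise NameError(f"undeclared identifier {node!r}")
    return declared[node]


tree = ("=", "position", ("+", "initial", ("*", "rate", "60")))
result_type = type_of(tree, DECLARED)
print(result_type)
```

The checker uses the tree structure to find the operands of each operator, which is why this phase runs after syntax analysis.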
4. Symbol Table Management
The Symbol Table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. It allows quick access and retrieval of data related to identifiers.
- Function: Record the identifiers used in the source program and collect information about their attributes.
- Usage: Identifiers are entered during lexical analysis; their records are updated during semantic analysis and intermediate code generation, and consulted during code generation.
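A symbol table is commonly backed by a hash table, which gives the quick access described above. The class below is a minimal sketch: the method names (enter, set_attribute, lookup) and the attribute keys ("type", "offset") are assumptions made for this example, and scoping is left out entirely.

```python
class SymbolTable:
    """One record per identifier, stored in a dict for fast lookup."""

    def __init__(self):
        self._records = {}

    def enter(self, name):
        """Create an empty record for `name` if none exists, and return the record."""
        return self._records.setdefault(name, {})

    def set_attribute(self, name, key, value):
        """Attach or update one attribute of an existing identifier."""
        self._records[name][key] = value

    def lookup(self, name):
        """Return the record for `name`, or None if it was never entered."""
        return self._records.get(name)


table = SymbolTable()
for name in ("position", "initial", "rate"):
    table.enter(name)                            # as during lexical analysis
    table.set_attribute(name, "type", "float")   # as during semantic analysis
table.set_attribute("rate", "offset", 16)        # hypothetical storage offset
print(table.lookup("rate"))
```

Later phases only call lookup, so the code generator can find the type and storage location of each identifier without re-reading the source.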
5. Error Detection and Reporting
Errors can be encountered in any phase. Once a phase detects an error, it must deal with that error in a way that lets compilation proceed, so that further errors in the source program can be detected.
Handling Errors
- Lexical Analysis Phase: Detects errors where the characters in the input do not form any token of the language.
- Syntax Analysis Phase: Detects errors where the token stream violates the structure rules (syntax) of the language.
- Semantic Analysis Phase: Detects constructs that have the right syntactic structure but no meaning for the operation involved.
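One simple way for a phase to keep going after an error is to record a diagnostic and skip the offending input. The sketch below shows this for the lexical phase only. The function name scan_with_recovery, the small operator set, and the skip-one-character recovery strategy are assumptions for illustration; real compilers use more elaborate recovery schemes.

```python
def scan_with_recovery(source):
    """Scan `source`, collecting lexical errors instead of stopping at the first one."""
    tokens, errors = [], []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalnum() or ch == "_":
            # Gather a run of letters, digits, and underscores into one lexeme.
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            tokens.append(source[start:i])
        elif ch in "=+*":
            tokens.append(ch)
            i += 1
        else:
            # Recovery: report the character, drop it, and continue scanning.
            errors.append(f"offset {i}: illegal character {ch!r}")
            i += 1
    return tokens, errors


tokens, errors = scan_with_recovery("rate @ * 60")
print(tokens)
print(errors)
```

Because the scanner continues past the illegal character, the remaining tokens are still delivered and later phases can report their own errors in the same run.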
6. Intermediate Code Generation
After syntax and semantic analysis, compilers generate an intermediate representation of the source program. This intermediate representation should:
- Be easy to produce.
- Be easy to translate into the target program.
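One widely used intermediate form is three-address code, in which each instruction has at most one operator besides the assignment. The sketch below flattens an expression tree into such instructions. The nested-tuple tree shape, the temporary names t1, t2, ..., and the function name to_three_address are assumptions for this example.

```python
import itertools


def to_three_address(tree):
    """Flatten a ("=", target, expression) tuple tree into three-address instructions."""
    code = []
    temps = itertools.count(1)

    def emit(node):
        if not isinstance(node, tuple):
            return node  # an identifier or constant is already a usable address
        op, left, right = node
        left_addr = emit(left)
        right_addr = emit(right)
        temp = f"t{next(temps)}"  # fresh temporary holding this subexpression
        code.append(f"{temp} = {left_addr} {op} {right_addr}")
        return temp

    op, target, value = tree
    if op != "=":
        raise ValueError("expected an assignment at the root of the tree")
    code.append(f"{target} = {emit(value)}")
    return code


code = to_three_address(("=", "position", ("+", "initial", ("*", "rate", "60"))))
for line in code:
    print(line)
```

A single post-order walk of the tree is enough to produce the instructions, and each one maps naturally onto a few machine instructions, which matches the two properties listed above.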
7. Code Optimization
The code optimization phase attempts to improve the intermediate code so that the resulting machine code runs faster.
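Constant folding is one simple example of such an improvement: operations whose operands are all known constants are evaluated at compile time rather than at run time. The sketch below assumes the intermediate code is held as a nested-tuple expression tree with integer constants; the function name fold and the operator table are assumptions for this example.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def fold(node):
    """Replace every subtree whose operands are both integer constants by its value."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if op in OPS and isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)  # computed once, at compile time
    return (op, left, right)


before = ("=", "position", ("+", "initial", ("*", "rate", ("*", 60, 2))))
after = fold(before)
print(after)
```

The folded tree contains one multiplication fewer, so the generated program does less work each time the statement executes.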
8. Code Generation
The code generation phase involves generating the target code, consisting of relocatable machine code or assembly code. Memory locations are selected for variables, and intermediate instructions are translated into a sequence of machine instructions that perform the same task.
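The translation step can be sketched as a function that turns each intermediate instruction into a short sequence of assembly instructions. Everything machine-specific here is hypothetical: the LOAD, ADD, MUL, and STORE mnemonics, the single register R1, and the "#" prefix for immediate constants do not correspond to any real instruction set. The intermediate instructions are assumed to be (destination, operator, left, right) tuples, and every variable or temporary is assumed to live in a named memory location.

```python
# Hypothetical mnemonics for an imaginary one-register machine.
MNEMONIC = {"+": "ADD", "*": "MUL"}


def operand(value):
    """Render an integer constant as an immediate, anything else as a memory name."""
    return f"#{value}" if isinstance(value, int) else value


def generate(instructions):
    """Translate (dest, op, left, right) tuples into assembly text, one tuple at a time."""
    asm = []
    for dest, op, left, right in instructions:
        asm.append(f"LOAD R1, {operand(left)}")              # bring left operand into R1
        asm.append(f"{MNEMONIC[op]} R1, {operand(right)}")   # combine with right operand
        asm.append(f"STORE {dest}, R1")                      # write result to dest's location
    return asm


asm = generate([("t1", "*", "rate", 60), ("position", "+", "initial", "t1")])
for line in asm:
    print(line)
```

This naive scheme stores and reloads every temporary; a real code generator would try to keep intermediate values in registers instead.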
Grouping of Phases
The phases deal with the logical organization of a compiler. In implementation, activities from several phases may be grouped into a pass that reads an input file and writes an output file. For example:
- Front-End Pass: Lexical analysis, syntax analysis, semantic analysis, and intermediate code generation grouped into one pass.
- Optional Code Optimization Pass: Code optimization may run as a separate pass that can be skipped.
- Back-End Pass: Code generation for a particular target machine.
Compiler Collections
Compiler collections are built around carefully designed intermediate representations, allowing the front end for a particular language to interface with the back end for a certain target machine. This enables:
- Producing compilers for different source languages for one target machine by combining different front ends with the back end for that target machine.
- Producing compilers for different target machines by combining a front end with back ends for different target machines.
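The combinations above can be sketched by treating a front end as a function from source text to a shared intermediate representation, and a back end as a function from that representation to target code. The two toy source notations, the two toy targets, and all function names below are hypothetical and exist only to show the wiring.

```python
def frontend_infix(source):
    """Toy front end for statements written as 'a + b'."""
    left, op, right = source.split()
    return [("result", op, left, right)]


def frontend_prefix(source):
    """Toy front end for statements written as '+ a b'."""
    op, left, right = source.split()
    return [("result", op, left, right)]


def backend_stack(ir):
    """Toy back end for an imaginary stack machine."""
    names = {"+": "add", "*": "mul"}
    out = []
    for dest, op, left, right in ir:
        out += [f"push {left}", f"push {right}", names[op], f"pop {dest}"]
    return out


def backend_register(ir):
    """Toy back end for an imaginary one-register machine."""
    names = {"+": "ADD", "*": "MUL"}
    out = []
    for dest, op, left, right in ir:
        out += [f"LOAD R1, {left}", f"{names[op]} R1, {right}", f"STORE {dest}, R1"]
    return out


def make_compiler(front_end, back_end):
    """Combine any front end with any back end through the shared representation."""
    return lambda source: back_end(front_end(source))


infix_to_stack = make_compiler(frontend_infix, backend_stack)
prefix_to_stack = make_compiler(frontend_prefix, backend_stack)
infix_to_register = make_compiler(frontend_infix, backend_register)
print(infix_to_stack("a + b"))
print(infix_to_register("a + b"))
```

Two front ends and two back ends give four compilers here, while only four components had to be written; the saving grows as more languages and targets are added.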