Lexical Analysis - also called tokenization.
This phase of the compiler splits the source code into tokens. A lexical token, or simply token, is a string with an assigned and thus identified meaning. It is structured as a pair consisting of a token name and an optional token value. Common token names are:
identifier: names the programmer chooses;
keyword: names already in the programming language;
separator (also known as punctuator): punctuation characters and paired delimiters;
operator: symbols that operate on arguments and produce results;
literal: numeric, logical, textual, reference literals;
comment: line, block.
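The pair structure described above can be sketched as a small data type. This is a minimal sketch in Python; the names `TokenName` and `Token` are hypothetical and chosen only for illustration, and the token value is optional, as the text describes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TokenName(Enum):
    """The common token names listed above (hypothetical enum for illustration)."""
    IDENTIFIER = "identifier"
    KEYWORD = "keyword"
    SEPARATOR = "separator"
    OPERATOR = "operator"
    LITERAL = "literal"
    COMMENT = "comment"


@dataclass(frozen=True)
class Token:
    """A token is a pair: a token name plus an optional token value."""
    name: TokenName
    value: Optional[str] = None

    def __str__(self) -> str:
        # Print as "(name, value)", or just "(name)" when there is no value.
        if self.value is None:
            return f"({self.name.value})"
        return f"({self.name.value}, {self.value})"


if __name__ == "__main__":
    print(Token(TokenName.KEYWORD, "while"))   # (keyword, while)
    print(Token(TokenName.LITERAL, "3.14"))    # (literal, 3.14)
    print(Token(TokenName.SEPARATOR))          # (separator)
```

Making `Token` frozen lets tokens be compared and hashed by value, which is convenient when checking a lexer's output against an expected sequence.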
Consider this expression in the C programming language:
x = a + b * 2;
The lexical analysis of this expression yields the following sequence of tokens:
[(identifier, x), (operator, =), (identifier, a), (operator, +), (identifier, b), (operator, *), (literal, 2), (separator, ;)]
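One common way to build a lexer is to try a list of regular expressions, one per token name, at each position in the input. The sketch below does this in Python for the small subset of C needed here. The keyword set, the patterns, and the function name `tokenize` are assumptions for illustration, not a complete C lexer: it does not handle string or character literals, hexadecimal numbers, or multi-character operators such as `==`.

```python
import re
from typing import List, Tuple

# Assumed subset of C keywords; a real C lexer recognizes the full list.
KEYWORDS = {"int", "float", "if", "else", "while", "return"}

# Alternation picks the first pattern that matches, so order matters:
# comments are listed before the "/" operator.
TOKEN_SPEC = [
    ("comment", r"//[^\n]*|/\*.*?\*/"),
    ("literal", r"\d+(?:\.\d+)?"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator", r"[+\-*/=]"),
    ("separator", r"[;,(){}]"),
    ("skip", r"\s+"),  # whitespace is matched but not emitted
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,  # lets ".*?" in block comments span newlines
)


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Split source text into (token name, token value) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = m.lastgroup, m.group()
        if kind == "identifier" and text in KEYWORDS:
            kind = "keyword"  # reserved names are keywords, not identifiers
        if kind != "skip":
            tokens.append((kind, text))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("x = a + b * 2;"))
```

Calling `tokenize("x = a + b * 2;")` returns the same eight tokens listed above, as (name, value) tuples.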
Syntax analysis, or parsing, is the process of checking the sequence of tokens produced by the lexical analysis stage to see whether they are arranged the way the language requires. This is determined by the rules of a formal grammar. C++, like any programming language, has a well-defined syntax, just as there is a defined syntax for all written languages.
If the tokens do NOT follow the rules of the grammar, the compiler generates a "syntax error", attempts to tell the user what the error is and where it occurs, and compilation stops.
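A hand-written recursive-descent parser is one common way to check tokens against a grammar. The sketch below, in Python, assumes a tiny hypothetical grammar for assignment statements (shown in the docstring) and takes (token name, token value) pairs like those in the lexical example above as input. It builds a nested tuple as a parse tree and raises Python's `SyntaxError` when a token does not fit the grammar. The class and function names are illustrative only.

```python
from typing import List, Optional, Tuple, Union

Token = Tuple[str, str]   # (token name, token value)
Tree = Union[str, tuple]  # a leaf value or (operator, left, right)


class Parser:
    """Recursive-descent parser for a hypothetical grammar:

        statement -> identifier '=' expr ';'
        expr      -> term (('+' | '-') term)*
        term      -> factor (('*' | '/') factor)*
        factor    -> identifier | literal | '(' expr ')'
    """

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Token:
        # A synthetic ("eof", "") token marks the end of the input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("eof", "")

    def error(self, expected: str) -> SyntaxError:
        # Report what was expected and what was found, and where.
        name, value = self.peek()
        found = value or name
        return SyntaxError(f"syntax error at token {self.pos}: expected {expected}, found {found!r}")

    def expect(self, name: str, value: Optional[str] = None) -> str:
        tok_name, tok_value = self.peek()
        if tok_name != name or (value is not None and tok_value != value):
            raise self.error(repr(value) if value is not None else name)
        self.pos += 1
        return tok_value

    def statement(self) -> Tree:
        target = self.expect("identifier")
        self.expect("operator", "=")
        value = self.expr()
        self.expect("separator", ";")
        if self.pos != len(self.tokens):
            raise self.error("end of input")
        return ("=", target, value)

    def expr(self) -> Tree:
        node = self.term()
        while self.peek() in (("operator", "+"), ("operator", "-")):
            op = self.expect("operator")
            node = (op, node, self.term())
        return node

    def term(self) -> Tree:
        node = self.factor()
        while self.peek() in (("operator", "*"), ("operator", "/")):
            op = self.expect("operator")
            node = (op, node, self.factor())
        return node

    def factor(self) -> Tree:
        name, value = self.peek()
        if name in ("identifier", "literal"):
            self.pos += 1
            return value
        if (name, value) == ("separator", "("):
            self.pos += 1
            node = self.expr()
            self.expect("separator", ")")
            return node
        raise self.error("an identifier, literal, or '('")


def parse(tokens: List[Token]) -> Tree:
    return Parser(tokens).statement()
```

Because `term` is called from inside `expr`, multiplication binds tighter than addition: the tokens for `x = a + b * 2;` parse to `("=", "x", ("+", "a", ("*", "b", "2")))`, while dropping the final `;` raises a syntax error.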
Semantic analysis performs semantic checks such as type checking (making sure that arithmetic operations are performed on values of numeric types such as int or float), object binding (associating each use of a variable or function with its declaration, so that function calls match the declared parameters and types), and definite assignment (requiring every local variable to be initialized before it is used), rejecting incorrect programs or issuing warnings. Semantic analysis logically follows the parsing phase and logically precedes the code generation phase, though a compiler implementation can often fold several phases into a single pass over the code.
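The sketch below illustrates two of these checks, type checking and definite assignment, in Python. It assumes a hypothetical toy language, not C or C++: a program is a straight-line list of `("decl", type, name)` and `("assign", name, expr)` statements, expressions are variable names, literals, or `(op, left, right)` tuples, and the type rules (arithmetic needs numeric operands; assigning a float to an int produces a warning rather than an error) are made up for illustration. Real definite-assignment analysis must also follow branches and loops.

```python
from typing import Dict, List, Set, Union

# Expression forms in this toy language: a variable name (str), a literal
# (bool, int, or float), or a binary operation (op, left, right).
Expr = Union[str, bool, int, float, tuple]

NUMERIC = {"int", "float"}


class SemanticError(Exception):
    """Raised when the toy rules reject a program."""


def check(program: List[tuple]) -> List[str]:
    """Check a straight-line program; return warnings or raise SemanticError."""
    declared: Dict[str, str] = {}  # variable name -> declared type
    assigned: Set[str] = set()     # variables definitely assigned so far
    warnings: List[str] = []

    def type_of(e: Expr) -> str:
        # bool is tested before int because Python's bool is a subclass of int.
        if isinstance(e, bool):
            return "bool"
        if isinstance(e, int):
            return "int"
        if isinstance(e, float):
            return "float"
        if isinstance(e, str):
            if e not in declared:
                raise SemanticError(f"'{e}' is used but never declared")
            if e not in assigned:  # definite assignment check
                raise SemanticError(f"'{e}' is used before it is assigned")
            return declared[e]
        op, left, right = e
        lt, rt = type_of(left), type_of(right)
        if lt not in NUMERIC or rt not in NUMERIC:  # type check
            raise SemanticError(f"'{op}' needs numeric operands, got {lt} and {rt}")
        return "float" if "float" in (lt, rt) else "int"

    for stmt in program:
        if stmt[0] == "decl":
            _, typ, name = stmt
            if name in declared:
                raise SemanticError(f"'{name}' is declared twice")
            declared[name] = typ
            continue
        _, name, expr = stmt  # ("assign", name, expr)
        if name not in declared:
            raise SemanticError(f"'{name}' is assigned but never declared")
        target, value = declared[name], type_of(expr)
        if (target, value) == ("int", "float"):
            warnings.append(f"assigning float to int '{name}' may lose precision")
        elif target != value and (target, value) != ("float", "int"):
            raise SemanticError(f"cannot assign {value} to {target} '{name}'")
        assigned.add(name)
    return warnings
```

For example, declaring `a`, `b`, and `x` as int, assigning `a` and `b`, and then assigning `("+", "a", ("*", "b", 2))` to `x` passes with no warnings, while reading `b` before assigning it raises `SemanticError`.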
Adapted from: "Introduction of Compiler Design" by Rajesh Kr Jha, Geeks for Geeks is licensed under CC BY-SA 4.0
"Lexical analysis" by numerous contributors is licensed under CC BY-SA 4.0
"Parsing" by numerous contributors is licensed under CC BY-SA 4.0
"Compiler" by numerous contributors is licensed under CC BY-SA 4.0