Engineering LibreTexts

1.5: Compiling- Analysis Phase


    Lexical Analyzer

    Lexical analysis, also called tokenization, is the phase of the compiler that splits the source code into tokens. A lexical token, or simply token, is a string with an assigned and thus identified meaning. It is structured as a pair consisting of a token name and an optional token value. Common token names are

    identifier: names the programmer chooses;

    keyword: names already in the programming language;

    separator (also known as punctuator): punctuation characters and paired delimiters;

    operator: symbols that operate on arguments and produce results;

    literal: numeric, logical, textual, reference literals;

    comment: line, block.

    Examples of token values

    Token name    Sample token values
    identifier    x   color   UP
    keyword       if   while   return
    separator     }   (   ;
    operator      +   <   =
    literal       true   6.02e23   "music"
    comment       /* Retrieves user data */   // must be negative

    Consider this expression in the C programming language:

    x = a + b * 2;

    The lexical analysis of this expression yields the following sequence of tokens:

    [(identifier, x), (operator, =), (identifier, a), (operator, +), (identifier, b), (operator, *), (literal, 2), (separator, ;)]
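
    The token sequence above can be reproduced with a very small hand-written tokenizer. The following is a minimal sketch, not how any particular compiler does it: the function name tokenize, the Token alias, and the "unknown" token name are assumptions made for illustration, and the code handles only identifiers, three keywords, unsigned integer literals, and a few one-character operators and separators.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A token is a (token name, token value) pair.
using Token = std::pair<std::string, std::string>;

// Hypothetical helper, not a library function: tokenizes a tiny subset of C.
std::vector<Token> tokenize(const std::string& src) {
    const std::vector<std::string> keywords = {"if", "while", "return"};
    const std::string operators = "=+-*/<>";
    const std::string separators = "(){};,";

    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const char ch = src[i];
        const unsigned char uc = static_cast<unsigned char>(ch);
        if (std::isspace(uc)) {
            ++i;  // whitespace separates tokens but is not a token itself
        } else if (std::isalpha(uc) || ch == '_') {
            // Identifier or keyword: a letter or '_' followed by letters, digits, '_'.
            const std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            const std::string word = src.substr(start, i - start);
            bool isKeyword = false;
            for (const std::string& k : keywords) {
                if (k == word) isKeyword = true;
            }
            tokens.emplace_back(isKeyword ? "keyword" : "identifier", word);
        } else if (std::isdigit(uc)) {
            // Unsigned integer literal: a run of digits.
            const std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) {
                ++i;
            }
            tokens.emplace_back("literal", src.substr(start, i - start));
        } else if (operators.find(ch) != std::string::npos) {
            tokens.emplace_back("operator", std::string(1, ch));
            ++i;
        } else if (separators.find(ch) != std::string::npos) {
            tokens.emplace_back("separator", std::string(1, ch));
            ++i;
        } else {
            // Anything else is outside this sketch's subset.
            tokens.emplace_back("unknown", std::string(1, ch));
            ++i;
        }
    }
    return tokens;
}
```

    Calling tokenize("x = a + b * 2;") returns eight (name, value) pairs in the order listed above. Production lexers are usually generated from regular expressions or written as state machines, and they also handle comments, multi-character operators such as <=, and string and floating-point literals.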

    Syntax Analyzer

    Syntax analysis, also called parsing, is the process of checking the sequence of tokens produced by the lexical analysis stage to see whether it is arranged the way the language requires. This is determined by the rules of a formal grammar. C++, like any programming language, has a well-defined syntax, just as written natural languages do.

    If the tokens do NOT follow the rules of the grammar, the compiler generates a "syntax error", attempts to communicate to the user what and where the error is, and compilation fails.
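
    To make the idea of checking tokens against a grammar concrete, here is a minimal recursive-descent sketch. The grammar in the comment is a toy grammar invented for this illustration (it is far smaller than the real C or C++ grammar), and the names SyntaxChecker and checkSyntax are hypothetical. The checker only accepts or rejects a token sequence; it does not build a parse tree.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Token = std::pair<std::string, std::string>;  // (token name, token value)

// Hypothetical checker for the toy grammar
//   statement := identifier "=" expr ";"
//   expr      := term   { ("+" | "-") term }
//   term      := factor { ("*" | "/") factor }
//   factor    := identifier | literal | "(" expr ")"
class SyntaxChecker {
public:
    explicit SyntaxChecker(std::vector<Token> tokens) : toks_(std::move(tokens)) {}

    // Throws std::runtime_error saying what was expected and at which token.
    void checkStatement() {
        expectName("identifier");
        expectSymbol("=");
        expr();
        expectSymbol(";");
        if (pos_ != toks_.size()) fail("end of input");
    }

private:
    std::vector<Token> toks_;
    std::size_t pos_ = 0;  // index of the next token to examine

    bool atName(const std::string& n) const {
        return pos_ < toks_.size() && toks_[pos_].first == n;
    }
    bool atSymbol(const std::string& v) const {
        return pos_ < toks_.size() && toks_[pos_].second == v &&
               (toks_[pos_].first == "operator" || toks_[pos_].first == "separator");
    }
    [[noreturn]] void fail(const std::string& expected) const {
        const std::string found = pos_ < toks_.size()
                                      ? "'" + toks_[pos_].second + "'"
                                      : std::string("end of input");
        throw std::runtime_error("syntax error at token " + std::to_string(pos_) +
                                 ": expected " + expected + ", found " + found);
    }
    void expectName(const std::string& n) { if (!atName(n)) fail(n); ++pos_; }
    void expectSymbol(const std::string& v) { if (!atSymbol(v)) fail("'" + v + "'"); ++pos_; }

    void expr() {
        term();
        while (atSymbol("+") || atSymbol("-")) { ++pos_; term(); }
    }
    void term() {
        factor();
        while (atSymbol("*") || atSymbol("/")) { ++pos_; factor(); }
    }
    void factor() {
        if (atName("identifier") || atName("literal")) { ++pos_; return; }
        if (atSymbol("(")) { ++pos_; expr(); expectSymbol(")"); return; }
        fail("identifier, literal, or '('");
    }
};

// Returns an empty string if the tokens form a valid statement, else the error message.
std::string checkSyntax(const std::vector<Token>& tokens) {
    try {
        SyntaxChecker checker(tokens);
        checker.checkStatement();
        return "";
    } catch (const std::runtime_error& e) {
        return e.what();
    }
}
```

    Given the tokens for x = a + b * 2; the function returns an empty string. Given the tokens for x = a + ; it returns a message that names the offending token position and what the grammar expected there, which mirrors how a compiler tries to tell the user what and where the error is.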

    Semantic Analyzer

    Semantic analysis performs semantic checks such as type checking (making sure that each operation is applied to operands of a suitable type, for example that arithmetic is performed on variables declared as int or float), object binding (associating each variable and function reference with its declaration and making sure that calls match it), and definite assignment (requiring all local variables to be initialized before use). Based on these checks it rejects incorrect programs or issues warnings. Semantic analysis logically follows the parsing phase and logically precedes the code generation phase, though a compiler implementation can often fold multiple phases into one pass over the code.
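
    The three checks named above can be illustrated on a single assignment statement. This is a minimal sketch under stated assumptions: the Symbol struct, the symbol table layout, and the function checkAssignment are hypothetical, the statement is assumed to have already passed syntax analysis, and only int and float are treated as arithmetic types, following the example in the text.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Token = std::pair<std::string, std::string>;  // (token name, token value)

// Hypothetical symbol table entry filled in from earlier declarations.
struct Symbol {
    std::string type;   // declared type, e.g. "int", "float", "char*"
    bool initialized;   // has a value been assigned yet?
};

// Checks a statement of the form  target = <arithmetic expression> ;
// Returns a list of semantic error messages (empty if the statement is fine).
std::vector<std::string> checkAssignment(const std::vector<Token>& tokens,
                                         std::map<std::string, Symbol>& symbols) {
    std::vector<std::string> errors;
    auto isArithmetic = [](const std::string& t) { return t == "int" || t == "float"; };

    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (tokens[i].first != "identifier") continue;
        const std::string& name = tokens[i].second;

        // Object binding: every name must refer to a declaration.
        auto it = symbols.find(name);
        if (it == symbols.end()) {
            errors.push_back("'" + name + "' is not declared");
            continue;
        }
        // Type checking: arithmetic only on int or float in this sketch.
        if (!isArithmetic(it->second.type)) {
            errors.push_back("'" + name + "' has non-arithmetic type " + it->second.type);
        }
        // Definite assignment: token 0 is the target; later identifiers are read.
        if (i > 0 && !it->second.initialized) {
            errors.push_back("'" + name + "' is used before being initialized");
        }
    }

    // A successful assignment initializes its target.
    if (errors.empty() && !tokens.empty() && tokens[0].first == "identifier") {
        symbols.at(tokens[0].second).initialized = true;
    }
    return errors;
}
```

    With x, a, and b declared as int or float and a and b already initialized, the tokens for x = a + b * 2; produce no errors and x is marked as initialized. If a is uninitialized and b is declared as char*, two errors are reported. A real compiler performs these checks on a syntax tree with full type rules rather than on a flat token list.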

    Adapted from: "Introduction of Compiler Design" by Rajesh Kr Jha, Geeks for Geeks, is licensed under CC BY-SA 4.0
    "Lexical analysis" by Numerous contributors is licensed under CC BY-SA 3.0
    "Parsing" by Numerous contributors is licensed under CC BY-SA 3.0 
    "Compiler" by Numerous contributors is licensed under CC BY-SA 3.0


    This page titled 1.5: Compiling- Analysis Phase is shared under a CC BY-SA license and was authored, remixed, and/or curated by Patrick McClanahan.
