1.3: The compilation process

Last updated
Save as PDF

Page ID: 40629

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

As a programmer, you should have a mental model of what happens during compilation. If you understand the process, it will help you interpret error messages, debug your code, and avoid common pitfalls.

The steps of compilation are:

Preprocessing: C is one of several languages that include preprocessing directives that take effect before the program is compiled. For example, the #include directive causes the source code from another file to be inserted at the location of the directive.
Parsing: During parsing, the compiler reads the source code and builds an internal representation of the program, called an abstract syntax tree. Errors detected during this step are generally syntax errors.
Static checking: The compiler checks whether variables and values have the right type, whether functions are called with the right number and type of arguments, etc. Errors detected during this step are sometimes called static semantic errors.
Code generation: The compiler reads the internal representation of the program and generates machine code or byte code.
Linking: If the program uses values and functions defined in a library, the compiler has to find the appropriate library and include the required code.
Optimization: At several points in the process, the compiler can transform the program to generate code that runs faster or uses less space. Most optimizations are simple changes that eliminate obvious waste, but some compilers perform sophisticated analyses and transformations.

Normally when you run gcc, it runs all of these steps and generates an executable file. For example, here is a minimal C program:

#include <stdio.h>
int main()
{
    printf("Hello World\n");
}

If you save this code in a file called hello.c, you can compile and run it like this:

$ gcc hello.c
$ ./a.out

By default, gcc stores the executable code in a file called a.out (which originally stood for “assembler output”). The second line runs the executable. The prefix ./ tells the shell to look for it in the current directory.

It is usually a good idea to use the -o flag to provide a better name for the executable:

$ gcc hello.c -o hello
$ ./hello