0.37: Starting a process

    Starting a process

    We mentioned before that simply saying the program starts with the main() function is not quite true. Below we examine what happens to a typical dynamically linked program when it is loaded and run. (Statically linked programs follow a similar path, but they have no program interpreter, so the kernel starts them directly at their own entry point.)

    Firstly, in response to an exec system call the kernel allocates the structures for a new process and reads the ELF file specified from disk.

    We mentioned that ELF has a program interpreter field, PT_INTERP, which can be set to 'interpret' the program. For dynamically linked applications that interpreter is the dynamic linker, namely ld.so, which allows some of the linking process to be done on the fly before the program starts.

    In this case, the kernel also reads in the dynamic linker code and starts execution at the dynamic linker's entry point rather than the program's. We examine the role of the dynamic linker in depth in the next chapter, but suffice to say it does some setup, such as loading any libraries required by the application (as specified in the dynamic section of the binary), and then starts execution of the program binary at its entry point address (i.e. the _start function).

    Kernel communication to programs

    The kernel needs to communicate some things to programs when they start up; namely the arguments to the program, the current environment variables and a special structure called the Auxiliary Vector or auxv (you can request that the dynamic linker show you some debugging output of the auxv by setting the environment variable LD_SHOW_AUXV=1).

    The arguments and environment are fairly straightforward, and the various incarnations of the exec system call allow you to specify these for the program.

    The kernel communicates this by putting all the required information on the stack for the newly created program to pick up. Thus when the program starts it can use its stack pointer to find all the startup information required.

    The auxiliary vector is a special structure for passing information directly from the kernel to the newly running program. It contains system-specific information that may be required, such as the default size of a virtual memory page on the system, or hardware capabilities: specific features of the underlying hardware that the kernel has identified and that userspace programs can take advantage of.

    Kernel Library

    We mentioned previously that system calls are slow, and modern systems have mechanisms to avoid the overhead of raising a trap in the processor.

    In Linux, this is implemented by a neat trick between the dynamic loader and the kernel, all communicated with the AUXV structure. The kernel actually adds a small shared library into the address space of every newly created process which contains a function that makes system calls for you. The beauty of this system is that if the underlying hardware supports a fast system call mechanism the kernel (being the creator of the library) can use it, otherwise it can use the old scheme of generating a trap. This library is named linux-gate.so.1, so called because it is a gateway to the inner workings of the kernel.

    When the kernel starts the dynamic linker it adds an entry to the auxv called AT_SYSINFO_EHDR, which is the address in memory that the special kernel library lives in. When the dynamic linker starts it can look for the AT_SYSINFO_EHDR pointer, and if found load that library for the program. The program has no idea this library exists; this is a private arrangement between the dynamic linker and the kernel.

    We mentioned that programmers make system calls indirectly through calling functions in the system libraries, namely libc. libc can check to see if the special kernel binary is loaded, and if so use the functions within that to make system calls. As we mentioned, if the kernel determines the hardware is capable, this will use the fast system call method.

    Starting the program

    Once the kernel has loaded the interpreter it passes control to the entry point as given in the interpreter file (note we will not examine how the dynamic linker starts at this stage; see Chapter 9, Dynamic Linking for a full discussion of dynamic linking). When it has finished its setup, the dynamic linker will jump to the entry point address as given in the ELF binary.

    Example 8.19. Disassembly of program startup
        $ cat test.c

        int main(void)
        {
            return 0;
        }

        $ gcc -o test test.c

        $ readelf --headers ./test | grep Entry
          Entry point address:               0x80482b0

        $ objdump --disassemble ./test

        [...]

        080482b0 <_start>:
         80482b0:       31 ed                   xor    %ebp,%ebp
         80482b2:       5e                      pop    %esi
         80482b3:       89 e1                   mov    %esp,%ecx
         80482b5:       83 e4 f0                and    $0xfffffff0,%esp
         80482b8:       50                      push   %eax
         80482b9:       54                      push   %esp
         80482ba:       52                      push   %edx
         80482bb:       68 00 84 04 08          push   $0x8048400
         80482c0:       68 90 83 04 08          push   $0x8048390
         80482c5:       51                      push   %ecx
         80482c6:       56                      push   %esi
         80482c7:       68 68 83 04 08          push   $0x8048368
         80482cc:       e8 b3 ff ff ff          call   8048284 <__libc_start_main@plt>
         80482d1:       f4                      hlt
         80482d2:       90                      nop
         80482d3:       90                      nop

        08048368 <main>:
         8048368:       55                      push   %ebp
         8048369:       89 e5                   mov    %esp,%ebp
         804836b:       83 ec 08                sub    $0x8,%esp
         804836e:       83 e4 f0                and    $0xfffffff0,%esp
         8048371:       b8 00 00 00 00          mov    $0x0,%eax
         8048376:       83 c0 0f                add    $0xf,%eax
         8048379:       83 c0 0f                add    $0xf,%eax
         804837c:       c1 e8 04                shr    $0x4,%eax
         804837f:       c1 e0 04                shl    $0x4,%eax
         8048382:       29 c4                   sub    %eax,%esp
         8048384:       b8 00 00 00 00          mov    $0x0,%eax
         8048389:       c9                      leave
         804838a:       c3                      ret
         804838b:       90                      nop
         804838c:       90                      nop
         804838d:       90                      nop
         804838e:       90                      nop
         804838f:       90                      nop

        08048390 <__libc_csu_init>:
         8048390:       55                      push   %ebp
         8048391:       89 e5                   mov    %esp,%ebp
         [...]

        08048400 <__libc_csu_fini>:
         8048400:       55                      push   %ebp
         [...]

    Above we investigate the very simplest program. Using readelf we can see that the entry point is the _start function in the binary. In the disassembly we can see some values being pushed onto the stack. The value 0x8048400 is the address of the __libc_csu_fini function, 0x8048390 is __libc_csu_init, and finally 0x8048368 is the main() function. After this the __libc_start_main function is called.

    __libc_start_main is defined in the glibc sources in sysdeps/generic/libc-start.c. The function is quite complicated and hidden among a large number of defines, as it needs to be portable across the very wide range of systems and architectures that glibc can run on. It does a number of specific things related to setting up the C library which the average programmer does not need to worry about. The next point where the library calls back into the program is to handle init code.

    init and fini are two special concepts that refer to code in shared libraries that may need to be called when the library is loaded or when it is unloaded, respectively. You can see how this might be useful for library programmers to set up variables when the library is started, or to clean up at the end. Originally the functions _init and _fini were looked for in the library; however this became somewhat limiting as everything was required to be in these functions. Below we will examine just how the init/fini process works.

    At this stage we can see that the __libc_start_main function will receive quite a few input parameters on the stack. Firstly it will have access to the program arguments, environment variables and auxiliary vector from the kernel. Then the _start code will have pushed onto the stack the addresses of the functions to handle init and fini, and finally the address of the main function itself.

    We need some way to indicate in the source code that a function should be called by init or fini. With gcc we use attributes to label two functions as constructors and destructors in our main program. These terms are more commonly used with object oriented languages to describe object life cycles.

    Example 8.20. Constructors and Destructors
        $ cat test.c
        #include <stdio.h>

        void __attribute__((constructor)) program_init(void)  {
          printf("init\n");
        }

        void  __attribute__((destructor)) program_fini(void) {
          printf("fini\n");
        }

        int main(void)
        {
          return 0;
        }
        
        $ gcc -Wall  -o test test.c

        $ ./test
        init
        fini

        $ objdump --disassemble ./test | grep program_init
        08048398 <program_init>:

        $ objdump --disassemble ./test | grep program_fini
        080483b0 <program_fini>:

        $ objdump --disassemble ./test

        [...]
        08048280 <_init>:
         8048280:       55                      push   %ebp
         8048281:       89 e5                   mov    %esp,%ebp
         8048283:       83 ec 08                sub    $0x8,%esp
         8048286:       e8 79 00 00 00          call   8048304 <call_gmon_start>
         804828b:       e8 e0 00 00 00          call   8048370 <frame_dummy>
         8048290:       e8 2b 02 00 00          call   80484c0 <__do_global_ctors_aux>
         8048295:       c9                      leave
         8048296:       c3                      ret
        [...]

        080484c0 <__do_global_ctors_aux>:
         80484c0:       55                      push   %ebp
         80484c1:       89 e5                   mov    %esp,%ebp
         80484c3:       53                      push   %ebx
         80484c4:       52                      push   %edx
         80484c5:       a1 2c 95 04 08          mov    0x804952c,%eax
         80484ca:       83 f8 ff                cmp    $0xffffffff,%eax
         80484cd:       74 1e                   je     80484ed <__do_global_ctors_aux+0x2d>
         80484cf:       bb 2c 95 04 08          mov    $0x804952c,%ebx
         80484d4:       8d b6 00 00 00 00       lea    0x0(%esi),%esi
         80484da:       8d bf 00 00 00 00       lea    0x0(%edi),%edi
         80484e0:       ff d0                   call   *%eax
         80484e2:       8b 43 fc                mov    0xfffffffc(%ebx),%eax
         80484e5:       83 eb 04                sub    $0x4,%ebx
         80484e8:       83 f8 ff                cmp    $0xffffffff,%eax
         80484eb:       75 f3                   jne    80484e0 <__do_global_ctors_aux+0x20>
         80484ed:       58                      pop    %eax
         80484ee:       5b                      pop    %ebx
         80484ef:       5d                      pop    %ebp
         80484f0:       c3                      ret
         80484f1:       90                      nop
         80484f2:       90                      nop
         80484f3:       90                      nop
        
        
        $ readelf --sections ./test
        There are 34 section headers, starting at offset 0xfb0:

        Section Headers:
          [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
          [ 0]                   NULL            00000000 000000 000000 00      0   0  0
          [ 1] .interp           PROGBITS        08048114 000114 000013 00   A  0   0  1
          [ 2] .note.ABI-tag     NOTE            08048128 000128 000020 00   A  0   0  4
          [ 3] .hash             HASH            08048148 000148 00002c 04   A  4   0  4
          [ 4] .dynsym           DYNSYM          08048174 000174 000060 10   A  5   1  4
          [ 5] .dynstr           STRTAB          080481d4 0001d4 00005e 00   A  0   0  1
          [ 6] .gnu.version      VERSYM          08048232 000232 00000c 02   A  4   0  2
          [ 7] .gnu.version_r    VERNEED         08048240 000240 000020 00   A  5   1  4
          [ 8] .rel.dyn          REL             08048260 000260 000008 08   A  4   0  4
          [ 9] .rel.plt          REL             08048268 000268 000018 08   A  4  11  4
          [10] .init             PROGBITS        08048280 000280 000017 00  AX  0   0  4
          [11] .plt              PROGBITS        08048298 000298 000040 04  AX  0   0  4
          [12] .text             PROGBITS        080482e0 0002e0 000214 00  AX  0   0 16
          [13] .fini             PROGBITS        080484f4 0004f4 00001a 00  AX  0   0  4
          [14] .rodata           PROGBITS        08048510 000510 000012 00   A  0   0  4
          [15] .eh_frame         PROGBITS        08048524 000524 000004 00   A  0   0  4
          [16] .ctors            PROGBITS        08049528 000528 00000c 00  WA  0   0  4
          [17] .dtors            PROGBITS        08049534 000534 00000c 00  WA  0   0  4
          [18] .jcr              PROGBITS        08049540 000540 000004 00  WA  0   0  4
          [19] .dynamic          DYNAMIC         08049544 000544 0000c8 08  WA  5   0  4
          [20] .got              PROGBITS        0804960c 00060c 000004 04  WA  0   0  4
          [21] .got.plt          PROGBITS        08049610 000610 000018 04  WA  0   0  4
          [22] .data             PROGBITS        08049628 000628 00000c 00  WA  0   0  4
          [23] .bss              NOBITS          08049634 000634 000004 00  WA  0   0  4
          [24] .comment          PROGBITS        00000000 000634 00018f 00      0   0  1
          [25] .debug_aranges    PROGBITS        00000000 0007c8 000078 00      0   0  8
          [26] .debug_pubnames   PROGBITS        00000000 000840 000025 00      0   0  1
          [27] .debug_info       PROGBITS        00000000 000865 0002e1 00      0   0  1
          [28] .debug_abbrev     PROGBITS        00000000 000b46 000076 00      0   0  1
          [29] .debug_line       PROGBITS        00000000 000bbc 0001da 00      0   0  1
          [30] .debug_str        PROGBITS        00000000 000d96 0000f3 01  MS  0   0  1
          [31] .shstrtab         STRTAB          00000000 000e89 000127 00      0   0  1
          [32] .symtab           SYMTAB          00000000 001500 000490 10     33  53  4
          [33] .strtab           STRTAB          00000000 001990 000218 00      0   0  1
        Key to Flags:
          W (write), A (alloc), X (execute), M (merge), S (strings)
          I (info), L (link order), G (group), x (unknown)
          O (extra OS processing required) o (OS specific), p (processor specific)
        
        $ objdump --full-contents --section .ctors ./test

        ./test:     file format elf32-i386

        Contents of section .ctors:
         8049528 ffffffff 98830408 00000000           ............
        

    One of the values pushed onto the stack for __libc_start_main was the address of the initialisation function __libc_csu_init. If we follow the call chain through from __libc_csu_init we can see it does some setup and then calls the _init function in the executable. The _init function eventually calls a function called __do_global_ctors_aux. Looking at the disassembly of this function we can see that it starts at address 0x804952c and loops, reading a value and calling it, then stepping back four bytes until it reads the value -1. We can see that this starting address is in the .ctors section of the file; if we have a look inside this we see that it contains the value -1 first, then a function address (stored in little-endian byte order), and then the value zero.

    Read as a little-endian word, the bytes 98 83 04 08 give the address 0x08048398, which is the address of the program_init function! So the format of the .ctors section is firstly a -1, then the addresses of the functions to be called on initialisation, and finally a zero to indicate the list is complete. Each entry will be called (in this case we only have the one function).

    Once __libc_start_main has completed the _init call it finally calls the main() function! Remember that the stack was initially set up with the arguments and environment pointers from the kernel; this is how main gets its argc, argv[] and envp[] arguments. The process now runs and the setup phase is complete.

    A similar process is enacted with the .dtors for destructors when the program exits. __libc_start_main calls these when the main() function completes.

    As you can see, a lot is done before the program gets to start, and even a little after you think it is finished!


    0.37: Starting a process is shared under a CC BY-SA 3.0 license and was authored, remixed, and/or curated by LibreTexts.