
7.8: How to Calculate Branch Amounts in Machine Code


    This chapter has shown how to use the branch statements to implement structured programming logic. However, how a branch statement manipulates the $pc register to control execution has yet to be discussed. This section covers the details of how the branch statement is implemented in machine code.

    7.8.1 Instruction Addresses

    When the memory for the MIPS computer was shown in section 3.2, a segment labeled program text (or simply text) was shown as starting at address 0x00400000. This section of the memory contains the machine code translation of the instructions from the .text segment of your program. Thus the text segment of memory is where all the machine code instructions for the program are stored.

    When a program is assembled, the first instruction is inserted at address 0x00400000. The instructions for the program each take 4 bytes, so the assembler keeps an internal counter, and for each instruction it adds 4 to that counter and uses that number for the address of the next instruction. The new instruction is then placed at that address in memory, and the process continues so that each subsequent assembly instruction is inserted at the next available word boundary. Thus the first instruction in a program is at address 0x00400000, the second instruction is at address 0x00400004, etc. Note that machine instructions must always start on a word boundary.

    A simple example of an assembled program is the following.

    addi $t0, $zero, 10
    addi $t1, $zero, 15
    add $t0, $t0, $t1
    

    This program is shown below in a MARS screen image. The Address column of the grid shows the address of the instruction. In this example, the first instruction is stored at 0x00400000, the second at 0x00400004, and the third at 0x00400008.

    Figure 7-1: Instruction addresses for a simple program
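The address counter described above can be sketched in a few lines of Python. This is only an illustration, not how MARS is implemented: the names `TEXT_BASE`, `WORD_SIZE` and `assign_addresses` are made up for this example, and the sketch assumes every line is a real (non-pseudo) instruction.

```python
TEXT_BASE = 0x00400000   # start of the text segment, as described above
WORD_SIZE = 4            # every real MIPS instruction occupies one 4-byte word


def assign_addresses(instructions, base=TEXT_BASE):
    """Pair each real instruction with the address the counter would give it."""
    return [(base + WORD_SIZE * i, text) for i, text in enumerate(instructions)]


program = [
    "addi $t0, $zero, 10",
    "addi $t1, $zero, 15",
    "add $t0, $t0, $t1",
]

for address, text in assign_addresses(program):
    print(f"0x{address:08x}  {text}")
```

Each address is the base plus 4 times the instruction's position, so every instruction lands on a word boundary.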

    So if all of the instructions are real instructions, placing the instructions at the correct address is as simple as adding 4 to the address of each previous instruction. The problem is pseudo operators, as one pseudo instruction can map to more than one real instruction. For example, a la pseudo instruction always takes 2 instructions, and thus takes up 8 bytes. Some instructions, such as immediate instructions which can have either 16 bit or 32 bit arguments, can be different lengths depending on the arguments. Thus it is important to be able to translate pseudo operators into real instructions. This is shown in the following example program. Note how the Source column is translated into real instructions in the Basic column. The real instructions are the ones that are numbered in the Source column.

    Figure 7-2: Instruction addresses for program with pseudo operators

    It is important to be able to number the instructions correctly to calculate branch offsets.
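A minimal sketch of why pseudo instructions matter for addressing follows. The `EXPANSION_SIZE` table is a hypothetical stand-in for the assembler's real expansion rules: only the entry for `la` (2 real instructions) comes from the text, and any mnemonic not listed is assumed to be a single real instruction.

```python
TEXT_BASE = 0x00400000

# Number of real instructions each mnemonic expands to (hypothetical table;
# only the `la` entry is taken from the text).
EXPANSION_SIZE = {"la": 2}


def source_addresses(source_lines, base=TEXT_BASE):
    """Return (address, size_in_bytes, source_line) for each source instruction."""
    placed = []
    address = base
    for line in source_lines:
        mnemonic = line.split()[0]
        size = 4 * EXPANSION_SIZE.get(mnemonic, 1)
        placed.append((address, size, line))
        address += size   # the next source line starts after every expanded word
    return placed


source = [
    "la $a0, result",       # pseudo instruction: 2 real instructions, 8 bytes
    "addi $t0, $zero, 10",  # real instruction: 4 bytes
    "add $t0, $t0, $a0",    # real instruction: 4 bytes
]

for address, size, line in source_addresses(source):
    print(f"0x{address:08x}  {size} bytes  {line}")
```

Because the `la` occupies two words, the instruction after it starts 8 bytes past the base rather than 4.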

    7.8.2 Value in the $pc register

    Branches in MIPS assembly work by adding the immediate part of the instruction to, or subtracting it from, the $pc register. So if a branch is taken, $pc is updated by the immediate value, and control of the program continues at the labeled instruction.

    Earlier in Chapter 3, when it was explained how the $pc register is used to control the flow of the program, it was apparent that at the start of each instruction the $pc points to the instruction to execute. So a reader could think that the value to be incremented by the immediate part of the branch is the address of the current instruction. However, when an instruction executes, the first thing that is done is that the $pc is incremented by 4 to point to the next instruction. This makes sense, since in the majority of instances the program is processed sequentially. However, this means that when a branch is executed, the amount which must be added or subtracted is measured from the next sequential instruction, not the current instruction.

    The following example shows how this works. In the first branch instruction, the branch is to label2. The label is 3 real instructions, which is 3 words or 12 bytes, past the current instruction. However, since the $pc was already incremented to point to the next instruction, the $pc will be incremented by 8 bytes, not 12.

    [Screenshot: forward branch to label2]

    The second branch instruction branches backward to label1. In this case, the label is 2 instructions, which is 2 words or 8 bytes, back from the current instruction. However, because the $pc has already been incremented to point to the next instruction, 3 words, or 12 bytes, must be subtracted from the $pc in the branch instruction, giving an offset of -12 bytes.

    [Screenshot: backward branch to label1]

    The following MARS screen shot shows that these are indeed the branch offsets for each of the branch instructions.

    Figure 7-3: Branch offset program example
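The two cases above follow one rule: the byte offset is measured from the already-incremented $pc. A small Python sketch of that rule follows. The function name and the address used for the branch are assumptions chosen for illustration.

```python
def branch_byte_offset(branch_address, target_address):
    """Byte distance from the already-incremented $pc to the branch target."""
    pc_after_increment = branch_address + 4   # $pc points to the next instruction
    return target_address - pc_after_increment


branch_at = 0x00400010  # hypothetical address of a branch instruction

# Forward case: the label is 3 instructions (12 bytes) past the branch.
forward = branch_byte_offset(branch_at, branch_at + 12)

# Backward case: the label is 2 instructions (8 bytes) before the branch.
backward = branch_byte_offset(branch_at, branch_at - 8)

print(forward, backward)
```

The forward case comes out one instruction shorter than the distance from the branch itself, and the backward case one instruction longer.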

    7.8.3 How the word boundary affects branching

    Remember that the I format instruction uses a 16 bit immediate value. If this was the end of the story, a branch could only reach addresses inside a 64K byte window around the current $pc (32K bytes in either direction). In terms of instructions, this means that a branch could access instructions that are -8192..8191 real instructions from the current $pc. This may be sufficient for most cases, but there is a way to increase the effective size of the branch offset to 18 bits, which covers \(2^{18}\) bytes. Remember that all instructions must fall on a word boundary, so the address will always be divisible by 4. This means that the lowest 2 bits in every address must always be “00”. Since we know the lowest two bits must always be “00”, there is no reason to keep them, and they are dropped. Thus the forward branch in the previous example is stored as 2 (\(1000_2 \gg 2 = 0010_2\), or more simply 8/4 = 2). The backward branch is likewise stored as -3 (\(110100_2 \gg 2 = 11101_2\), or more simply -12/4 = -3).
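The shift can be checked with a short Python sketch. The helper name `to_word_offset` is made up for this example. The reach printed at the end assumes the 16 bit field is a signed two's complement value holding a word count.

```python
def to_word_offset(byte_offset):
    """Drop the two low-order zero bits of a word-aligned byte offset."""
    if byte_offset % 4 != 0:
        raise ValueError("branch targets must fall on a word boundary")
    # Python's >> on integers keeps the sign, like an arithmetic shift.
    return byte_offset >> 2


print(to_word_offset(8))     # forward branch from the example
print(to_word_offset(-12))   # backward branch from the example

# Reach, in bytes, of a signed 16 bit field once it holds a word count.
IMM_MIN, IMM_MAX = -(1 << 15), (1 << 15) - 1
print(IMM_MIN * 4, IMM_MAX * 4)
```

Storing a word count instead of a byte count multiplies the reach by four without using any more bits in the instruction.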

    Be careful to remember that branch offsets are really byte offsets: the two lowest order 00 bits have been truncated from the stored value and must be reinserted when the branch address is calculated. This caution is given because the value stored in the branch instruction happens to equal the number of real instructions by which the current $pc needs to be incremented or decremented. This is just a happy coincidence. It makes calculating the offsets easier, as all that needs to be done is count the number of real instructions between the $pc and the label, but that in no way reflects the true meaning of the offset.
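To make the reinsertion of the two dropped bits concrete, the following Python sketch recovers a target address from a branch's own address and its 16 bit field. The function names and the branch address are assumptions for illustration.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(branch_address, immediate_field):
    """Recover the target address from a branch's address and its 16 bit field."""
    word_offset = sign_extend_16(immediate_field)
    byte_offset = word_offset << 2        # put the two dropped 00 bits back
    return (branch_address + 4) + byte_offset


branch_at = 0x00400010  # hypothetical address of a branch instruction

print(hex(branch_target(branch_at, 0x0002)))   # stored field of 2
print(hex(branch_target(branch_at, 0xFFFD)))   # stored field of -3
```

The stored field counts instructions, but the value added to the $pc is always that count multiplied by 4.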

    7.8.4 Translating branch instructions to machine code

    Now that the method of calculating the branch offsets for the branch instructions has been explained, the following program shows an example of calculating the branch offsets in a program. Note that in this example the trick of dropping the last two bits of the address will be used, so the branch offsets can be calculated simply by adding/subtracting line numbers. Therefore the text will read “the $pc points to line”, which is correct, as opposed to “the $pc contains the address of line”, which would be incorrect.

    1. Start with the program as written by the programmer. Note that there are 3 branch statements. Only these 3 branch statements will be translated to machine code. In this case the entire program, including comments, is included so that the reader understands the program. However comments are not kept when a translation to machine code is made, so the subsequent presentations of these programs will drop the comments.
      # Filename:         PrintEven.asm
      # Author:           Charles Kann
      # Date:             12/29/2013
      # Purpose:          Print even numbers from 1 to 10
      # Modification Log: 12/29/2013 - Initial release
      #
      # Pseudo Code
      # global main()
      # {
      #     // The following variable can be kept in a save register.
      #     register int i
      #
      #     // Counter loop from 1 to 10
      #     for(i=1;i<11;i++)
      #     {
      #         if ((i %2) == 0)
      #         {
      #             print("Even number: " + i)
      #         }
      #     }
      # }
      
      .text
      .globl main
      main:
          # Register Conventions:
          # $s0-i
          addi $s0, $zero, 1
          
              BeginForLoop:
              addi $t0, $zero, 11
              slt $t0, $s0, $t0
              beqz $t0, EndForLoop
                  addi $t0, $zero, 2
                  div $s0,$t0
                  mfhi $t0
                  seq $t0, $t0, 0
                  beqz $t0, Odd
              
                  la $a0, result
                  move $a1, $s0
                  jal PrintInt
                  jal NewLine
          
              Odd:
              addi $s0, $s0, 1
              b BeginForLoop
          EndForLoop:
          
          jal Exit
      .data
          result: .asciiz "Even number: "
      .include "utils.asm"
      
    2. The next step is to translate all pseudo instructions in the program into real instructions, and then number each instruction.

      Line #  Label         Statement
      1                     addi $16, $0, 0x00000001
      2       BeginForLoop  addi $8, $0, 0x0000000b
      3                     slt $8, $16, $8
      4                     beq $8, $0, ????? (label EndForLoop)
      5                     addi $8, $0, 0x00000002
      6                     div $16, $8
      7                     mfhi $8
      ---                   # seq $t0, $t0, 0 is 4 real instructions
      8                     addi $1, $0, 0x00000000
      9                     subu $8, $8, $1
      10                    ori $1, $0, 0x00000001
      11                    sltu $8, $8, $1
      12                    beq $8, $0, ???? (label Odd)
      ---                   # la $a0, result is 2 real instructions
      13                    lui $1, 0x00001001
      14                    ori $4, $1, 0x00000000
      15                    addu $5, $0, $16
      16                    jal ----- (doesn’t matter at this point)
      17                    jal ----- (doesn’t matter at this point)
      18      Odd           addi $16, $16, 0x00000001
      19                    beq $0, $0, ???? (label BeginForLoop)
      20      EndForLoop    jal ---- (doesn’t matter at this point)
    3. Calculate the offsets. The first branch instruction, “beq $8, $0, EndForLoop”, is at line 4, so the $pc when it is executing would point to line 5. The label is at line 20, so the branch offset is 20-5 = 15. The beq instruction is an I type instruction with an op-code of 0x4, so the machine code translation of this instruction is 0x1100000f.

      The next branch instruction, “beq $8, $0, Odd”, is at line 12, and the label Odd is at line 18. This means we can subtract 18-13 (as the $pc has been updated), and the branch offset is 5. The translation to machine code of this instruction is 0x11000005.

      The final branch instruction, “beq $0, $0, BeginForLoop”, is at line 19, and the label BeginForLoop is at line 2. This means that we can subtract 2-20, which gives a branch offset of -18. Note that this offset is negative, so -18 must be stored as a 16 bit 2’s complement value, or 0xffee. With an op-code of 0x4 and both register fields 0, the translation to machine code of this instruction is 0x1000ffee.
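The three calculations in step 3 can be sketched in Python as follows. The helper `encode_beq` and the `branches` table are made up for this illustration. The op-code 0x4 and the line numbers come from the text, and the field layout assumed is the usual I format: op-code in the top 6 bits, then two 5 bit register fields, then the 16 bit immediate.

```python
BEQ_OPCODE = 0x4


def encode_beq(rs, rt, branch_line, label_line):
    """Encode beq using the line-number shortcut from step 3."""
    offset = label_line - (branch_line + 1)     # $pc already points to the next line
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("branch target is out of range for a 16 bit field")
    immediate = offset & 0xFFFF                 # 16 bit two's complement
    return (BEQ_OPCODE << 26) | (rs << 21) | (rt << 16) | immediate


# label: (rs, rt, line of the branch, line of the label), from the numbered listing
branches = {
    "EndForLoop": (8, 0, 4, 20),
    "Odd": (8, 0, 12, 18),
    "BeginForLoop": (0, 0, 19, 2),
}

for label, (rs, rt, branch_line, label_line) in branches.items():
    print(f"{label:<13} 0x{encode_beq(rs, rt, branch_line, label_line):08x}")
```

The mask with 0xFFFF is what turns -18 into 0xffee. Without it, the sign bits of the negative offset would overwrite the op-code and register fields.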

    7.8.5 PC relative addressing

    The type of addressing done with branch statements is called PC relative addressing. The reason for this name is that all branch addresses are calculated as an offset from the PC. This is contrasted with Jump (J) instructions, which branch to absolute addresses. So while a branch address must be calculated, a jump address is whatever is in the jump instruction. Both implement branches to different parts of the program, so why are there the two different formats?

    The first reason is that a J instruction can access the entire .text segment of memory. Accessing the entire .text segment requires 26 bits to store the address, which leaves no room for the registers that need to be compared, as in the I format instruction. The branch instructions can do operations like compare registers, but they are limited in that the address field has only 16 bits, so a branch can only reach addresses relatively local to the current $pc. So the basic difference is that the jump instruction can access any point in the text memory, but cannot be conditional. The branch instruction can be conditional, but cannot access all of the text memory.
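The trade-off between the two formats is a matter of bit budgeting, which the following Python sketch works through. It assumes the usual MIPS field widths (a 6 bit op-code and 5 bit register fields) and, as in the standard encoding but not stated in the text above, that the jump field also drops the two low-order zero bits.

```python
WORD_BITS = 32
OPCODE_BITS = 6
REGISTER_BITS = 5

# I format (branches): op-code, two registers to compare, then the offset.
branch_offset_bits = WORD_BITS - OPCODE_BITS - 2 * REGISTER_BITS

# J format (jumps): op-code, then every remaining bit is address.
jump_address_bits = WORD_BITS - OPCODE_BITS

# Both fields hold word counts, so each unit covers 4 bytes.
branch_window_bytes = (1 << branch_offset_bits) * 4
jump_window_bytes = (1 << jump_address_bits) * 4

print(branch_offset_bits, jump_address_bits)
print(branch_window_bytes // 1024, "KiB window for a branch")
print(jump_window_bytes // (1024 * 1024), "MiB window for a jump")
```

The 10 bits a branch spends on its two register fields are exactly the bits a jump spends on extra address range.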

    PC relative addressing has another advantage. The compiler can generate the code for the branch at compile time, as it does not need to know the absolute addresses of the statements, only how far they are from the current $pc. This means that the code can easily be moved (or relocated) in the .text area and still work correctly. In the example of generating machine code for the branch instruction above, note that it really doesn’t matter if the code fragment for printing odd/even numbers is at address 0x10010000, 0x10054560, or any other address. The branch is always relative to the $pc, so where the code exists is irrelevant to its correct execution.
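The relocation property can be demonstrated with a short Python sketch that recomputes one branch field with the code placed at several starting addresses. The helper names and the load addresses are hypothetical. The line numbers (a branch at line 19 targeting line 2) are taken from the worked example in 7.8.4.

```python
def word_offset(branch_address, target_address):
    """Branch field value: word distance from $pc + 4 to the target."""
    return (target_address - (branch_address + 4)) >> 2


def address_of(line, base):
    """Address of a numbered real instruction when line 1 sits at base."""
    return base + 4 * (line - 1)


def offset_at(base):
    """Field for the branch at line 19 targeting line 2, with the code at base."""
    return word_offset(address_of(19, base), address_of(2, base))


for base in (0x00400000, 0x00400100, 0x00480000):   # hypothetical load addresses
    print(f"0x{base:08x}  offset {offset_at(base)}")
```

The base address cancels out of the subtraction, so the assembled branch instruction is identical wherever the code is placed.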

    However because the J instructions all branch to a fixed address, that address must be defined before the program begins to execute, and the absolute address cannot be changed (the code cannot be relocated).

    So the difference between branches and jumps comes down to how they are used. Normally when compiling a program, any program control transfer inside of a file (if statements, loops, etc.) is implemented using branch statements. Any program control transfer to a point outside of a file, which means a call to a subprogram, is normally implemented with a jump.


    This page titled 7.8: How to Calculate Branch Amounts in Machine Code is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Charles W. Kann III.
