3.1: Template for an assembly language program

3.1.1 Template for an assembly language program

When learning a new language there are some programming details that are necessary to allow the program to run, but that cannot be explained to someone first learning the language. These language concepts can only be explained later, after the programmer has learned much more about the language. To allow a novice programmer to start programming in the language, these programming details are often given as a format for a program that must simply be copied to create a program. For example, in the Java programming language all methods must exist in a class, and the program must begin in a static method named main that takes an array of strings as an argument. Why this format must be followed is beyond the ability of beginning Java programmers to efficiently comprehend and irrelevant to the material that is covered at that point in the learning process. Therefore, readers are told that they must copy these conventions until they can be explained later.

When programming in assembly language, the same concept applies. There are standard coding requirements that every program must include in order to run. The first step in this chapter is to write an assembly source program with the structure necessary to make an assembly program work. This first program, which defines a standard format for creating a basic program, is called template.s, and it can be copied as a starting point, or template, for all subsequent programs. All of the code in the template.s program will be explained by the chapter on functions, but for now it should just be copied.

This template source program is shown below. Note that all files containing assembly language source programs should end with the suffix “.s”. In this program, the comment “# Enter your program here.” marks the place where you should put the code for your program.

Note that the program lines beginning with a hashtag (#) are comment lines. There are two different types of comments in assembly. The # is used if the entire line is a comment. To comment from the current position on the line to the end of the line, for example to add a comment to a specific instruction, this text will use the characters “//”. Note that there are other ways to comment to the end of the line in ARM assembly language, so follow your assembler’s preference. For gcc, it is “//”, so that is what will be used here.

Second, all programs should contain a header with information about the program. This is standard practice in most introduction to programming classes in every language, and it seems to be the first good habit that programmers are eager to discard as soon as possible. If you are using this book in a course, it is the author’s hope this is the first thing your instructor takes points off for not doing.

Finally, note the indenting in the program. ARM assembly allows, but does not require, indentation. Once again, the fact that the language does not require it is no reason to stop the practice of indenting.

# 
# Program Name: template.s 
# Author: Charles Kann 
# Date: 9/19/2020 
# Purpose: This program is a template that can be used to start an ARM assembly 
# program using gcc 
# 

.text 
.global main 

main: 
    # Save return to OS on stack 
    SUB sp, sp, #4 
    STR lr, [sp, #0] 

    # Enter your program here. 
    
    # Return to the OS 
    LDR lr, [sp, #0] 
    ADD sp, sp, #4 
    MOV pc, lr 
.data

1 template.s
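
As a minimal sketch of how the template is meant to be filled in, the following variant replaces the placeholder comment with a single instruction. It assumes the usual convention that the value left in r0 when main returns becomes the exit status of the program. The file name returnZero.s is hypothetical and does not appear elsewhere in this text.

```asm
#
# Program Name: returnZero.s (hypothetical file name, for illustration only)
# Purpose: Shows the template with one instruction added in the body
#

.text
.global main

main:
    # Save return to OS on stack
    SUB sp, sp, #4
    STR lr, [sp, #0]

    # Program body: place 0 in r0, which is assumed to become
    # the exit status when main returns
    MOV r0, #0

    # Return to the OS
    LDR lr, [sp, #0]
    ADD sp, sp, #4
    MOV pc, lr

.data
```

Once this file has been assembled and linked with gcc, typing echo $? at the shell prompt immediately after running the program should display an exit status of 0.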

3.1.2 Hello World program

The first program that a programmer often writes in a new language is called a “Hello World” program. The purpose of this program is just to:

1. create a program with valid syntax.
2. be able to process output from the program.
3. execute the steps to create a valid program executable.
4. ensure that the executable program file can be run.

To begin, edit the file helloWorldMain.s, and enter the following text. You can start by copying the file template.s to helloWorldMain.s and adding the code in the “# Printing The Message” block shown below, which prints the string. Be sure to add the helloWorld variable in the .data section of the program.

# 
# Program name: helloWorldMain.s 
# Author: Charles Kann 
# Date:9/19/2020 
# Purpose: This program shows how to print a string using the C function printf 
# 

.text 
.global main 

main: 
    # Save return to os on stack 
    SUB sp, sp, #4 
    STR lr, [sp, #0] 
    
    # Printing The Message 
    LDR r0, =helloWorld 
    BL printf 
    
    # Return to the OS 
    LDR lr, [sp, #0] 
    ADD sp, sp, #4 
    MOV pc, lr 

.data 
    # Stores the string to be printed 
    helloWorld: .asciz "Hello World\n"

2 helloWorldMain.s

Save the file, return to the shell prompt, and then type the following command:

      gcc helloWorldMain.s -g -c -o helloWorldMain.o

This line calls the gcc command. It will see that the input file to the command has a suffix “.s”. This “.s” suffix indicates to the gcc command that the assembler is to be used to process this file.

This gcc command has many options, three of which are used here. The -g option informs the assembler that debugging information should be produced. Programs in this book will generally use the -g option, as most programs will need to be viewed and debugged. However, the debugging information makes the object and executable files larger, so if a program is to be used in a production environment, the -g option is normally omitted.

The -c option says that the assembler is to only assemble the code to an object file and not attempt to make an executable file from it. It is used to keep an intermediate file created between the assembly source and the final executable file. This intermediate file is called an object file. This object file is often not needed if the goal is only to create an executable file from the assembly file, and in later sections the keeping of the object file will often be omitted.

The final -o option tells gcc to save its output to a file with the name specified after the option. For this command the file “helloWorldMain.o” is created.

The result of this command is an object file called “helloWorldMain.o”. An object file contains the result of translating the assembly code in the file helloWorldMain.s into machine code. Machine code is a translation of assembly instructions into a format that only uses binary values that can be understood by the CPU. However, an object file is not an executable file and cannot be run directly. An object file is an intermediate file between a source code assembly file and an executable file. The machine code in an object file still needs to be combined with other object files to resolve items that are not defined in the source file. In our case, the printf function is not defined in the source code for this program. The printf function is called an external reference, and it must be resolved before the program can be executed.

The resolution of these external references is achieved by looking in other object files or library files for the definition of the unresolved references. These object and library files must be linked to our object file, and when the external reference is found (or resolved), the function will be combined with the object file from the assembler to create an executable program. The program that creates the executable file is called a linking loader (this program can also be called a linkage editor, a linker, or a loader; we will call the program a linker from here on). After the linker has found and resolved any references, it writes an executable file that can be run. The linker is run by typing the following command:

      gcc helloWorldMain.o -g -o helloWorld

Because the file name has a “.o” suffix, the gcc command knows to run the linker. The final file, helloWorld, is a file that can be executed. This executable file can be executed by typing the following command:

./helloWorld

The running of the program should produce the string “Hello World”, which tells you that the program is up and running correctly.
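
To make the idea of an external reference more concrete, here is a sketch that splits a program across two source files. The file names greetMain.s and greet.s, the function greet, and the label message are all hypothetical, chosen only for illustration. In the first file, greet is used but never defined, so it is left as an external reference in the same way printf is. The second file defines greet and marks it .global so the linker can find it.

```asm
#
# File 1: greetMain.s (hypothetical)
# Calls a function that is defined in another file
#
.text
.global main

main:
    # Save return to OS on stack
    SUB sp, sp, #4
    STR lr, [sp, #0]

    # greet is not defined in this file, so it is an external
    # reference that the linker must resolve
    BL greet

    # Return to the OS
    LDR lr, [sp, #0]
    ADD sp, sp, #4
    MOV pc, lr

#
# File 2: greet.s (hypothetical)
# Defines the greet function and makes its name visible to the linker
#
.text
.global greet

greet:
    # Save return to the caller on stack
    SUB sp, sp, #4
    STR lr, [sp, #0]

    # printf is itself an external reference, resolved from the C library
    LDR r0, =message
    BL printf

    # Return to the caller
    LDR lr, [sp, #0]
    ADD sp, sp, #4
    MOV pc, lr

.data
    message: .asciz "Hello from greet\n"
```

Assuming the two files are saved separately in the current directory, a command such as gcc greetMain.s greet.s -g -o greet should assemble each file and then link the resulting object files together with the C library.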

3.1.3 Notes on the HelloWorld Program

The following are notes on the helloWorld program to explain the syntax and semantics of the program.

1. The program is saved in a file helloWorldMain.s. Note that this does not match the name of the function, which is main. It also does not match what I have called the program name, which is helloWorld. Languages such as Java require that the file name match the name of a public class, and there are restrictions or suggestions as to how to structure programs and name programs and variables. No such restrictions are generally applied to assembly. An untrained programmer can create a nightmare of names and find strange places to store files that are often impossible to untangle, even by the programmer themselves. So, be careful to follow any standards in place for naming and file management (such as directory structures) that exist when you create your code.
For this book, the programs are stored in directories by the chapter they are found in. This book also has a supplemental style guide, which is suggested for use in writing your programs unless your employer or professor chooses different standards.
2. Except for the “# Printing The Message” block and the helloWorld string in the .data section, all of the code in the helloWorld program was copied from the template.s file. For now, this code is just copied when starting to write all programs and not explained further.
3. Any line that starts with a # (hashtag) is a comment line. If the # is not the first character on a line, it is not a comment but signifies that what follows is a numeric token. Note that numeric tokens can be:
3.1 a decimal value token, specified by #nnn, where n is any decimal digit.
3.2 a hex value token, specified by #0xnn, where n is any hexadecimal digit.
3.3 a binary value token, specified by #0bnnnnnnnn, where n is any binary digit.

4. To add a comment after an instruction, use the string “//”. For example, the following is an instruction line containing a comment.

bl printf // branch to the printf function

5. The ldr (Load Register) operation loads a register with a value from memory. In this case the “=” sign means load the address of the string labeled helloWorld into r0.
6. The printf function looks for the address of the format string to output in r0. The register r0 must contain the address of the format string when calling the function printf.
7. The \n in the format string is an escape sequence meaning print a new line. A complete list of all escape sequences can be found at https://en.cppreference.com/w/c/language/escape.
8. The bl (Branch and Link) operation, used in the “bl printf” instruction, saves the return pointer (where the function is to return) in the link register, lr, and then branches (or jumps) to the function that is named (printf). When the printf function completes, it returns to the statement immediately after the function call, as it does in any other language.
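
The notes on numeric tokens and on calling printf can be illustrated with a short sketch. The program below is not from the original text, and the file name numericTokens.s is hypothetical. It loads the same value, written as a decimal, a hex, and a binary token, into three registers and prints them. It assumes that printf takes the arguments that follow the format string from r1, r2, and r3.

```asm
#
# Program name: numericTokens.s (hypothetical file name, for illustration only)
# Purpose: Shows decimal, hex, and binary numeric tokens for the same value
#

.text
.global main

main:
    # Save return to OS on stack
    SUB sp, sp, #4
    STR lr, [sp, #0]

    # r0 holds the address of the format string
    LDR r0, =format
    # Decimal token for 26
    MOV r1, #26
    # Hex token for 26
    MOV r2, #0x1A
    # Binary token for 26
    MOV r3, #0b11010
    BL printf

    # Return to the OS
    LDR lr, [sp, #0]
    ADD sp, sp, #4
    MOV pc, lr

.data
    # Each %d prints one of the values in r1, r2, and r3 as a decimal number
    format: .asciz "%d %d %d\n"
```

Because all three tokens denote the same number, the program should print the value 26 three times on one line.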

3.1.4 Using make to Create the Program

The final part of this section is to explain how to create the program more easily. This will be done using a makefile and the make command. A makefile is a way of automating the steps that need to be done to create some artifact, such as a program. Any task that requires multiple steps to be executed can be accomplished using the make command, but its normal use is to create program executable files.

  Target          Dependency

helloWorld:    helloWorldMain.s
        gcc helloWorldMain.s -g -c -o helloWorld.o
        gcc helloWorld.o -g -o helloWorld

The makefile presented here consists of 3 parts:

A target, which is the file that is to be created as a result of running this make command.
Dependencies, which are the files or resources that are to be checked for changes that would require the target to be remade. For example, if the source code file is changed, the time stamp on the source code file is more recent than the time stamp on the target. This means that changes have been made to the source file, and these should be reflected in the target. The rules for the target will thus be rerun to incorporate the recent changes.
Rules are the recipe (or the commands to be run and the order to execute them) to recreate the target file. Note that lines containing rules must be indented with tabs. The space in front of the gcc commands cannot be blanks, but must be a tab.

In this example the target, or program to be created, is named helloWorld and is located in the current directory. It has one dependency, the file helloWorldMain.s. If this dependency file is changed, the program helloWorld is remade by using the gcc commands.

To run this makefile, type the command make at the shell command prompt in the directory where these files reside. The make command will check the modification times of the files helloWorld and helloWorldMain.s, and run the commands if needed. If everything is currently up to date, it will report that the target is up to date and will not run any commands.

3.1.5 Prompting for an Input String

The next program will show how to read an input string into the program and then print it out as part of a formatted string. The C function scanf will be used to read the string. To start, edit a file named printNameMain.s and enter the text in the following program.

# 
# Program Name: printNameMain.s 
# Author: Charles Kann 
# Date: 9/19/2020 
# Purpose: To read a string using scanf 
# Input: 
# - input: Username 
# Output: 
# - format: Prints the greeting string 

.text 
.global main 

main: 
    # Save return to os on stack 
    SUB sp, sp, #4
    STR lr, [sp, #0] 
    
    # Prompt for an input 
    LDR r0, =prompt 
    BL printf 
    
    # Scanf 
    LDR r0, =input 
    LDR r1, =name 
    BL scanf 
    
    # Printing the message 
    LDR r0, =format 
    LDR r1, =name 
    BL printf 
    
    # Return to the OS 
    LDR lr, [sp, #0] 
    ADD sp, sp, #4 
    MOV pc, lr 
    
.data 
    # Prompt the user to enter their name 
    prompt: .asciz "Enter your name: " 
    # Format for input (read a string) 
    input: .asciz "%s" 
    # Format of the program output
    format: .asciz "Hello %s, how are you today? \n" 
    # Reserves space in the memory for name 
    name: .space 40

3 printNameMain.s
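
One caution about this program: the “%s” conversion places no limit on how many characters scanf stores, while name reserves only 40 bytes. A possible refinement, which is not part of the original program, is to put a maximum field width in the input format so that a long entry cannot run past the reserved space. The fragment below sketches only the two affected lines of the .data section.

```asm
.data
    # Format for input: read at most 39 characters, which leaves
    # one byte of name for the terminating null character
    input: .asciz "%39s"
    # Reserves 40 bytes in memory for name
    name: .space 40
```

With this format, scanf stops storing characters after the 39th, and any remaining input is left unread.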

To create the program, there is no need to type the commands at the shell prompt since a makefile already exists in this directory. Edit the makefile and change it to the following:

all: helloWorld printName
helloWorld: helloWorldMain.s
	gcc $@Main.s -g -o $@
	./helloWorld
printName: printNameMain.s
	gcc $@Main.s -g -o $@
	./printName

4 makefile for printName

This makefile has several changes. First, the first target in this file is “all”. If the make program is run without specifying a target on the command line, make will use the first target in the makefile, which is often named all. This all target has two dependencies, helloWorld and printName, which are themselves targets in the makefile. Thus, running make will check whether either the helloWorld or printName program needs to be remade. If both are current, make will report that there is nothing to be done. If either or both of the programs are not current, it will remake the programs so that all the executable files are current.

Note that any of the targets can be explicitly called. For example, make all will use the target all, make helloWorld will check and remake only the helloWorld program, and make printName will check and remake only the printName program.

Second, the rules in this makefile specify the program executable is made directly from the source code file without creating an object file. The gcc command is still making the object file, but it does not keep it after the gcc command is run. The object file is not something needed to run the program, and so it is not necessary for the program to keep it. Thus, this gcc command allows the executable file to be created without having to keep the object file.

Finally, there is a rule for each of the programs that runs the programs. These changes to the makefile will remake and test the programs.