9.2: The C Programming Language
C is the oldest programming language that we will encounter in this book. Its basic syntax has been adopted by many other languages, including Java, JavaScript and the OpenGL shader language. C is not object-oriented. It was the basis for the object-oriented language C++, but C is almost as different from C++ as it is from Java. While a large part of C will be familiar to any reader of this book, to really master C, you need to know something about its less familiar parts.
My own experience with C is limited to using it on Linux, where I can use the gcc command to compile C programs. If you want to use gcc on Windows, you might consider MinGW (http://mingw.org/) or Cygwin (https://cygwin.com/). I don’t have experience with either of these; in fact, I have almost no experience with Windows. For Mac OS, you can write C programs using Apple’s Xcode development system. It is also possible to install a command line C compiler on Mac OS.
Language Basics
A C program consists of a collection of functions and global variables, which can be spread across multiple files. (All subroutines in C are referred to as “functions,” whether or not they return a value.) Exactly one of those functions must be a main() routine, whose definition generally takes the form
int main(int argc, char **argv) {
    // main program code
}
Execution of the program begins in the main() function. As in Java, the parameters to main() contain information about command line arguments from the command that was used to execute the program. (The “**” has to do with C’s implementation of pointers and arrays, which I will discuss later.) The parameters can be omitted from the definition of main if the program has no need for them. The return value of main() is sent to the operating system to indicate whether or not the program succeeded; a value of 0 indicates success, and any other value indicates that an error occurred.
C makes a distinction between “defining” a variable or function and “declaring” it. A variable or function can have only one definition, but it can be declared any number of times. A variable or function should be declared before it is used, but does not have to be defined before it is used. A C compiler will not look ahead to search for a declaration. (More precisely, using an undeclared variable is a compilation error. If the compiler encounters a call to an undeclared function, older versions of C assume that the function returns int, while C99 and later do not allow the call, and compilers typically issue a warning or an error. Relying on an implicit declaration is almost never what you want.)
A function definition takes a form similar to a method definition in Java. The return type for the function must be specified, and a return type of void is used for a function that does not return a value. The type of each parameter must be specified. For example,
int square( int x ) { return x * x; }
Since a definition is also a declaration, this also declares square(). To declare a function without defining it, leave out the body of the function. This is called a “prototype” for the function:
int square(int x);
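As a minimal sketch of why a prototype is useful, the following code calls square() before the compiler has seen its definition. The helper sum_of_squares is a made-up name used only for this illustration.

```c
/* Prototype: declares square() so that it can be used before it is defined. */
int square(int x);

/* Hypothetical helper that calls square() before its definition appears. */
int sum_of_squares(int a, int b) {
    return square(a) + square(b);
}

/* The one and only definition of square(). */
int square(int x) {
    return x * x;
}
```

Without the prototype on the first line, the compiler would reach the calls inside sum_of_squares without having seen any declaration of square().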
For variables, a typical variable declaration, such as “int x;”, is also a definition of the variable. To get a variable declaration that is not a definition, add the word “extern”. For example: “extern int x;”. You probably won’t need to know this.
One reason for the distinction between declaration and definition is that, although C programs can consist of several files, each file is compiled independently. That is, when C is compiling a file, it looks only at that file. This is true even if several files are compiled with a single command. If file A wants to use a function or variable that is defined in file B, then file A must include a declaration of that function or variable. This type of cross-file reference is usually handled using “header files” and the #include directive. An include directive in a file tells the compiler to include a copy of the text from the included file in the code that it compiles. A header file typically has a name that ends with “.h” and contains only declarations. For example, a C source file that wants to use standard input/output will use the following directive at the beginning of the file:
#include <stdio.h>
The stdio.h header file is one of several standard header files that should be installed with any C compiler. Other standard headers include math.h for common mathematical functions, string.h for string manipulation functions, and stdlib.h for some miscellaneous functions including memory management functions.
The compiler will also look in the current directory for header files. In an include directive, the name of such a header file should be enclosed in quotation marks instead of angle brackets. For example,
#include "my-header.h"
If you write a .c file that contains functions meant for use in other files, you will usually write a matching .h file containing declarations of those functions.
After all the files that make up a program have been compiled, they still have to be “linked” together into a complete program. The gcc compiler does the linking automatically by default. Even if all of the files have compiled successfully, there can still be link errors. A link error occurs if no definition is found for a variable or function that has been declared, or if two definitions are found for the same thing. For functions defined in standard libraries, you might need to link the program with the appropriate libraries using the “-l” option on the gcc compiler. For example, a program that uses functions from the math.h header file must be linked with the library named “m”, like this:
gcc my-program.c my-utils.c -lm
It can be difficult to know what libraries need to be linked. Most of my sample C programs, such as glut/first-triangle.c, have a comment that tells how to compile and link the program.
One more note about compiling with gcc. By default, the name of the compiled program will be a.out. The “-o” option on the gcc command is used to specify a different name for the compiled program. For example,
gcc -o my-program my-program.c my-utils.c -lm
Here, the name of the compiled program will be my-program. The name of the compiled program can be used like any other command. In Linux or MacOS, you can run the program on the command line using a command such as
./my-program
The “./” in front of the name is needed to run a command from the current directory. You could also use a full path name to the command.
C has most of the same basic types as Java: char, short, int, long, float, double. The original language has no boolean type, but integers can be used as booleans, with 0 representing false and any non-zero value representing true. (C99 and later also provide a bool type through the header file stdbool.h.) There is no “byte” data type, but char is essentially an 8-bit integer type that can be used in place of byte. For the other numerical data types, the standard guarantees only minimum sizes, not an exact number of bits. The integer types, including char, can be marked “signed” or “unsigned”, where the unsigned types have only non-negative values. For example, signed char has values in the range −128 to 127, while unsigned char has values in the range 0 to 255. Except for char, the default for the integer types is signed. (For char, the default is not specified in the standard.) Since C is very profligate about converting one numeric type to another, we don’t have to worry too much about this. (I should note that to avoid the ambiguities of C data types, OpenGL defines its own set of data types such as GLfloat and GLint, and to be completely correct, you can use them in your OpenGL programs in place of C’s usual type names.)
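The points about type sizes, integers as booleans, and unsigned values can be checked with a few small functions. This is only a sketch; the function names are invented for the example, and the number of bits in an int depends on the platform.

```c
#include <limits.h>  /* CHAR_BIT (bits per char) and UCHAR_MAX */

/* Number of bits in an int on this platform; the standard fixes only a minimum of 16. */
int bits_in_int(void) {
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Integers act as booleans: 0 is false, and any non-zero value is true. */
int is_true(int value) {
    if (value) {
        return 1;
    }
    return 0;
}

/* Adding 1 to the largest unsigned char wraps around to 0. */
unsigned char wrap_around(void) {
    unsigned char c = UCHAR_MAX;
    c = (unsigned char)(c + 1);
    return c;
}
```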
Operators and expressions are similar in C, Java, and JavaScript. As in Java, integer division in C produces an integer result, so that 17/3 is 5. C does not use “+” as a string concatenation operator; in fact, C has no such operator for strings. String concatenation can be done using a function, strcat, from the string.h header file. We will see that some operators can also be used with pointers in C, in ways that have no analog in Java or JavaScript.
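Here is a short sketch of integer division and of concatenation with strcat. The function names are hypothetical, and build_greeting assumes that the caller supplies a buffer of at least 13 bytes, since strcat does no bounds checking.

```c
#include <string.h>  /* strcpy and strcat */

/* Integer division discards the fractional part, so int_quotient(17, 3) is 5. */
int int_quotient(int a, int b) {
    return a / b;
}

/* Builds the string "Hello, World" in buffer, which must have room for
   at least 13 chars (12 characters plus the terminating null character). */
void build_greeting(char *buffer) {
    strcpy(buffer, "Hello");   /* start with a copy of the first piece */
    strcat(buffer, ", ");      /* append to the end of what is already there */
    strcat(buffer, "World");
}
```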
The header file stdio.h declares C’s standard input/output functions. I mention it here mostly for the function printf(), which outputs text to the command line and is useful for writing debugging messages. It is essentially the same function as System.out.printf in Java. For example:
printf("The square root of %d is %f\n", x, sqrt(x));
The function sqrt(x), by the way, is declared in the header file math.h, along with other mathematical functions such as sin(x), cos(x), and fabs(x). (The integer absolute value function abs(x) is declared in stdlib.h and always returns an int. For a floating-point absolute value, use fabs(x).)
Control structures in C are similar to those in Java and JavaScript, with a few exceptions. The switch statement in C works only with integer or character values. There is no try...catch statement. Depending on your C compiler, you might not be able to declare variables in for loops, as in for (int i =....
The original version of C had only one type of comment, starting with /* and ending with */. Modern C also allows single line comments starting with //, so your compiler should accept comments of either form.
Pointers and Arrays
For programmers who have experience with Java or JavaScript, one of the hardest things to get used to in C is its use of explicit pointers. For our purposes, you mostly need to know a little about how the unary operators “*” and “&” are used with pointers. But if you want to use dynamic data structures in C, you need to know quite a bit more.
In C, there is a data type int* that represents “pointer to int.” A value of type int* is a memory address, and the memory location at that address is assumed to hold a value of type int. If ptr is a variable of type int*, then *ptr represents the integer stored at the address to which ptr points. *ptr works like a variable of type int: You can use it in an expression to fetch the value of the integer from memory, and you can assign a value to it to change the value in memory (for example, “*ptr = 17;”).
Conversely, if num is a variable of type int, then &num represents a pointer that points to num. That is, the value of &num is the address in memory where num is stored. Note that &num is an expression of type int*, and *&num is another name for num. The expression &num can be read as “pointer to num” or “address of num.”
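To make the two operators concrete, here is a small sketch in which a function fetches and then changes an int through a pointer. The name replace_value is made up for this example.

```c
/* Stores new_value at the address held in ptr and returns the value
   that was stored there before. */
int replace_value(int *ptr, int new_value) {
    int old = *ptr;     /* fetch the int that ptr points to */
    *ptr = new_value;   /* change the int in memory */
    return old;
}
```

If num is an int variable with value 5, then the call replace_value(&num, 17) returns 5 and leaves 17 stored in num.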
Of course, the operators & and * work with any types, not just with int. There is also a data type named void* that represents untyped pointers. A value of type void* is a pointer that can point anywhere in memory, regardless of what is stored at that location.
Pointer types are often used for function parameters. If a pointer to a memory location is passed to a function as a parameter, then the function can change the value stored in that memory location. For example, consider
void swap ( int *a, int *b ) { int temp = *a; *a = *b; *b = temp; }
The parameters a and b are of type int*, so any actual values passed into the function must be of type pointer-to-int. Suppose that x and y are variables of type int:
int x,y;
Then &x and &y are pointers to int, so they can be passed as parameters to swap:
swap( &x, &y );
Inside the function, a is a pointer to x, which makes *a another name for x. Similarly, *b is another name for y. So, for example, the statement *a = *b; copies the value of y into x. The net result is to swap, or interchange, the values stored in x and in y. In Java or JavaScript, it is impossible to write a similar method that swaps the values of two integer variables.
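As a sketch of how such a function might be used by other code, the hypothetical function order_pair below passes its own pointer parameters along to swap. Since low and high are already pointers, no & is needed in that call.

```c
/* Interchange the two ints that a and b point to. */
void swap(int *a, int *b) {
    int temp = *a;
    *a = *b;
    *b = temp;
}

/* Hypothetical helper: afterwards, *low is no larger than *high. */
void order_pair(int *low, int *high) {
    if (*low > *high) {
        swap(low, high);   /* low and high are already of type int* */
    }
}
```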
Note, by the way, that in the declaration int *a, the * is associated with a rather than with int. The intent of the declaration is to say that *a represents an int, which makes a a pointer to int. It is legal, but misleading, to write the declaration as int* a. It is misleading because
int *a, b;
declares a to be a pointer to int and b to be an int. To declare two pointers, you have to say
int *a, *b;
Arrays and pointers are very closely related in C. However, it is possible to use arrays without worrying about pointers. For example, to create an array of 5 ints, you can say
int A[5];
(Note that the “[5]” is associated with the variable name, A, rather than with the type name, “int”.) With this declaration, you can use the array elements A[0] through A[4] as integer variables. Arrays in C are not automatically initialized. The contents of a new array are unknown. You can provide initial values for an array when you declare it. For example, the statement
int B[] = { 2, 3, 5, 7, 9, 11, 13, 17, 19 };
creates an array of length 9 containing the numbers listed between { and }. If you provide initial values for the array, you do not have to specify the array size; it is taken from the list of values. An array does not remember its length, and there is no protection against trying to access array elements that actually lie outside of the array.
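Because an array does not remember its length, a function that processes an array is usually given the length as a separate parameter. The following is a sketch of that convention; the macro ARRAY_LENGTH and the function sum are names invented for the example, and the macro only works where the array declaration itself is visible, not on an array that has been passed to a function.

```c
/* Number of elements in an array, computed from its size in bytes.
   This works only where the array's declaration is in scope. */
#define ARRAY_LENGTH(a) (sizeof(a) / sizeof((a)[0]))

/* Adds up the first count elements of values. The function has no way
   to check that count is really the length of the array. */
int sum(const int values[], int count) {
    int total = 0;
    int i;
    for (i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

For the array B declared above, ARRAY_LENGTH(B) is 9, so the sum of all the elements could be computed as sum(B, 9).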
The address operator, &, can be applied to array elements. For example, if B is the array from the above declaration, then &B[3] is the address of the location in memory where B[3] is stored. The values of B[3] and B[4] could be swapped by calling
swap( &B[3], &B[4] );
In most expressions, an array variable is treated as a pointer to the first element of the array. That is, the value of an array variable B is the address of the array in memory. This means that B and &B[0] are the same. Furthermore, a pointer variable can be used as if it were an array. For example, if p is of type int*, then p[3] is the third integer in memory after the integer to which p points. And if we define
int *p = &B[3];
then p[0] is the same as B[3], p[1] is the same as B[4], and so on.
An expression of the form p+n, where p is a pointer and n is an integer, represents a pointer. Its value is a pointer that points to the n-th item after p in memory. The type of “item” that is referred to here is the type to which p points. For example, if p is a pointer-to-int, then p+3 points to the third integer after the integer to which p refers. And the value of *(p+3) is that integer. Note that the same integer can be referred to as p[3]. In fact, p[n] can be considered to be nothing more than shorthand for *(p+n). (Although it probably takes us farther into C than you want to go, I’ll also mention that the operators ++ and -- can be applied to pointer variables. The effect is to advance the pointer one item forwards or backwards in memory.)
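Pointer arithmetic can be used to step through an array without using an index variable. The function below is a sketch with a made-up name, sum_from; it adds up count integers starting at the one to which p points.

```c
/* Adds up count ints, starting with the one that p points to. */
int sum_from(const int *p, int count) {
    int total = 0;
    const int *end = p + count;   /* points just past the last item to be added */
    while (p < end) {
        total += *p;   /* the int that p currently points to */
        p++;           /* advance p to the next int in memory */
    }
    return total;
}
```

With the array B from earlier in this section, sum_from(&B[3], 2) adds B[3] and B[4].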
A string in C is essentially an array of char but is usually thought of as being of type char*, that is, pointer to char. By convention, a string always ends with a null character (ASCII code 0) to mark the end of the string. This is necessary because arrays do not have a defined length. The null character is inserted automatically for string literals. You can initialize a variable of type char* with a string literal:
char *greet = "Hello World";
The characters in the string are then given by greet[0], greet[1], \( \dots \), greet[10]. The value of greet[11] is zero, to mark the end of the string.
String manipulation is done using functions that are declared in the standard header file string.h. For example, to test whether two strings are equal, you can use strcmp(s1,s2), which returns 0 when the strings are equal. And for copying strings, there is a function strcpy(s1,s2), which copies the string s2 into s1. Working with strings in C can be quite tricky, because strings are represented as pointers or arrays, and C does no error checking for null pointers, bad pointers, or array indices out of bounds.
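The role of the null character can be seen in a hand-written version of a string length function. This is only a sketch for illustration (string.h already provides strlen), and the names string_length and same_string are invented here.

```c
#include <string.h>  /* strcmp */

/* Counts the characters in s up to, but not including, the null
   character that marks the end of the string. */
int string_length(const char *s) {
    int n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* Returns 1 if s1 and s2 contain the same characters, 0 otherwise.
   Note that strcmp returns 0, not 1, when the strings are equal. */
int same_string(const char *s1, const char *s2) {
    return strcmp(s1, s2) == 0;
}
```

Comparing two strings with == would compare the pointers, not the characters, which is why a function such as strcmp is needed.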
By the way, I can now explain the parameters to the main() routine, int argc and char **argv. The parameter argv of type char** is an array of strings (one * to mean array and one * to mean string). This array holds the command that was used to run the program, with argv[0] holding the name of the program and the rest of the array holding any command line arguments. The value of the first parameter, argc, is the length of the array.
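As a sketch of how these parameters might be processed, the hypothetical function print_args below prints the program name and each command line argument. It assumes that argc is at least 1, as it is when a program is started from the command line in the usual way.

```c
#include <stdio.h>  /* printf */

/* Prints the program name and the command line arguments, and returns
   the number of arguments, not counting the program name. */
int print_args(int argc, char **argv) {
    int i;
    printf("Program name: %s\n", argv[0]);
    for (i = 1; i < argc; i++) {
        printf("Argument %d: %s\n", i, argv[i]);
    }
    return argc - 1;
}
```

A main() routine could simply pass its own parameters along by calling print_args(argc, argv).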
Data Structures
C does not have classes or objects. However, it does have a way to represent complex data types: a struct. A struct is similar to a class that contains only variables, with no methods. It is a way of grouping several variables into a unit. For example,
struct color { float r; float g; float b; };
With this definition, struct color becomes a type that can be used to declare variables, parameters, and return types of functions. For example,
struct color bg;
With this declaration, bg is a struct made up of three float variables that can be referred to as bg.r, bg.g, and bg.b. To avoid having the word “struct” as part of the type name, a struct datatype can be declared using typedef:
typedef struct { float r; float g; float b; } color;
This defines color, rather than struct color, to be the name of the type, so that a variable can be declared as
color bg;
It is sometimes useful to work with pointers to structs. For example, we can make a pointer to the struct bg:
color *ptr = &bg;
With this definition, *ptr is another name for bg. The variables in the struct can be referred to as (*ptr).r, (*ptr).g, and (*ptr).b. The parentheses are necessary because the operator “.” has a higher precedence than “*”. But the variables can also be referred to as ptr->r, ptr->g, and ptr->b. When a pointer-to-struct is used to access the variables in a struct, the operator -> is used instead of the period (.) operator.
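Pointers to structs are most often seen as function parameters, since they allow a function to modify the struct. Here is a minimal sketch; the functions darken and gray are made up for this example.

```c
typedef struct { float r; float g; float b; } color;

/* Multiplies each component of the color that c points to by factor.
   The caller's struct is modified, since the function has its address. */
void darken(color *c, float factor) {
    c->r *= factor;
    c->g *= factor;
    c->b *= factor;
}

/* A struct can also be returned by value; this makes a shade of gray. */
color gray(float level) {
    color c = { level, level, level };
    return c;
}
```

For a variable bg of type color, the call darken(&bg, 0.5f) cuts each of bg.r, bg.g, and bg.b in half.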
To implement dynamic data structures in C, you need to be able to allocate memory dynamically. In Java and JavaScript, that can be done using the new operator, but C does not use new. Instead, it has a function, malloc(n), which is declared in the standard header file stdlib.h. The parameter to malloc is an integer that specifies the number of bytes of memory to be allocated. The return value is a pointer of type void* that points to the newly allocated block of memory. (A void* pointer can be assigned to any pointer variable.) Furthermore, since C does not have “garbage collection,” you are responsible for freeing any memory that you allocate using malloc. That can be done using free(ptr), where ptr is a pointer to the block of memory that is being freed. Rather than discuss dynamic data structures in detail, I present a short program to show how they can be used. The program uses a linked list to represent a stack of integers:
#include <stdio.h>   // for the printf function
#include <stdlib.h>  // for the malloc and free functions

typedef struct node listnode;  // Predeclare the listnode type, so it
                               // can be used for the type of next.
struct node {
    int item;        // An item in the list.
    listnode *next;  // Pointer to next item in list.
};

listnode *list = 0;  // Pointer to head of list, initially null.

void push( int item ) {  // Add item to head of list
    listnode *newnode;   // Pointer to a new node to hold the item.
    newnode = malloc( sizeof(listnode) );  // Allocate memory for the node.
        // (sizeof(listnode) is the number of bytes for a value of type listnode)
    newnode->item = item;
    newnode->next = list;
    list = newnode;  // Makes list point to the new node.
}

int pop() {  // Remove and return first item from list
    int item = list->item;     // The item to be returned.
    listnode *oldnode = list;  // Save pointer to node that will be deleted.
    list = list->next;         // Advance list pointer to next item.
    free(oldnode);             // Free the memory used by deleted node.
    return item;
}

int main() {
    int i;
    for (i = 1; i < 1000000; i *= 2) {
        // Push powers of two onto the list.
        push(i);
    }
    while (list) {
        // Pop and print list items (in reverse order).
        printf("%d\n", pop());
    }
}
A more complex data structure, such as a scene graph, can contain several different kinds of nodes. For such structures, you need even more advanced techniques. One approach is to design a struct that includes the following: data common to all nodes in the data structure; an integer code number to say which of the several possible kinds of node it is; and a void* pointer to link to the extra data needed by nodes of that type. Using a void* pointer means it can point to any kind of data structure, and the code number will tell how to interpret the data that it points to. A better alternative to using a void* pointer is to learn about “union”, something similar to a struct but more useful for representing multiple data types. But perhaps the real solution, if you want to work with complex data structures, is to use C++ instead of C.
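To give an idea of what the union approach might look like, here is a minimal sketch of a node type with a code number and a union. The node kinds, field names, and the function node_volume are all invented for this illustration; they are not taken from any real scene graph library.

```c
/* Hypothetical code numbers for the kinds of node. */
typedef enum { NODE_SPHERE, NODE_BOX } node_kind;

typedef struct {
    float x, y, z;    /* data common to all nodes: a position */
    node_kind kind;   /* says which member of the union is in use */
    union {           /* the members of a union share the same memory */
        struct { float radius; } sphere;
        struct { float width, height, depth; } box;
    } data;
} scene_node;

/* Computes the volume of a node, using kind to decide how to
   interpret the data stored in the union. */
float node_volume(const scene_node *n) {
    switch (n->kind) {
    case NODE_SPHERE:
        return 4.0f / 3.0f * 3.14159265f * n->data.sphere.radius
               * n->data.sphere.radius * n->data.sphere.radius;
    case NODE_BOX:
        return n->data.box.width * n->data.box.height * n->data.box.depth;
    }
    return 0.0f;  /* unknown kind of node */
}
```

Only one member of the union holds meaningful data at a time, so the program must set kind correctly whenever it fills in the data for a node.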