# 1.3: Character Representation

• • Charles W. Kann III
• Adjunct Professor (Computer Science) at Gettysburg College

All of the numbers used so far in this text have been binary whole numbers. While everything in a computer is binary, and can be represented as a binary value, binary whole numbers do not represent the universe of numbering systems that exists in computers. Two representations that will be covered in the next two sections are character data and integer data.

Though computers deal use binary to represent data, humans usually deal with information as symbolic alphabetic and numeric data. So to allow computers to handle user readable alpha/numeric data, a system to encode characters as binary numbers was created. That system is called American Standard Code for Information Interchange (ASCII)2. In ASCII all character are represented by a number from 1 - 127, stored in 8 bits. The ASCII encodings are shown in the following table.

Using this table, it is possible to encode a string such as "Once" in ASCII characters as the hexadecimal number 0x4F6E63653 (capital O = 0x4F, n = 0x6E, c = 0x53, e = 0x65).

Numbers as character data are also represented in ASCII. Note the number 13 is 0xD or 11012. However the value of the character string "13" is 0x3133. When dealing with data, it is important to remember what the data represents. Character numbers are represented using binary values, but are very different from their binary numbers.

Finally, some of the interesting patterns in ASCII should be noted. All digits start with binary digits 0011 0000. Thus 0 is 0x0011 0000, 1 is 00011 0000, etc. To convert a numeric character digit to a number it is only necessary to subtract the character value of 0. For example, '0' - '0' = 0, '1' - '0' = 1, etc. This is the basis for an easy algorithm to convert numeric strings to numbers which will be discussed in the problems.

Also note that all ASCII upper case letters start with the binary string 0010 0000, and are 1 offset for each new character. So A is 0100 0001, B is 0100 0010, etc. Lower case letters start with the binary string 0110 and are offset by 1 for each new character, so a is 0110 0001, b is 0110 0010, etc. Therefore, all upper case letters differ from their lower case counterpart by a 1 in the digit 0100. This relationship between lower case and capital letters will be use to illustrate logical operations later in this chapter.

2 ASCII is limited to just 127 characters, and is thus too limited for many applications that deal with internationalization using multiple languages and alphabets. Representations, such as Unicode, have been developed to handle these character sets, but are complex and not needed to understand MIPS Assembly. So this text will limit all character representations to ASCII.