Engineering LibreTexts

3.4: Characters and Strings


    In addition to numeric data, symbolic data is often required. Symbolic or non-numeric data might include an important message such as “Hello World” (for more information, refer to: http://en.Wikipedia.org/wiki/”Hello,_World!”_program), a common greeting for first programs. Such symbols are well understood by English language speakers.

    Computer memory is designed to store and retrieve numbers. Consequently, the symbols are represented by assigning numeric values to each symbol or character.

    Character Representation

    In a computer, a character (for more information, refer to: http://en.Wikipedia.org/wiki/Character_(computing)) is a unit of information that corresponds to a symbol such as a letter in the alphabet. Examples of characters include letters, numerical digits, common punctuation marks (such as "." or "!"), and whitespace. The general concept also includes control characters, which do not correspond to symbols in a particular language, but to other information used to process text. Examples of control characters include carriage return or tab.

    American Standard Code for Information Interchange

    Characters are represented using the American Standard Code for Information Interchange (ASCII; for more information, refer to: http://en.Wikipedia.org/wiki/ASCII). Based on the ASCII table, each character and control character is assigned a numeric value. When using ASCII, the character displayed is based on the assigned numeric value. This only works if everyone agrees on common values, which is the purpose of the ASCII table. For example, the letter “A” is defined as \(65_{10}\) (0x41). The value 0x41 is stored in computer memory, and when it is displayed to the console, the letter “A” is shown. Refer to Appendix A for the complete ASCII table.
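    The mapping between a character and its ASCII value can be checked with a short Python sketch. The helper name `ascii_code` is a hypothetical name chosen for illustration; `ord` and `chr` are standard Python built-ins.

```python
def ascii_code(ch: str) -> int:
    """Return the numeric ASCII code of a single character (illustrative helper)."""
    code = ord(ch)                      # numeric value assigned to the character
    if code > 127:                      # ASCII only defines the values 0 through 127
        raise ValueError(f"{ch!r} is not an ASCII character")
    return code


code = ascii_code("A")
print(code, hex(code))                  # 65 0x41
print(chr(0x41))                        # A
```

    The same number, 0x41, is what sits in memory; the letter only appears when that number is interpreted as a character.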

    Additionally, numeric symbols can be represented in ASCII. For example, “9” is represented as \(57_{10}\) (0x39) in computer memory. The “9” can be displayed as output to the console. If sent to the console, the integer value \(9_{10}\) (0x09) would be interpreted as an ASCII value, which in this case would be a tab.

    It is very important to understand the difference between characters (such as “2”) and integers (such as \(2_{10}\)). Characters can be displayed to the console, but cannot be used for calculations. Integers can be used for calculations, but cannot be displayed to the console (without changing the representation).
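    The change of representation between a digit character and an integer amounts to adding or subtracting the ASCII value of “0” (0x30). The following is a minimal sketch; the function names `digit_char_to_int` and `int_to_digit_char` are hypothetical and used only for illustration.

```python
def digit_char_to_int(ch: str) -> int:
    """Convert one ASCII digit character, such as "9", to the integer 9."""
    if len(ch) != 1 or not ("0" <= ch <= "9"):
        raise ValueError(f"{ch!r} is not a single ASCII digit")
    return ord(ch) - ord("0")           # 57 - 48 = 9


def int_to_digit_char(n: int) -> str:
    """Convert an integer from 0 to 9 to its ASCII digit character."""
    if not 0 <= n <= 9:
        raise ValueError(f"{n} is not a single decimal digit")
    return chr(n + ord("0"))            # 9 + 48 = 57, which is "9"


print(ord("9"))                         # 57
print(digit_char_to_int("9") + 1)       # 10
print(repr(chr(9)))                     # '\t'
```

    The last line shows why the integer 9 cannot be sent to the console directly: as an ASCII value it is the tab control character, not the symbol “9”.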

    A character is typically stored in a byte (8 bits) of space. This works well since memory is byte addressable.

    Unicode

    It should be noted that Unicode (for more information, refer to: http://en.Wikipedia.org/wiki/Unicode) is a current standard that includes support for different languages. The Unicode Standard provides a series of different encoding schemes (UTF-8, UTF-16, UTF-32, etc.) in order to provide a unique number for every character, no matter what platform, device, application or language. In the most common encoding scheme, UTF-8, ASCII English text looks exactly the same as it did in ASCII. Additional bytes are used for other characters as needed. Details regarding Unicode representation are not addressed in this text.
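    As a brief illustration of the UTF-8 behavior described above, the sketch below compares the encoded bytes of ASCII text with those of two non-ASCII characters. The helper name `utf8_length` and the sample characters are choices made for this example only.

```python
def utf8_length(text: str) -> int:
    """Return the number of bytes needed to store text in UTF-8."""
    return len(text.encode("utf-8"))


ascii_text = "Hello"
# ASCII text produces identical bytes under both encodings.
print(ascii_text.encode("utf-8") == ascii_text.encode("ascii"))   # True
print(utf8_length(ascii_text))          # 5, one byte per character

# Characters outside ASCII need additional bytes.
print(utf8_length("\u00e9"))            # 2, e with acute accent (U+00E9)
print(utf8_length("\u20ac"))            # 3, euro sign (U+20AC)
```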

    String Representation

    A string (for more information, refer to: http://en.Wikipedia.org/wiki/String_...puter_science)) is a series of ASCII characters, typically terminated with a NULL. The NULL is a non-printable ASCII control character. Since it is not printable, it can be used to mark the end of a string.

    For example, the string “Hello” would be represented as follows:

    Character              "H"   "e"   "l"   "l"   "o"   NULL
    ASCII Value (decimal)  72    101   108   108   111   0
    ASCII Value (hex)      0x48  0x65  0x6C  0x6C  0x6F  0x0
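    The byte values in this table can be reproduced with a short sketch that builds a NULL-terminated byte sequence. Python strings are not NULL-terminated themselves, so the terminator is appended explicitly here; the names `to_c_string` and `from_c_string` are hypothetical helpers for illustration.

```python
def to_c_string(text: str) -> bytes:
    """Encode text as ASCII bytes followed by a NULL (0) terminator."""
    return text.encode("ascii") + b"\x00"


def from_c_string(buf: bytes) -> str:
    """Read characters up to, but not including, the first NULL byte."""
    end = buf.index(0)                  # position of the terminator
    return buf[:end].decode("ascii")


hello = to_c_string("Hello")
print(list(hello))                      # [72, 101, 108, 108, 111, 0]
print([hex(b) for b in hello])          # ['0x48', '0x65', '0x6c', '0x6c', '0x6f', '0x0']
print(from_c_string(hello))             # Hello
```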

    A string may consist partially or completely of numeric symbols. For example, the string “19653” would be represented as follows:

    Character              "1"   "9"   "6"   "5"   "3"   NULL
    ASCII Value (decimal)  49    57    54    53    51    0
    ASCII Value (hex)      0x31  0x39  0x36  0x35  0x33  0x0

    Again, it is very important to understand the difference between the string “19653” (using 6 bytes, including the NULL) and the single integer \(19,653_{10}\) (which can be stored in a single word, which is 2 bytes).
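    The size difference, and the conversion from one form to the other, can be sketched as follows. The function name `parse_decimal` is hypothetical, and the little-endian byte order used to show the 2-byte word is an assumption made for this example (it matches x86 processors, but the text above does not specify a byte order).

```python
def parse_decimal(digits: bytes) -> int:
    """Convert NULL-terminated ASCII decimal digits to an integer."""
    value = 0
    for b in digits:
        if b == 0:                      # NULL marks the end of the string
            break
        if not 0x30 <= b <= 0x39:       # only "0" through "9" are accepted
            raise ValueError(f"byte {b:#x} is not an ASCII digit")
        value = value * 10 + (b - 0x30)
    return value


text = b"19653\x00"                     # the string form: 5 digits plus NULL
number = parse_decimal(text)
print(len(text))                        # 6
print(number)                           # 19653

word = number.to_bytes(2, "little")     # assumed little-endian 2-byte word
print(len(word))                        # 2
print(word.hex())                       # c54c, i.e. the value 0x4CC5
```

    Each loop step multiplies the running total by 10 and adds the next digit, which is the usual way to turn a digit string into a value that can be used in calculations.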


    This page titled 3.4: Characters and Strings is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Ed Jorgensen.
