2.5: Detail- ASCII
ASCII, which stands for “American Standard Code for Information Interchange,” was introduced in 1963 by the American Standards Association (ASA), the body now known as the American National Standards Institute (ANSI). It is the most commonly used character code.
ASCII is a seven-bit code, representing the 33 control characters and 95 printing characters (including space) in Table 2.2. The control characters are used to signal special conditions, as described in Table 2.3.
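The split into 33 control characters and 95 printing characters can be checked directly from the code ranges. The following is a minimal Python sketch; the names `control_codes` and `printing_codes` are illustrative only and do not come from any standard.

```python
# Codes 0x00-0x1F plus 0x7F (DEL) are the control characters;
# codes 0x20 (space) through 0x7E (~) are the printing characters.
control_codes = [c for c in range(128) if c < 0x20 or c == 0x7F]
printing_codes = [c for c in range(128) if 0x20 <= c <= 0x7E]

print(len(control_codes), len(printing_codes))  # 33 95
print((127).bit_length())                       # 7: the largest code fits in seven bits
```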
Table 2.2: ASCII characters
Control Characters | | | | Digits | | | Uppercase | | | Lowercase | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
HEX | DEC | CHR | Ctrl | HEX | DEC | CHR | HEX | DEC | CHR | HEX | DEC | CHR |
00 | 0 | NUL | ^@ | 20 | 32 | SP | 40 | 64 | @ | 60 | 96 | ` |
01 | 1 | SOH | ^A | 21 | 33 | ! | 41 | 65 | A | 61 | 97 | a |
02 | 2 | STX | ^B | 22 | 34 | " | 42 | 66 | B | 62 | 98 | b |
03 | 3 | ETX | ^C | 23 | 35 | # | 43 | 67 | C | 63 | 99 | c |
04 | 4 | EOT | ^D | 24 | 36 | $ | 44 | 68 | D | 64 | 100 | d |
05 | 5 | ENQ | ^E | 25 | 37 | % | 45 | 69 | E | 65 | 101 | e |
06 | 6 | ACK | ^F | 26 | 38 | & | 46 | 70 | F | 66 | 102 | f |
07 | 7 | BEL | ^G | 27 | 39 | ' | 47 | 71 | G | 67 | 103 | g |
08 | 8 | BS | ^H | 28 | 40 | ( | 48 | 72 | H | 68 | 104 | h |
09 | 9 | HT | ^I | 29 | 41 | ) | 49 | 73 | I | 69 | 105 | i |
0A | 10 | LF | ^J | 2A | 42 | * | 4A | 74 | J | 6A | 106 | j |
0B | 11 | VT | ^K | 2B | 43 | + | 4B | 75 | K | 6B | 107 | k |
0C | 12 | FF | ^L | 2C | 44 | , | 4C | 76 | L | 6C | 108 | l |
0D | 13 | CR | ^M | 2D | 45 | - | 4D | 77 | M | 6D | 109 | m |
0E | 14 | SO | ^N | 2E | 46 | . | 4E | 78 | N | 6E | 110 | n |
0F | 15 | SI | ^O | 2F | 47 | / | 4F | 79 | O | 6F | 111 | o |
10 | 16 | DLE | ^P | 30 | 48 | 0 | 50 | 80 | P | 70 | 112 | p |
11 | 17 | DC1 | ^Q | 31 | 49 | 1 | 51 | 81 | Q | 71 | 113 | q |
12 | 18 | DC2 | ^R | 32 | 50 | 2 | 52 | 82 | R | 72 | 114 | r |
13 | 19 | DC3 | ^S | 33 | 51 | 3 | 53 | 83 | S | 73 | 115 | s |
14 | 20 | DC4 | ^T | 34 | 52 | 4 | 54 | 84 | T | 74 | 116 | t |
15 | 21 | NAK | ^U | 35 | 53 | 5 | 55 | 85 | U | 75 | 117 | u |
16 | 22 | SYN | ^V | 36 | 54 | 6 | 56 | 86 | V | 76 | 118 | v |
17 | 23 | ETB | ^W | 37 | 55 | 7 | 57 | 87 | W | 77 | 119 | w |
18 | 24 | CAN | ^X | 38 | 56 | 8 | 58 | 88 | X | 78 | 120 | x |
19 | 25 | EM | ^Y | 39 | 57 | 9 | 59 | 89 | Y | 79 | 121 | y |
1A | 26 | SUB | ^Z | 3A | 58 | : | 5A | 90 | Z | 7A | 122 | z |
1B | 27 | ESC | ^[ | 3B | 59 | ; | 5B | 91 | [ | 7B | 123 | { |
1C | 28 | FS | ^\ | 3C | 60 | < | 5C | 92 | \ | 7C | 124 | \| |
1D | 29 | GS | ^] | 3D | 61 | = | 5D | 93 | ] | 7D | 125 | } |
1E | 30 | RS | ^^ | 3E | 62 | > | 5E | 94 | ^ | 7E | 126 | ~ |
1F | 31 | US | ^_ | 3F | 63 | ? | 5F | 95 | _ | 7F | 127 | DEL |

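The layout of the table has some regularities that are easy to express as bit operations: each control code is HEX 40 less than the character shown in its Ctrl column, each lowercase letter is HEX 20 more than its uppercase counterpart, and each digit character is HEX 30 plus its numeric value. The sketch below illustrates these in Python; the helper names `ctrl_notation`, `to_lower`, and `digit_value` are made up for this example.

```python
def ctrl_notation(code: int) -> str:
    """Caret notation for a control code 0x00-0x1F, as in the Ctrl column."""
    return "^" + chr(code ^ 0x40)


def to_lower(ch: str) -> str:
    """Lowercase an ASCII uppercase letter by setting bit 0x20."""
    return chr(ord(ch) | 0x20) if "A" <= ch <= "Z" else ch


def digit_value(ch: str) -> int:
    """Numeric value of an ASCII digit character '0' through '9'."""
    return ord(ch) - 0x30


print(ctrl_notation(0x07))  # ^G (BEL)
print(to_lower("M"))        # m
print(digit_value("7"))     # 7
```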
Table 2.3: ASCII control characters
HEX | DEC | CHR | Ctrl | Meaning |
---|---|---|---|---|
00 | 0 | NUL | ^@ | NULl; blank leader on paper tape; generally ignored |
01 | 1 | SOH | ^A | Start Of Heading |
02 | 2 | STX | ^B | Start of TeXt |
03 | 3 | ETX | ^C | End of TeXt; matches STX |
04 | 4 | EOT | ^D | End Of Transmission |
05 | 5 | ENQ | ^E | ENQuiry |
06 | 6 | ACK | ^F | ACKnowledge; affirmative response to ENQ |
07 | 7 | BEL | ^G | BELl; audible signal, a bell on early machines |
08 | 8 | BS | ^H | BackSpace; nondestructive, ignored at left margin |
09 | 9 | HT | ^I | Horizontal Tab |
0A | 10 | LF | ^J | Line Feed; paper up or print head down; new line on Unix |
0B | 11 | VT | ^K | Vertical Tab |
0C | 12 | FF | ^L | Form Feed; start new page |
0D | 13 | CR | ^M | Carriage Return; print head to left margin; new line on Macs |
0E | 14 | SO | ^N | Shift Out; start use of alternate character set |
0F | 15 | SI | ^O | Shift In; resume use of default character set |
10 | 16 | DLE | ^P | Data Link Escape; changes meaning of next character |
11 | 17 | DC1 | ^Q | Device Control 1; if flow control used, XON, OK to send |
12 | 18 | DC2 | ^R | Device Control 2 |
13 | 19 | DC3 | ^S | Device Control 3; if flow control used, XOFF, stop sending |
14 | 20 | DC4 | ^T | Device Control 4 |
15 | 21 | NAK | ^U | Negative AcKnowledge; response to ENQ |
16 | 22 | SYN | ^V | SYNchronous idle |
17 | 23 | ETB | ^W | End of Transmission Block |
18 | 24 | CAN | ^X | CANcel; disregard previous block |
19 | 25 | EM | ^Y | End of Medium |
1A | 26 | SUB | ^Z | SUBstitute |
1B | 27 | ESC | ^[ | ESCape; changes meaning of next character |
1C | 28 | FS | ^\ | File Separator; coarsest scale |
1D | 29 | GS | ^] | Group Separator; coarse scale |
1E | 30 | RS | ^^ | Record Separator; fine scale |
1F | 31 | US | ^_ | Unit Separator; finest scale |
20 | 32 | SP | | SPace; usually not considered a control character |
7F | 127 | DEL | | DELete; originally ignored; sometimes destructive backspace |

On to 8 Bits
In an 8-bit context, ASCII characters follow a leading 0, and thus may be thought of as the “bottom half” of a larger code. The 128 characters represented by codes between HEX 80 and HEX FF (sometimes incorrectly called “high ASCII” or “extended ASCII”) have been defined differently in different contexts. On many operating systems they included the accented Western European letters and various additional punctuation marks. On IBM PCs they included line-drawing characters. Macs used (and still use) a different encoding.
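To make the "leading 0" idea and the conflicting upper halves concrete, here is a small Python sketch. It tests the top bit of a byte and then decodes the same byte under three 8-bit encodings. The choice of codecs is an assumption made for illustration: `latin-1` for ISO-8859-1, `cp437` for the original IBM PC character set, and `mac_roman` for the classic Mac encoding. The helper name `is_ascii_byte` is likewise hypothetical.

```python
def is_ascii_byte(b: int) -> bool:
    """True if the byte's leading (most significant) bit is 0."""
    return (b & 0x80) == 0


# Python codec names, chosen here as examples of historical 8-bit encodings.
encodings = ["latin-1", "cp437", "mac_roman"]

for b in (0x41, 0xE9):
    # Decode the single byte under each encoding and compare the results.
    decoded = {enc: bytes([b]).decode(enc) for enc in encodings}
    print(f"{b:02X} ascii={is_ascii_byte(b)} {decoded}")
```

HEX 41 decodes to the same letter everywhere because it lies in the ASCII half, while HEX E9 comes out as a different character under each of the three encodings.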
Fortunately, people now appreciate the need for interoperability of computer platforms, so more universal standards are coming into favor. The most common code in use for Web pages is ISO-8859-1 (ISO-Latin), which uses the 96 codes between HEX A0 and HEX FF for various accented letters and punctuation of Western European languages, and a few other symbols. The 32 codes between HEX 80 and HEX 9F are reserved for control characters in ISO-8859-1.
Nature abhors a vacuum. Most people don’t want 32 more control characters (indeed, of the 33 control characters in 7-bit ASCII, only about ten are regularly used in text). Consequently, there has been no end of ideas for using HEX 80 to HEX 9F. The most widely used convention is Microsoft’s Windows Code Page 1252 (Latin I), which is the same as ISO-8859-1 (ISO-Latin) except that 27 of the 32 control codes are assigned to printed characters. One of these is HEX 80, the Euro currency character. Not all platforms and operating systems recognize CP-1252, so documents, and in particular Web pages, require special attention.
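The difference between the two encodings in the range HEX 80 to HEX 9F can be observed with Python's bundled codecs. This is only a sketch: it assumes that Python's `cp1252` codec raises an error for the codes CP-1252 leaves unassigned, and the helper name `decodes_in_cp1252` is invented for the example.

```python
high_block = range(0x80, 0xA0)  # the 32 codes HEX 80 through HEX 9F


def decodes_in_cp1252(b: int) -> bool:
    """True if Python's cp1252 codec maps this byte to a character."""
    try:
        bytes([b]).decode("cp1252")
        return True
    except UnicodeDecodeError:
        return False


assigned = [b for b in high_block if decodes_in_cp1252(b)]

print(len(assigned))                          # 27
print(bytes([0x80]).decode("cp1252"))         # the Euro sign
print(repr(bytes([0x80]).decode("latin-1")))  # '\x80', a control character
```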
Beyond 8 Bits
To represent Asian languages, many more characters are needed. There is currently active development of appropriate standards, and it is generally felt that the total number of characters that need to be represented is fewer than 65,536. This is fortunate because that many different characters can be represented in 16 bits, or 2 bytes. In order to stay within this number, the written versions of some of the Chinese dialects must share symbols that look alike.
The strongest candidate for a 2-byte standard character code today is known as Unicode.
Reference
There are many Web pages that give the ASCII chart, with extensions to all the world’s languages. Among the more useful:
- Jim Price, with PC and Windows 8-bit charts, and several further links http://www.jimprice.com/jim-asc.shtml
- A Brief History of Character Codes, with a discussion of extension to Asian languages http://tronweb.super-nova.co.jp/characcodehist.html
- Unicode home page http://www.unicode.org/
- Windows CP-1252 standard, definitive www.microsoft.com/globaldev/r.../sbcs/1252.htm
- CP-1252 compared to:
  - Unicode http://ftp.unicode.org/Public/MAPPIN...OWS/CP1252.TXT
  - Unicode/HTML http://www.alanwood.net/demos/ansi.html
  - ISO-8859-1/Mac OS http://www.jwz.org/doc/charsets.html