
5.9: Cryptography as a Building Block (Advanced Topic)


    This section sketches how primitives such as ENCRYPT, DECRYPT, pseudorandom number generators, SIGN, VERIFY, and cryptographic hashes can be implemented using cryptographic transformations (also called ciphers). Readers who wish to understand the implementations in detail should consult books such as Applied Cryptography by Bruce Schneier [Suggestions for Further Reading 5.5.1] or Handbook of Applied Cryptography by Menezes, van Oorschot, and Vanstone [Suggestions for Further Reading 5.5.2]. Introduction to Cryptography by Buchmann provides a concise description of the number theory that underlies cryptography [Suggestions for Further Reading 5.5.9]. There are many subtle issues in designing secure implementations of these primitives, which are beyond the scope of this text.

    Unbreakable Cipher for Confidentiality (One-Time Pad)

    Making an unbreakable cipher that provides only confidentiality is easy, but there's a catch. The recipe is as follows. First, find a process that can generate a truly random, unlimited string of bits, which we call the key string, and transmit this key string through secure (i.e., providing confidentiality and authentication) channels to both the sender and receiver before they transmit any data through an insecure network.

    Once the key string is securely in the hands of the sender, the sender converts the plaintext into a bit string and computes bit-for-bit the exclusive OR (XOR) of the plaintext and the key string. The sender can send the resulting ciphertext over an insecure network to a receiver. Using the previously communicated key string, the receiver can recover the plaintext by computing the XOR of the ciphertext and key string.

    To be more precise, this transforming scheme is a stream cipher. In a stream cipher, the conversion from plaintext to ciphertext is performed one bit or one byte at a time, and the input can be of any length. In our example, a sequence of message (plaintext) bits \(m_1, m_2, \dots , m_n\) is transformed using an equal-length sequence of secret key bits \(k_1, k_2, \dots , k_n\) that is known to both the sender and the receiver. The \(i\)-th bit \(c_i\) of the ciphertext is defined to be the XOR (modulo-2 sum) of \(m_i\) and \(k_i\) , for \(i = 1, \dots , n\):

    \[ c_i = m_i \oplus k_i \]

    Untransforming is just as simple, because:

    \[ m_i = c_i \oplus k_i = m_i \oplus k_i \oplus k_i = m_i \]
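    To make the scheme concrete, here is a minimal sketch of the one-time pad in Python. The function and variable names are illustrative, not from the text; `os.urandom` stands in for a truly random key-string generator.

```python
import os

def one_time_pad(message: bytes, key_string: bytes) -> bytes:
    # XOR each message byte with the corresponding key byte; the same
    # function both transforms (encrypts) and untransforms (decrypts),
    # because XOR-ing with the key string twice cancels it out.
    assert len(key_string) >= len(message), "pad must be at least as long as the message"
    return bytes(m ^ k for m, k in zip(message, key_string))

plaintext = b"attack at dawn"
pad = os.urandom(len(plaintext))          # stand-in for a truly random key string
ciphertext = one_time_pad(plaintext, pad)
recovered = one_time_pad(ciphertext, pad)
```

    Because \(c_i \oplus k_i = m_i \oplus k_i \oplus k_i = m_i\), running the same function on the ciphertext with the same pad recovers the plaintext.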

    This scheme, under the name "one-time pad," was patented by Vernam in 1919 (U.S. patent number 1,310,719). In his version of the scheme, the "pad" (that is, the one-time key) was stored on paper tape.

    The key string is generated by a random number generator, which produces as output a "random" bit string. That is, from the bits generated so far, it is impossible to predict the next bit. True random-number generators are difficult to construct; in fact, true sources of random sequences come only from physical processes, not from deterministic computer programs.

    If we flip a single message bit, the corresponding ciphertext bit flips. Similarly, if a single ciphertext bit is flipped by a network error (or an adversary), the receiver will untransform the ciphertext to obtain a message with a single bit error in the corresponding position. Thus, the one-time pad (both transforming and untransforming) has limited change propagation: changing a single bit in the input causes only a single bit in the output to change.

    Unless additional measures are taken, an adversary can add, flip, or replace bits in the stream without the recipient realizing it. The adversary may have no way to know exactly how these changes will be interpreted at the receiving end, but the adversary can probably create quite a bit of confusion. This cipher provides another example of the fact that message confidentiality and integrity are separate goals.

    The catch with a one-time pad is the key string. We must have a secure channel for sending the key string and the key string must be at least as long as the message. One approach to sending the key string is for the sender to generate a large key string in advance. For example, the sender can generate 10 CDs full of random bits and truck them over to the receiver by armored car. Although this scheme may have high bandwidth (6.4 Gigabytes per truckload), it probably has latency too large to be satisfactory.

    The key string must be at least as long as the message. It is not hard to see that if the sender re-uses the one-time pad, an adversary can quickly determine quite a bit (if not everything) about the plaintext by examining the XOR of the corresponding ciphertexts (if the bits are aligned properly, the pads cancel). The National Security Agency (NSA) once caught the Russians in such a mistake\(^*\) in Project VENONA\(^{**}\).
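    The pad-cancellation effect behind the VENONA break can be demonstrated in a few lines of Python. The messages and pad bytes below are made up for illustration.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# a single pad, mistakenly used for two different messages
pad = b"\x13\x37\xc0\xff\xee\x42\x99\x01\x55\xaa\xbb\xcc\xdd\x0e"
m1 = b"attack at dawn"
m2 = b"retreat now!!!"
c1 = xor(m1, pad)
c2 = xor(m2, pad)

# XOR of the two ciphertexts: the pad cancels, leaving m1 XOR m2,
# which leaks the combination of the two plaintexts to the adversary
leak = xor(c1, c2)
```

    The adversary never learns the pad, yet `leak` equals \(m_1 \oplus m_2\), from which frequency analysis and known-plaintext guessing can often recover both messages.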


    \(^*\) R. L. Benson, The Venona Story, National Security Agency, Center for Cryptologic History, 2001. http://www.nsa.gov/publications/publi00039.cfm

    \(^{**}\) D. P. Moynihan (chair), Secrecy: Report of the Commission on Protecting and Reducing Government Secrecy, Senate Document 105-2, 103rd Congress, United States Government Printing Office, 1997.

     

    Pseudorandom Number Generators

    One shortcut to avoid having to send a long key string over a secure channel is to use a pseudorandom number generator. A pseudorandom number generator produces deterministically a random-appearing bit stream from a short bit string, called the seed. Starting from the same seed, the pseudorandom generator will always produce the same bit stream. Thus, if both the sender and the receiver have the secret short key, using the key as a seed for the pseudorandom generator they can generate the same, long key string from the short key and use the long key string for the transformation.

    Unlike the one-time pad, this scheme can in principle be broken by someone who knows enough about the pseudorandom generator. The design requirement on a pseudorandom number generator is that it is difficult for an opponent to predict the next bit in the sequence, even with full knowledge of the generating algorithm and the sequence so far. More precisely:

    1. Given the seed and algorithm, it is easy to compute the next bit of the output of the pseudorandom generator.
    2. Given the algorithm and some output, it is difficult (or impossible) to predict the next bit.
    3. Given the algorithm and some output, it is difficult (or impossible) to compute what the seed is.

    As with ciphers, the design is usually open: the algorithm for the pseudorandom generator is published. Only the seed is secret, and it must be produced by a truly random source.

    RC4: A Pseudorandom Generator and its Use

    RC4 was designed by Ron Rivest for RSA Data Security, Inc. RC4 stands for Ron's Code number 4. RSA tried to keep this cipher secret, but someone published a description anonymously on the Internet. (This incident illustrates how difficult it is to keep something secret, even for a security company!) Because RSA never confirmed whether the description is indeed RC4, people usually refer to the published version as ARC4, or alleged RC4.

    The core of the RC4 cipher is a pseudorandom generator, which is surprisingly simple. It maintains a fixed array S of 256 entries, which contains a permutation of the numbers 0 through 255 (each array entry is 8 bits). It has two counters i and j , which are used as follows to generate a pseudorandom byte k :

    1 procedure RC4_GENERATE ()
    2     i ← (i + 1) modulo 256
    3     j ← (j + S[i]) modulo 256
    4     SWAP (S[i], S[j])
    5     t ← (S[i] + S[j]) modulo 256
    6     k ← S[t]
    7     return k

    The initialization procedure takes as input a seed, typically a truly-random number, which is used as follows:

    1 procedure RC4_INIT (seed)
    2     for i from 0 to 255 do
    3         S[i] ← i
    4         K[i] ← seed[i modulo length(seed)]
    5     j ← 0
    6     for i from 0 to 255 do
    7         j ← (j + S[i] + K[i]) modulo 256
    8         SWAP(S[i], S[j])
    9     i ← 0; j ← 0

    The procedure RC4_INIT fills each entry of S with its index: S[0] ← 0, S[1] ← 1, etc. (see lines 2 through 4). It also allocates another 256-entry array (K) with 8-bit entries. It fills K with the seed, repeating the seed as necessary to fill the array. Thus, K[0] contains the first 8 bits of the seed, K[1] the second 8 bits, etc. Then, it runs a loop (lines 6 through 8) that puts S in a pseudorandom state based on K (and thus the seed).

    Confidentiality using RC4

    Given the RC4 pseudorandom generator, ENCRYPT and DECRYPT can be implemented as in the one-time pad, except instead of using a truly-random key string, we use the output of the pseudorandom generator. To initialize, the sender and receiver invoke on their respective computers RC4_INIT, supplying the shared-secret key for the stream as the seed. Because the sender and receiver supply the same key to the initialization procedure, RC4_GENERATE on the sender and receiver computer will produce identical streams of key bytes, which ENCRYPT and DECRYPT use as a one-time pad.

    In more detail, to send a byte \(b\), the sender invokes RC4_GENERATE to generate a pseudorandom byte \(k\) and encrypts byte \(b\) by computing \(c = b \oplus k\). When the receiver receives byte \(c\), it invokes RC4_GENERATE on its computer to generate a pseudorandom byte \(k_1\) and decrypts the byte \(c\) by computing \(b \oplus k_1\). Because the sender and receiver initialized the generator with the same seed, \(k\) and \(k_1\) are identical, and \(c \oplus k_1\) gives \(b\).
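    The two pseudocode procedures and the XOR step can be sketched together in Python. The function names are illustrative; a Python generator stands in for the per-byte calls to RC4_GENERATE.

```python
def rc4_keystream(seed: bytes):
    # RC4_INIT: fill S with 0..255, then scramble it based on the seed
    S = list(range(256))
    K = [seed[t % len(seed)] for t in range(256)]  # repeat seed to fill K
    j = 0
    for i in range(256):
        j = (j + S[i] + K[i]) % 256
        S[i], S[j] = S[j], S[i]
    # RC4_GENERATE: yield one pseudorandom byte per step
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        yield S[(S[i] + S[j]) % 256]

def rc4_crypt(key: bytes, data: bytes) -> bytes:
    # XOR the data with the keystream; the same call encrypts and decrypts
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key)))

ciphertext = rc4_crypt(b"Key", b"Plaintext")
recovered = rc4_crypt(b"Key", ciphertext)
```

    Running the same transformation twice with the same key recovers the plaintext, since the sender's and receiver's generators produce identical keystreams.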

    RC4 is simple enough that it can be coded from memory, yet it appears to be computationally secure and a moderately strong stream cipher for confidentiality. It has been noticed, however, that the first few bytes of its output leak information about the shared-secret key, so it is important to discard them. Like any stream cipher, it cannot be used for authentication without additional mechanism. When used to encrypt a long stream, it doesn't seem to have any small cycles and its output values vary highly (RC4 can be in about \(256! \times 256^2\) possible states). The key space contains \(2^{256}\) values, so it is also difficult to attack RC4 by brute force. RC4 must be used with care to achieve a system's overall security goal. For example, the Wired Equivalent Privacy (WEP) scheme for WiFi wireless networks (see page 11–50) uses the RC4 output stream without discarding the beginning of the stream. As a result, using the leaked key information mentioned above, it is relatively easy to crack WEP wireless encryption\(^*\).

    The story of flawed confidentiality in WiFi's use of RC4 illustrates that it is difficult to create a really good pseudorandom number generator. Here is another example of that difficulty: during World War II, the Lorenz SZ 40 and SZ 42 cipher machines, used by the German Army, were similarly based on a (mechanical) pseudorandom number generator, but a British code-breaking team was able, by analyzing intercepted messages, to reconstruct the internal structure of the generator, build a special-purpose computer to search for the seed, and thereby decipher many of the intercepted messages of the German Army\(^{**}\).


    \(^*\) A. Stubblefield, J. Ioannidis, and A. Rubin, Using the Fluhrer, Mantin, and Shamir attack to break WEP, Symposium on Network and Distributed System Security, 2002.

    \(^{**}\) F. H. Hinsley and Alan Stripp, Code Breakers: The Inside Story of Bletchley Park (Oxford University Press, 1993), page 161.

     

    Block Ciphers

    Depending on the constraints on their inputs, ciphers are either stream ciphers or block ciphers. In a block cipher, the cipher performs the transformation from plaintext to ciphertext on fixed-size blocks. If the input is shorter than a block, ENCRYPT must pad the input to make it a full block in length. If the input is longer than a block, ENCRYPT breaks the input into several blocks, pads the last block if necessary, and then transforms the individual blocks. Because a given plaintext block always produces the same output with a block cipher, ENCRYPT must use a block cipher with care. We outline one widely used block cipher and how it can be used to implement ENCRYPT and DECRYPT.

    Advanced Encryption Standard (AES)

    Advanced Encryption Standard (AES)\(^1\) has 128-bit (or longer) keys and 128-bit plaintext and ciphertext blocks. AES replaces Data Encryption Standard (DES)\(^2 \ ^3\), which is now regarded as too insecure for many applications, as distributed Internet computations or dedicated special-purpose machines can use a brute-force exhaustive search to quickly find a 56-bit DES key given corresponding plaintext and ciphertext [Suggestions for Further Reading 5.5.4].

    AES takes a 128-bit input and produces a 128-bit output. If you don't know the 128-bit key, it is hard to reconstruct the input given the output. The algorithm works on a 4 \(\times\) 4 array of bytes, called state. At the beginning of the cipher the input array in is copied to the state array as follows:

    A 4-by-4 array named input contains \(i_0\) at the top left corner and \(i_{15}\) at the bottom right corner, with \(i_1\) through \(i_3\) running down the array's leftmost column. The input array is copied into a 4-by-4 array named state, in which each entry is labeled \(s_{r,c}\), where \(r\) is the entry's row (the topmost row has \(r = 0\)) and \(c\) is its column (the leftmost column has \(c = 0\)). The state array is in turn transformed into a 4-by-4 array named output, in which each entry is labeled \(o_n\), with \(n\) corresponding to the subscript of the entry in the same position in the input array.

    Figure \(\PageIndex{1}\): AES transformation of the input array to the state array to the output array.

    At the end of the cipher the state array is copied into the output array out as depicted. The four bytes in a column form 32-bit words.

    The cipher transforms state as follows:

    1 procedure AES (in, out, key)
    2     state ← in                                        // copy in into state as described above
    3     ADDROUNDKEY (state, key)                          // mix key into state

    4     for r from 1 to 9 do
    5         SUBBYTES (state)                              // substitute some bytes in state
    6         SHIFTROWS (state)                             // shift rows of state cyclically
    7         MIXCOLUMNS (state)                            // mix the columns up
    8         ADDROUNDKEY (state, key[r×4, (r+1)×4 – 1])    // expand key, mix in

    9     SUBBYTES (state)
    10    SHIFTROWS (state)
    11    ADDROUNDKEY (state, key[10×4, 11×4 – 1])
    12    out ← state                                       // copy state into out as described above

    The cipher performs 10 rounds (denoted by the variable r ), but the last round doesn't invoke MIXCOLUMNS. Each ADDROUNDKEY takes the 4 words from key and adds them into the columns of state as follows:

    \[ [s_{0,c}, s_{1,c}, s_{2,c}, s_{3,c}] \leftarrow [s_{0,c}, s_{1,c}, s_{2,c}, s_{3,c}] \oplus key_{r \times 4 + c}, \text{ for } 0 \le c < 4 \]

    That is, each word of key is added to the corresponding column in state .

    For the first invocation (on line 3) of ADDROUNDKEY, r is 0, and in that round ADDROUNDKEY uses the 128-bit key completely. For subsequent rounds, AES generates additional key words using a carefully-designed algorithm. The details and justification are outside of the scope of this textbook, but the flavor of the algorithm is as follows. It takes earlier-generated words of the key and produces a new word, by substituting well-chosen bits, rotating words, and computing the XOR of certain words.

    The procedure SUBBYTES applies a substitution to the bytes of state according to a well-chosen substitution table. In essence, this mixes the bytes of state up.

    The procedure SHIFTROWS shifts the last three rows of state cyclically as follows:

    \[ s_{r,c} \leftarrow s_{r,\, (c + \text{SHIFT}(r,4)) \text{ modulo } 4}, \text{ for } 0 \le c < 4 \]

    The value of SHIFT is dependent on the row number as follows:

    SHIFT(1,4) = 1, SHIFT(2,4) = 2, and SHIFT(3,4) = 3
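    The SHIFTROWS transformation is simple enough to sketch directly in Python. The representation of state as a list of four rows and the function name are illustrative assumptions, not part of the AES specification's own notation.

```python
def shift_rows(state):
    # state is a 4x4 array, state[r][c]; row r is rotated left by
    # SHIFT(r, 4) = r positions, i.e. s'[r][c] = s[r][(c + r) mod 4].
    # Row 0 is unchanged, matching SHIFT(0, 4) = 0.
    return [row[r:] + row[:r] for r, row in enumerate(state)]

# fill a demonstration state with the byte values 0..15, row by row
state = [[r * 4 + c for c in range(4)] for r in range(4)]
shifted = shift_rows(state)
```

    For example, row 1 ([4, 5, 6, 7]) becomes [5, 6, 7, 4] after the cyclic shift by one position.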

    The procedure MIXCOLUMNS operates column by column, applying a well-chosen matrix multiplication.

    In essence, AES is a complicated transformation of state based on key. Why this transformation is thought to be computationally secure is beyond the scope of this text. We just note that it has been studied by many cryptographers and is believed to be secure.


    \(^1\) Advanced Encryption Standard, Federal Information Processing Standards Publications (FIPS PUBS) 197, National Institute of Standards and Technology (NIST), Nov. 2001.

    \(^2\) Data Encryption Standard. U.S. Department of Standards, National Bureau of Standards, Federal Information Processing Standard (FIPS) Publication #46, January, 1977 (#46–1 updated 1988; #46–2 updated 1994).

    \(^3\) Horst Feistel, William A. Notz, and J. Lynn Smith. Some cryptographic techniques for machine-to-machine data communications. Proceedings of the IEEE 63, 11 (November, 1975), pages 1545–1554. An older paper by the designers of the DES providing background on why it works the way it does. One should be aware that the design principles described in this paper are incomplete; the really significant design principles are classified as military secrets.

    Cipher-Block Chaining

    With block ciphers, the same input with the same key generates the same output. Thus, one must be careful in using a block cipher for encryption. For example, if the adversary knows that the plaintext is formatted for a printer and each line starts with 16 blanks, then the line breaks will be apparent in the ciphertext because there will always be an 8-byte block of blanks, enciphered the same way. Knowing the number of lines in the text and the length of each line may be usable for frequency analysis to search for the shared-secret key.

    A good approach to constructing ENCRYPT using a block cipher is cipher-block chaining. Cipher-block chaining (CBC) randomizes each plaintext block by XOR-ing it with the previous ciphertext block before transforming it (see Figure \(\PageIndex{2}\)). A random dummy ciphertext block, called the initialization vector or IV, is inserted at the beginning.

    Message block \(M_1\) is XOR-ed with the random dummy ciphertext block IV, and the result is encrypted to produce the ciphertext block \(C_1\). \(C_1\) is then XOR-ed with message block \(M_2\), and the result is encrypted to produce \(C_2\). \(C_2\) is then XOR-ed with \(M_3\), and so on. To decrypt the message, \(C_1\) is run through DECRYPT and the result is XOR-ed with IV to produce \(M_1\). \(C_2\) is run through DECRYPT and the result is XOR-ed with \(C_1\) to produce \(M_2\), and so on.

    Figure \(\PageIndex{2}\): Cipher-block chaining.

    More precisely, if the message has blocks \(M_1, M_2, \dots, M_n\), ENCRYPT produces the ciphertext consisting of blocks \(C_0, C_1, \dots, C_n\) as follows:

    \[ C_0 \leftarrow IV \quad \text{and} \quad C_i \leftarrow BC(M_i \oplus C_{i-1}, key) \text{ for } i = 1, 2, \dots, n \]

    where BC is some block cipher (e.g., AES). To implement DECRYPT, one computes:

    \[ M_i \leftarrow C_{i-1} \oplus BC^{-1}(C_i, key) \]

    where \(BC^{-1}\) is the deciphering transformation corresponding to \(BC\).

    CBC has cascading change propagation for the plaintext: changing a single message bit (say in \(M_i\)) causes a change in \(C_i\), which causes a change in \(C_{i+1}\), and so on. CBC's cascading change property, together with the use of a random IV as the first ciphertext block, implies that two encryptions of the same message with the same key will result in entirely different-looking ciphertexts. The last ciphertext block \(C_n\) is a complicated key-dependent function of the IV and of all the message blocks. We will use this property later.

    On the other hand, CBC has limited change propagation for the ciphertext: changing a bit in ciphertext block \(C_i\) causes the receiver to compute \(M_i\) and \(M_{i+1}\) incorrectly, but all later message blocks are still computed correctly. Careful study of Figure \(\PageIndex{2}\) should convince you that this property holds.

    Ciphers with limited change propagation have important applications, particularly in situations where ciphertext bits may sometimes be changed by random network errors and where, in addition, the receiving application can tolerate a moderate amount of consequently modified plaintext.
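    The CBC equations above can be sketched in Python. Because a real block cipher such as AES is too long to include here, the sketch substitutes a toy keyed permutation (`toy_bc`): it is invertible, as CBC requires, but offers no security. All names are illustrative, and the message length is assumed to be a multiple of the block size (padding is omitted).

```python
import os

BLOCK = 8  # toy 64-bit blocks; AES would use 16-byte blocks

def toy_bc(block: bytes, key: bytes) -> bytes:
    # toy stand-in for BC: XOR with key, rotate, XOR with key again
    x = int.from_bytes(block, "big") ^ int.from_bytes(key, "big")
    x = ((x << 21) | (x >> (64 - 21))) & (2**64 - 1)
    return (x ^ int.from_bytes(key, "big")).to_bytes(BLOCK, "big")

def toy_bc_inv(block: bytes, key: bytes) -> bytes:
    # inverse permutation, standing in for BC^-1
    x = int.from_bytes(block, "big") ^ int.from_bytes(key, "big")
    x = ((x >> 21) | (x << (64 - 21))) & (2**64 - 1)
    return (x ^ int.from_bytes(key, "big")).to_bytes(BLOCK, "big")

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(message: bytes, key: bytes) -> bytes:
    iv = os.urandom(BLOCK)          # random dummy ciphertext block C_0
    out, prev = [iv], iv
    for i in range(0, len(message), BLOCK):
        prev = toy_bc(xor(message[i:i + BLOCK], prev), key)  # C_i
        out.append(prev)
    return b"".join(out)

def cbc_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
    # M_i = C_{i-1} XOR BC^-1(C_i, key)
    return b"".join(xor(prev, toy_bc_inv(c, key))
                    for prev, c in zip(blocks, blocks[1:]))
```

    Because the IV is chosen randomly for each encryption, encrypting the same message twice with the same key produces entirely different-looking ciphertexts.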

     

    Computing a Message Authentication Code

    So far we have used ciphers only for confidentiality, but we can also use them to compute authentication tags so that the receiver can detect if an adversary has changed any of the bits in the ciphertext. That is, we can use ciphers to implement the SIGN and VERIFY interface discussed in Section 5.3. Using shared-secret cryptography, there are two different approaches to implementing the interface: 1) using a block or stream cipher, or 2) using a cryptographic hash function. We discuss both.

    MACs Using Block Cipher or Stream Cipher

    CBC-MAC is a simple message authentication code scheme based on a block cipher in CBC mode. To produce an authentication tag for a message M with a key k , SIGN pads the message out to an integral number of blocks with zero bits, if necessary, and transforms the message M with cipher-block chaining, using the key k as the initialization vector (IV). (The key k is an authentication key, different from the encryption key that the sender and receiver may also use.) All ciphertext blocks except the last are discarded, and the last ciphertext block is returned as the value of the authentication tag (the MAC). As noted earlier, because of cascading change propagation, the last ciphertext block is a complicated function of the secret key and the entire message.

    VERIFY recomputes the MAC from M and key k using the same procedure that SIGN used, and compares the result with the received authentication tag. An adversary cannot produce a message M that the receiver will believe is authentic because the adversary doesn’t know key k.
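    The CBC-MAC scheme can be sketched in Python. As with the CBC sketch, a toy keyed permutation stands in for a real block cipher such as AES, and the function names are illustrative; the key k doubles as the IV, as the text describes, so here it must be one block long.

```python
BLOCK = 8  # toy 64-bit blocks

def toy_bc(block: bytes, key: bytes) -> bytes:
    # toy stand-in for a real block cipher; invertible but NOT secure
    x = int.from_bytes(block, "big") ^ int.from_bytes(key, "big")
    x = ((x << 21) | (x >> (64 - 21))) & (2**64 - 1)
    return (x ^ int.from_bytes(key, "big")).to_bytes(BLOCK, "big")

def cbc_mac_sign(message: bytes, k: bytes) -> bytes:
    # pad the message with zero bits to a whole number of blocks
    if len(message) % BLOCK:
        message += b"\x00" * (BLOCK - len(message) % BLOCK)
    tag = k                          # the authentication key serves as the IV
    for i in range(0, len(message), BLOCK):
        tag = toy_bc(bytes(a ^ b for a, b in zip(message[i:i + BLOCK], tag)), k)
    return tag                       # the last ciphertext block is the MAC

def cbc_mac_verify(message: bytes, k: bytes, tag: bytes) -> bool:
    # recompute the MAC and compare with the received authentication tag
    return cbc_mac_sign(message, k) == tag
```

    All intermediate ciphertext blocks are discarded; only the last block, a complicated function of the key and the entire message, is returned as the tag.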

    One can also build SIGN and VERIFY using stream ciphers by, for example, using the cipher in a mode called cipher-feedback (CFB). CFB works like CBC in the sense that it links the plaintext bytes together so that the ciphertext depends on all the preceding plaintext. For the details consult the literature.

    MACs Using a Cryptographic Hash Function

    The basic idea for computing a MAC with a cryptographic hash function is as follows. If the sender and receiver share an authentication key k , then the sender constructs a MAC for a message M by computing the cryptographic hash of the concatenated message k + M : HASH (k + M). Since the receiver knows k , the receiver can recompute HASH (k + M) and compare the result with the received MAC. Because an adversary doesn't know k , the adversary cannot forge the MAC for the message M .

    This basic idea must be refined to make the MAC secure because without modifications it has problems. For example, Lucifer can add bytes to the end of the message without the receiver noticing. This attack can perhaps be countered by adding the length of the message to the beginning of the message. Cryptographers have given this problem a lot of attention and have come up with a construction, called HMAC [Suggestions for Further Reading 5.5.7], which is said to be as secure as the underlying cryptographic hash function. HMAC uses two strings:

    • innerpad, which is the byte \(36_{hex}\) repeated 64 times
    • outerpad, which is the byte \(5C_{hex}\) repeated 64 times

    Using these strings, HMAC computes the MAC for a message M and an authentication key k as follows:

    \[ \text{HASH}((k \oplus outerpad) + \text{HASH}((k \oplus innerpad) + M)) \]

    To compute the XOR, HMAC pads k with enough zero bytes to make it of length 64. If k is longer than 64 bytes, HMAC uses HASH (k), padded with enough zero bytes to make the result 64 bytes long.

    HMAC can be used with any good cryptographic hash function. Sidebar \(\PageIndex{1}\) describes SHA-1, a widely used cryptographic hash function. Even though SHA-1 must have collisions, no one has uncovered an example of one so far. Recent findings (February 2005) suggest weaknesses in SHA-1, and the National Institute of Standards and Technology is recommending switching to longer versions named SHA-256 and SHA-512. Some cryptographers are recommending that research on designing cryptographic hash functions should start over.

    Sidebar \(\PageIndex{1}\)

    Secure Hash Algorithm (SHA)

    SHA\(^*\) is a family of cryptographic hash algorithms. SHA-1 takes as input a message of any length smaller than \(2^{64}\) bits and produces a 160-bit hash. It is cryptographic in the sense that given a hash value, it is computationally infeasible to recover the corresponding message or to find two different messages that produce the same hash.

    SHA-1 computes the hash as follows. First, the message being hashed is padded to make it a multiple of 512 bits long. To pad, one appends a 1, then as many 0’s as necessary to make it 64 bits short of a multiple of 512 bits, and then a 64-bit big-endian representation of the length (in bits) of the unpadded message. The padded string of bits is turned into a 160-bit value as follows.

    The message is split into 512-bit blocks. Each block is expanded from 512 bits (16 32-bit words \(M(0), \dots, M(15)\)) to 80 32-bit words as follows (\(W(t)\) is the \(t\)-th word):

    W(t) = M(t),                                            for t = 0 to 15
    W(t) = (W(t–3) ⊕ W(t–8) ⊕ W(t–14) ⊕ W(t–16)) <<< 1,     for t = 16 to 79

    where <<< is a left circular shift.

    SHA uses four nonlinear functions and four 32-bit constants. The four functions are

                        (x & y) | ((~x) & z),            for t = 0 to 19
    F(t, x, y, z) =     x ⊕ y ⊕ z,                       for t = 20 to 39
                        (x & y) | (x & z) | (y & z),     for t = 40 to 59
                        x ⊕ y ⊕ z,                       for t = 60 to 79

    The constants are

               5A827999hex,     for t = 0 to 19     // sqrt(2)/4 in hex
    K(t) =     6ED9EBA1hex,     for t = 20 to 39    // sqrt(3)/4 in hex
               8F1BBCDChex,     for t = 40 to 59    // sqrt(5)/4 in hex
               CA62C1D6hex,     for t = 60 to 79    // sqrt(10)/4 in hex

    SHA uses five 32-bit variables (160 bits) to compute the hash. They are initialized and copied into 5 temporary variables:

    a ← A ← 67452301hex
    b ← B ← EFCDAB89hex
    c ← C ← 98BADCFEhex
    d ← D ← 10325476hex
    e ← E ← C3D2E1F0hex

    The 160-bit hash value for a message is now computed as follows:

    1 for each 512-bit block of M do
    2     for t from 0 to 79 do
    3         x ← (a <<< 5) + F(t, b, c, d) + e + W(t) + K(t)
    4         e ← d
    5         d ← c
    6         c ← b <<< 30
    7         b ← a
    8         a ← x
    9     A ← A + a; B ← B + b; C ← C + c; D ← D + d; E ← E + e
    10 hash ← A + B + C + D + E                          // + here means concatenation of A, B, C, D, and E

    Other hashes in the SHA family are similar in spirit but have different constants and word sizes, and produce hash values with more bits. For example, SHA-256 has a different W and F, and produces a 256-bit value. The justification for the SHA family of hashes is outside the scope of this text.


    \(^*\) Secure hash standard, Federal Information Processing Standards Publications (FIPS PUBS) 180-1, National Institute of Standards and Technology (NIST), April 1995.

     

    A Public-Key Cipher

    The ciphers described so far are shared-secret ciphers. Both the sender and receiver must know the shared secret key. Public-key ciphers remove this requirement, which opens up new kinds of applications, as the main body of the chapter described. The literature contains several public-key ciphers. We explain the first invented one because it is easy to explain, yet is still believed to be secure.

    Rivest-Shamir-Adleman (RSA) Cipher

    The security of the RSA cipher relies on a simple-to-state (but hard to solve) well-known problem in number theory [Suggestions for Further Reading 5.5.3]. RSA was developed at M.I.T. in 1977 (patent number 4,405,829), and is named after its inventors: Rivest, Shamir, and Adleman (RSA). It is based on properties of prime numbers; in particular, it is computationally expensive to factor large numbers (for ages mathematicians have been trying to come up with efficient algorithms with little success), but much cheaper to find large primes.

    The basic idea behind RSA is as follows. Initially you choose two large prime numbers (\(p\) and \(q\), each larger than \(10^{100}\)). Then compute \(n = p \times q\) and \(z = (p-1) \times (q-1)\), and find a large number \(d\) that is relatively prime to \(z\). Finally, find an \(e\) such that \(e \times d = 1 (\text{modulo} \,\, z)\). After finding these numbers once, you have two keys, \((e, n)\) and \((d, n)\), which are hard to derive from each other, even though \(n\) is public.

    For now assume that the message to be transformed using RSA has a value \(P\) that is greater than or equal to zero and smaller than \(n\). (Below, Sections 5.9.5.2 and 5.9.5.3 discuss in more detail how to use RSA for signatures and for encryption of messages of any length.) The ciphertext \(C\) is computed by raising \(P\) to the power \(e\): \(P^e (\text{modulo} \,\, n)\). To decipher, we compute \(C\) to the power \(d\): \(C^d (\text{modulo} \, n)\).

    The reason this works is as follows. Because \(e \times d = 1 (\text{modulo} \,\, z)\), we can write \(e \times d = k(p-1)(q-1) + 1\) for some integer \(k\), so \(C^d = P^{ed} = P^{k(p-1)(q-1)+1}\). A theorem of Euler and Fermat states that \(P^{(p-1)(q-1)} = 1 (\text{modulo} \,\, n)\) when \(P\) is relatively prime to \(n\) (see I. Niven and H. S. Zuckerman, An Introduction to the Theory of Numbers, Wiley, New York, 1980). Therefore, \(P^{k(p-1)(q-1)+1} = P \times (P^{(p-1)(q-1)})^k = P \times 1^k = P (\text{modulo} \,\, n)\).

    An example with concrete numbers may illuminate the abstract mathematics. If one chooses \(p = 47\), \(q = 59\), and \(e=17\), then \(d = 157\) because \(e \times d = 1 (\text{modulo} \,\, 2668)\). This gives us two keys: \((17, 2773)\) and \((157, 2773)\). Now we can transform any \(P\) with a value between 0 and 2773. For example, if \(P\) is 31, \(C = 587 = 31^{17} (\text{modulo} \,\, 2773)\). To reverse the transform, we compute \(587^{157} = 31 (\text{modulo} \,\, 2773)\).
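    The worked example can be checked directly in Python, whose built-in `pow` performs both modular exponentiation and (since Python 3.8) modular inversion. The variable names mirror the text.

```python
# the text's toy parameters; real RSA uses primes of hundreds of digits
p, q, e = 47, 59, 17
n = p * q                      # 2773
z = (p - 1) * (q - 1)          # 2668
d = pow(e, -1, z)              # modular inverse, so e*d = 1 (modulo z)

P = 31
C = pow(P, e, n)               # encipher: P^e modulo n
recovered = pow(C, d, n)       # decipher: C^d modulo n
```

Running this confirms the numbers in the text: \(d = 157\), \(C = 587\), and deciphering recovers \(P = 31\).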

    One way to break this scheme is to factor the modulus (\(n\)). In 1977 Ron Rivest (the R in RSA) estimated that factoring a 125-digit decimal number would take 40 quadrillion years, using the best known algorithms and state-of-the-art hardware running at 1 million instructions per second\(^*\). To test this claim and to encourage research into computational number theory and factoring, RSA Security, the company commercializing RSA, has posted several products of two primes, also called RSA numbers, as factoring challenges. Understanding the speed at which factoring can be done helps in choosing a suitable key length for a desired level of security. 

    In 1994, a group of researchers under the guidance of A.J. Lenstra factored a 129-digit decimal RSA number in 8 months using the Internet as a parallel computer, without paying for the cycles\(^{**}\). It required 5,000 MIPS years (i.e., 5,000 one-million-instructions-per-second computers each running for one year). Rivest's calculation is an example of the hazards involved in estimating a historical work factor. Better algorithms have been developed, allowing the computation to be performed in only 5,000 MIPS years instead of 40 quadrillion MIPS years, and communication technology has improved substantially, allowing 5,000 or more computers to be harnessed to perform that much computation in only one year.

    In November 2005, the RSA challenge number of 193 decimal digits was factored in 3 months using even better algorithms and faster computers (80 2.2 Gigahertz Opteron processors). A 193-decimal digit number is 640 binary bits. Currently it is considered secure to use 1024-bit RSA numbers as keys. The RSA challenge numbers of 704, 768, 896, 1024, 1536, and 2048 bits are still open.

    The security of RSA is based on its historical work factor. At this point, there are no known algorithms for factoring large numbers quickly. Although several other public-key ciphers exist, some of which are not covered by patents, to date no public-key system has been found for which one can prove a sufficiently large lower bound on the work factor. The best statement one can make today is an estimate of the work factor based on the best known algorithms. Some day a technique may be discovered that leads to fast factoring (e.g., using quantum computation), which would undermine the security of RSA.

    RSA needs prime numbers; fortunately, there are many of them, and generating them is much easier than factoring a product of two primes: "is \(n\) prime?" is a much easier question than "what are the factors of \(n\)?" There are approximately \(n/ \ln (n)\) prime numbers less than or equal to \(n\). Thus, for numbers that can be expressed with 1024 bits or fewer, there are approximately \(2^{1014}\) prime numbers. Therefore, we won't run out of prime numbers, even if everyone needs two prime numbers different from everyone else's primes. In addition, an adversary won't have a lot of success creating a database that contains all prime numbers because there are so many.
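To illustrate why generating primes is practical, here is a sketch (not production code, and with an illustrative function name) of the standard approach: a probabilistic Miller–Rabin primality test, which one applies to random odd candidates until a prime is found. By the density estimate above, roughly one in every few hundred random 1024-bit odd numbers is prime, so the search terminates quickly.

```python
import random

def is_probably_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin probabilistic primality test (sketch)."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13):
        if n % small == 0:
            return n == small
    # Write n - 1 as d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

assert is_probably_prime(47) and is_probably_prime(59)
assert not is_probably_prime(47 * 59)  # 2773 factors, so it is not prime
```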


    \(^*\) Martin Gardner, Mathematical games: A new kind of cipher that would take millions of years to break, Scientific American 237, pages 120–124, August 1977.

    \(^{**}\) K. Leutwyler, Superhack: forty quadrillion years early, 129-digit code is broken, Scientific American, 271, 17–20, 1994.


    Computing a Digital Signature

    An important use of public-key ciphers is to implement the SIGN and VERIFY interface. If this interface is implemented using public-key cryptography, the authentication tag is called a digital signature. The basic idea—which needs refinement to be secure—for computing an RSA digital signature is as follows. SIGN produces an authentication tag by raising \(M\) to the private exponent. VERIFY raises the authentication tag to the public exponent, compares the result with the received message, and returns ACCEPT if they match and REJECT if they don't.
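Using the toy key pair from the worked example earlier (\(n = 2773\), \(e = 17\), \(d = 157\)), the basic idea can be sketched as follows; as noted, the scheme needs refinement before it is secure:

```python
# Toy RSA key pair from the worked example; far too small for real use.
n, e, d = 2773, 17, 157

def sign(m: int, d: int, n: int) -> int:
    """SIGN: the authentication tag is M raised to the private exponent."""
    return pow(m, d, n)

def verify(m: int, tag: int, e: int, n: int) -> bool:
    """VERIFY: raise the tag to the public exponent and compare with M."""
    return pow(tag, e, n) == m  # ACCEPT if equal, REJECT otherwise

tag = sign(31, d, n)
assert verify(31, tag, e, n)      # genuine message: ACCEPT
assert not verify(32, tag, e, n)  # altered message: REJECT
```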

    The implementation doesn’t always guarantee authenticity, however. For example, if Lucifer succeeds in having Alice sign messages \(M_1\) and \(M_2\), then he can claim that Alice also signed \(M_3\), where \(M_3\) is the product of \(M_1\) and \(M_2\): \((M_3)^d = (M_1 × M_2)^d = M_1^d × M_2^d (\text{modulo} \,\, n)\). Thus, if Lucifer sends \(M_3\) to Bob, when Bob uses Alice's public key to verify message \(M_3\), that message will appear to have been signed by Alice.
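The forgery can be demonstrated with the toy keys from the worked example; note that Lucifer multiplies the two signatures together and never uses the private exponent \(d\):

```python
n, e, d = 2773, 17, 157  # toy keys from the worked example

M1, M2 = 31, 101
sig1 = pow(M1, d, n)     # Alice signs M1
sig2 = pow(M2, d, n)     # Alice signs M2

# Lucifer forges a signature for M3 = M1 * M2 without knowing d,
# because (M1 * M2)^d = M1^d * M2^d (modulo n).
M3 = (M1 * M2) % n
forged = (sig1 * sig2) % n
assert pow(forged, e, n) == M3  # Bob's check passes: forgery accepted
```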

    To avoid this problem (and some others) SIGN usually computes a cryptographic hash of the message, and creates an authentication tag by raising this hash to the private exponent. This also has the pleasant side effect that it simplifies signing large messages because \(n\) only has to be larger than the value of the hash output, and we don’t have to worry about splitting the message into blocks and signing each block. Upon receipt, VERIFY recomputes the hash from the received version of the message, raises the hash to the public exponent, and compares the result with the received authentication tag.
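A sketch of hash-then-sign, again with the toy keys. One caveat: a real modulus \(n\) is larger than the 256-bit hash output, so with this toy modulus the demonstration must reduce the hash modulo \(n\), which a real implementation would not do:

```python
import hashlib

n, e, d = 2773, 17, 157  # toy keys; a real n would exceed the hash value

def hash_of(message: bytes) -> int:
    # Reducing modulo n is needed only because this toy n is tiny.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    return pow(hash_of(message), d, n)  # raise the hash, not the message

def verify(message: bytes, tag: int) -> bool:
    return pow(tag, e, n) == hash_of(message)

tag = sign(b"pay Bob $10")
assert verify(b"pay Bob $10", tag)
# x -> x^e (modulo n) is a permutation, so no other tag verifies:
assert not verify(b"pay Bob $10", (tag + 1) % n)
```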

    Using a cryptographic hash helps in constructing a secure SIGN and VERIFY, but isn't sufficient either. There is a substantial literature that presents even better schemes that also address other subtle issues that come up in the design of a good digital signature scheme.

    A Public-Key Encrypting System

    ENCRYPT and DECRYPT can also be implemented using public-key cryptography, but because operations in public-key systems are expensive (e.g., exponentiation in RSA instead of XOR in RC4), public-key implementations of ENCRYPT and DECRYPT are used sparingly. As described in Section 5.6, public-key encryption is used only to encrypt a newly-minted shared-secret key during the setup of a connection between a sender and a receiver, and then that shared-secret key is used for shared-secret encryption of further communication between the sender and the receiver. For example, SSL/TLS, which is described in Section 5.11, uses this approach.

    The basic idea, which needs refinement to be secure, for implementing ENCRYPT and DECRYPT using RSA is as follows. Split the message \(M\) into fixed size blocks \(P\) so that the value of \(P\) is smaller than \(n\), then ENCRYPT raises \(P\) to the public exponent (\(e\)). DECRYPT raises the encrypted block to the private exponent (\(d\)). This order is exactly the opposite of the one for SIGN; SIGN raises to the private exponent and VERIFY raises to the public exponent.
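A sketch under toy assumptions: with the small example keys, each block must be a single byte so that \(P < n\). Raw block-by-block RSA like this is deterministic and insecure; the sketch shows only the skeleton of the scheme:

```python
n, e, d = 2773, 17, 157  # toy keys from the worked example

def encrypt(message: bytes) -> list[int]:
    # Each block's value must be smaller than n; with n = 2773 a single
    # byte (0..255) per block suffices. Insecure as-is: identical bytes
    # always produce identical ciphertext blocks.
    return [pow(byte, e, n) for byte in message]

def decrypt(blocks: list[int]) -> bytes:
    return bytes(pow(c, d, n) for c in blocks)

ciphertext = encrypt(b"hi")
assert decrypt(ciphertext) == b"hi"
```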

    That the order is the opposite doesn't matter because RSA is reversible. Since \((M^d)^e = (M^e)^d = M^{ed} (\text{modulo} \,\, n)\), one can raise to the public exponent (\(e\)) first, and raise to the private exponent (\(d\)) second, or vice versa, and either way get \(M\) back. It is claimed that the security of RSA is equally good both ways.

    This basic implementation is relatively weak; there are a number of well-known attacks if the RSA cipher is used by itself for encrypting. To counter these attacks, ENCRYPT should pad short blocks with independent randomized variables so that the value of \(P\) is close to \(n\), and then raise the padded \(P\) to the public exponent. In addition, ENCRYPT should run the message through what is called an all-or-nothing transform (AONT). An AONT is a non-secret, reversible transformation of a message that ensures that the receiver must have all of the bits of the transformed message in order to recover any of the bits of the original message. Thus, an adversary cannot launch an attack by just concentrating on individual blocks of the message. Readers should consult the literature to learn what other measures are necessary to obtain a good implementation of ENCRYPT and DECRYPT using RSA.
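As an illustration, here is a sketch of one AONT in the style of Rivest's package transform (the names and parameter choices are ours, and this toy version omits refinements a real design needs): a random key, which need not be kept secret afterwards, whitens the message, and that key is then stored XORed with a digest of every transformed byte, so recovering the key, and hence any message byte, requires the entire package:

```python
import hashlib
import secrets

def prf(key: bytes, i: int) -> int:
    """One pseudorandom byte for position i, derived from key."""
    return hashlib.sha256(key + i.to_bytes(4, "big")).digest()[0]

def aont_package(message: bytes) -> bytes:
    key = secrets.token_bytes(16)  # random; not a secret after packaging
    body = bytes(m ^ prf(key, i) for i, m in enumerate(message))
    # Final block: the key XORed with a digest over ALL transformed
    # bytes, so the key is recoverable only from the complete package.
    digest = hashlib.sha256(body).digest()[:16]
    tail = bytes(k ^ h for k, h in zip(key, digest))
    return body + tail

def aont_unpackage(package: bytes) -> bytes:
    body, tail = package[:-16], package[-16:]
    digest = hashlib.sha256(body).digest()[:16]
    key = bytes(t ^ h for t, h in zip(tail, digest))
    return bytes(c ^ prf(key, i) for i, c in enumerate(body))

pkg = aont_package(b"attack at dawn")
assert aont_unpackage(pkg) == b"attack at dawn"
```

Changing any byte of the package corrupts the digest, hence the recovered key, hence every recovered message byte, which is exactly the all-or-nothing property.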


    This page titled 5.9: Cryptography as a Building Block (Advanced Topic) is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Jerome H. Saltzer & M. Frans Kaashoek (MIT OpenCourseWare) .