8.4: Padding and Ciphertext Stealing

Last updated
Save as PDF

Page ID: 7386

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

So far we have assumed that all plaintexts are exact multiples of the blocklength. But data in the real world is not always so accommodating. How are block ciphers used in practice with data that has arbitrary length?

Padding

Padding just refers to any approach to encode arbitrary-length data into data that is a multiple of the blocklength. The only requirement is that this encoding is reversible. More formally, a padding scheme should consist of two algorithms:

pad: takes as input a string of any length, and outputs a string whose length is a multiple of the blocklength
unpad: the inverse of pad. We require that unpad \((\operatorname{pad}(x))=x\) for all strings \(x\).

The idea is that the sender can encrypt pad \((x)\), which is guaranteed to be a multiple of the blocklength; the receiver can decrypt and run unpad on the result to obtain \(x\).

In the real world, data almost always comes in bytes and not bits, so that will be our assumption here. In this section we will write bytes in hex, for example \(8 f\). Typical blocklengths are 128 bits (16 bytes) or 256 bits (32 bytes).

Here are a few common approaches for padding:

Null padding: The simplest padding approach is to just fill the final block with null bytes \((00\) ). The problem with this approach is that it is not always reversible. For example, pad( 314159 ) and pad( \(314159 \quad 00)\) will give the same result. It is not possible to distinguish between a null byte that was added for padding and one that was intentionally the last byte of the data.

ANSI X.923 standard: Data is padded with null bytes, except for the last byte of padding which indicates how many padding bytes there are. In essence, the last byte of the padded message tells the receiver how many bytes to remove to recover the original message.

Note that in this padding scheme (and indeed in all of them), if the original unpadded data is already a multiple of the block length, then an entire extra block of padding must be added. This is necessary because it is possible for the original data to end with some bytes that look like valid padding (e.g., 000003 ), and we do not want these bytes to be removed erroneously.

Example

Below are some examples of valid and invalid X.923 padding (using 16-byte blocks):

PKCS#7 standard: If \(b\) bytes of padding are needed, then the data is padded not with null bytes but with \(b\) bytes. Again, the last byte of the padded message tells the receiver how many bytes to remove.

Example

Below are some examples of valid and invalid PKCS#7 padding (using 16-byte blocks):

ISO/IEC 7816-4 standard: The data is padded with a 80 byte followed by null bytes. To remove the padding, remove all trailing null bytes and ensure that the last byte is 80 (and then remove it too).

The significance of 80 is clearer when you write it in binary as 10000000 . So another way to describe this padding scheme is: append a 1 bit, and then pad with 0 bits until reaching the blocklength. To remove the padding, remove all trailing 0 bits as well as the rightmost 1 bit. Hence, this approach generalizes easily to padding data that is not a multiple of a byte.

Example

Below are some examples of valid and invalid ISO/IEC 7816-4 padding (using 16-byte blocks):

The choice of padding scheme is not terribly important, and any of these is generally fine. Just remember that padding schemes are not a security feature! Padding is a public method for encoding data, and it does not involve any secret keys. The only purpose of padding is to enable functionality - using block cipher modes like CBC with data that is not a multiple of the block length.

Furthermore, as we will see in the next chapter, padding is associated with certain attacks against improper use of encryption. Even though this is not really the fault of the padding (rather, it is the fault of using the wrong flavor of encryption), it is such a common pitfall that it is always worth considering in a discussion about padding.

Ciphertext Stealing

Another approach with a provocative name is ciphertext stealing (CTS, if you are not yet tired of three-leter acronyms), which results in ciphertexts that are not a multiple of the blocklength. The main idea behind ciphertext stealing is to use a standard block-cipher mode that only supports full blocks (e.g., CBC mode), and then simply throw away some bits of the ciphertext, in such a way that decryption is still possible. If the last plaintext blocks is \(j\) bits short of being a full block, it is generally possible to throw away \(j\) bits of the ciphertext. In this way, a plaintext of \(n\) bits will be encrypted to a ciphertext of \(b l e n+n\) bits, where blen is the length of the extra IV block.

As an example, let’s see ciphertext stealing as applied to CBC mode. Suppose the blocklength is blen and the last plaintext block \(m_{\ell}\) is \(j\) bits short of being a full block. We start by extending \(m_{\ell}\) with \(j\) zeroes (i.e., null-padding the plaintext) and performing CBC encryption as usual.

Now our goal is to identify \(j\) bits of the CBC ciphertext that can be thrown away while still making decryption possible. In this case, the appropriate bits to throw away are the last \(j\) bits of \(c_{\ell-1}\) (the next-to-last block of the \(C B C\) ciphertext). The reason is illustrated in the figure below:

Suppose the receiver obtains this CBC ciphertext but the last \(j\) bits of \(c_{\ell-1}\) have been deleted. How can he/she decrypt? The important idea is that those missing \(j\) bits were redundant, because there is another way to compute them.

In \(\mathrm{CBC}\) encryption, the last value given as input into the block cipher is \(c_{\ell-1} \oplus m_{\ell}\). Let us give this value a name \(x^{*}:=c_{\ell-1} \oplus m_{\ell}\). Since the last \(j\) bits of \(m_{\ell}\) are \(0^{\prime}\) s, \({ }^{3}\) the last \(j\) bits of \(x^{*}\) are the last \(j\) bits of \(c_{\ell-1}-\) the missing bits. Even though these bits are missing from \(c_{\ell-1}\), the receiver has a different way of computing them as \(x^{*}:=F^{-1}\left(k, c_{\ell}\right)\).

Putting it all together, the receiver does the following: First, it observes that the ciphertext is \(j\) bits short of a full block. It computes \(F^{-1}\left(k, c_{\ell}\right)\) and takes the last \(j\) bits of this value to be the missing bits from \(c_{\ell-1}\). With the missing bits recovered, the receiver does \(\mathrm{CBC}\) decryption as usual. The result is a plaintext consisting of \(\ell\) full blocks, but we know that the last \(j\) bits of that plaintext are 0 padding that the receiver can remove.

It is convenient in an implementation for the boundaries between blocks to be in predictable places. For that reason, it is slightly awkward to remove \(j\) bits from the middle of the ciphertext during encryption (or add them during decryption), as we have done here. So in practice, the last two blocks of the ciphertext are often interchanged. In the example above, the resulting ciphertext (after ciphertext stealing) would be:

\(c_{0}\left\|c_{1}\right\| c_{2} \cdots c_{\ell-3}\left\|c_{\ell-2}\right\| c_{\ell} \| c_{\ell-1}^{\prime}\), where \(c_{\ell-1}^{\prime}\) is the first blen \(-j\) bits of \(c_{\ell-1} .\)

That way, all ciphertext blocks except the last one are the full blen bits long, and the boundaries between blocks appear consistently every blen bits. This "optimization" does add some significant edge cases to any implementation. One must also decide what to do when the plaintext is already an exact multiple of the blocklength - should the final two ciphertext blocks be swapped even in this case? Below we present an implementation of ciphertext stealing (CTS) that does not swap the final two blocks in this case. This means that it collapses to plain CBC mode when the plaintext is an exact multiple of the block length.

Construction \(8.6\) (CBC-CTS)

The marked lines correspond to plain \(C B C\) mode.

\({ }^{3}\) The receiver knows this fact, because the ciphertext is \(j\) bits short of a full block. The length of the (shortened) ciphertext is a signal about how many 0 -bits of padding were used during encryption.