# 8.2: CPA Security and Variable-Length Plaintexts

- Page ID
- 7384

Here’s a big surprise: none of these block cipher modes achieve CPA security, or at least CPA security as we have been defining it.

Consider a block cipher with blen \(=\lambda\), used in \(C B C\) mode. As you will see, there is nothing particularly specific to \(C B C\) mode, and the same observations apply to the other modes.

In \(C B C\) mode, a plaintext consisting of \(\ell\) blocks is encrypted into a ciphertext of \(l+1\) blocks. In other words, the ciphertext **leaks the number of blocks in the plaintext**. We can leverage this observation into the following attack:

The distinguisher chooses a 1-block plaintext and a 2-block plaintext. If this distinguisher is linked to \(\mathcal{L}_{\mathrm{cpa}-\mathrm{L}}\), the 1-block plaintext is encrypted and the resulting ciphertext is 2 blocks (2 \(\lambda\) bits) long. If the distinguisher is linked to \(\mathcal{L}_{\mathrm{cpa}-\mathrm{R}}\), the 2-block plaintext is encrypted and the resulting ciphertext is 3 blocks ( \(3 \lambda\) bits) long. By simply checking the length of the ciphertext, this distinguisher can tell the difference and achieve advantage \(1.\)

So, technically speaking, these block cipher modes do not provide CPA security, since ciphertexts leak the length (measured in blocks) of the plaintext. But suppose we don’t really care about hiding the length of plaintexts.\({ }^{1}\) Is there a way to make a security definition that says: **ciphertexts hide everything about the plaintext, except their length?**

It is clear from the previous example that a distinguisher can successfully distinguish the CPA libraries if it makes a query EAVEsDrop \(\left(m_{L}, m_{R}\right)\) with \(\left|m_{L}\right| \neq\left|m_{R}\right|\). A simple way to change the CPA security definition is to just disallow this kind of query. The libraries will give an error message if \(\left|m_{L}\right| \neq\left|m_{R}\right|\). This would allow the adversary to make the challenge plaintexts differ in any way of his/her choice, except in their length. It doesn’t really matter whether \(|m|\) refers to the length of the plaintext in bits or in blocks - whichever makes the most sense for a particular scheme.

From now on, when discussing encryption schemes that support variable-length plaintexts, CPA security will refer to the following updated libraries:

In the definition of CPA$ security (pseudorandom ciphertexts), the \(\mathcal{L}_{\text {cpa\$-rand }}\) library responds to queries with uniform responses. Since these responses must look like ciphertexts, they must have the appropriate length. For example, for the modes discussed in this chapter, an \(\ell\)-block plaintext is expected to be encrypted to an \((\ell+1)\)-block ciphertext. So, based on the length of the plaintext that is provided, the library must choose the appropriate ciphertext length. We are already using \(\sum . C\) to denote the set of possible ciphertexts of an encryption scheme \(\sum\). So let’s extend the notation slightly and write \(\sum . C(\ell)\) to denote the set of possible ciphertexts for plaintexts of length \(\ell\). Then when discussing encryption schemes supporting variable-length plaintexts, CPA$ security will refer to the following libraries:

Note that the \(\mathcal{L}_{\text {cpa\$-rand }}\) library does not use any information about \(m\) other than its length. This again reflects the idea that ciphertexts leak nothing about plaintexts other than their length.

In the exercises, you are asked to prove that, with respect to these updated security definitions, CPA$ security implies CPA security as before.

## Don’t Take Length-Leaking for Granted!

We have just gone from requiring encryption to leak no partial information to casually allowing some specific information to leak. Let us not be too hasty about this!

If we want to truly support plaintexts of arbitrary length, then leaking the length is in fact unavoidable. But "unavoidable" doesn’t mean "free of consequences." By observing only the length of encrypted network traffic, many serious attacks are possible. Here are several examples:

- When accessing Google maps, your browser receives many image tiles that comprise the map that you see. Each image tile has the same pixel dimensions. However, they are compressed to save resources, and not all images compress as significantly as others. Every region of the world has its own rather unique "fingerprint" of imagetile lengths. So even though traffic to and from Google maps is encrypted, the sizes of the image tiles are leaked. This can indeed be used to determine the region for which a user is requesting a map. \({ }^{2}\) The same idea applies to auto-complete suggestions in a search form.
- Variable-bit-rate (VBR) encoding is a common technique in audio/video encoding. When the data stream is carrying less information (e.g., a scene with a fixed camera position, or a quiet section of audio), it is encoded at a lower bit rate, meaning that each unit of time is encoded in fewer bits. In an encrypted video stream, the changes in bit rate are reflected as changes in packet length. When a user is watching a movie on Netflix or a Youtube video (as opposed to a live event stream), the bit-rate changes are consistent and predictable. It is therefore rather straight-forward to determine which video is being watched, even on an encrypted connection, just by observing the packet lengths.
- VBR is also used in many encrypted voice chat programs. Attacks on these tools have been increasing in sophistication. The first attacks on encrypted voice programs showed how to identify who was speaking (from a set of candidates), just by observing the stream of ciphertext sizes. Later attacks could determine the language being spoken. Eventually, when combined with sophisticated linguistic models, it was shown possible to even identify individual words to some extent!

It’s worth emphasizing again that none of these attacks involve any attempt to break the encryption. The attacks rely solely on the fact that encryption leaks the length of the plaintexts.

\({ }^{1}\) Indeed, hiding the length of communication (in the extreme, hiding the existence of communication) is a very hard problem.

\({ }^{2}\) http://blog.ioactive.com/2012/02/ssl-traffic-analysis-on-google-maps.html