10.2: A PRF is a MAC


The definition of a PRF says (more or less) that even if you’ve seen the output of the PRF on several chosen inputs, all other outputs look independently & uniformly random. Furthermore, uniformly chosen values are hard to guess, as long as they are sufficiently long (e.g., \(\lambda\) bits).

In other words, after seeing some outputs of a PRF, any other PRF output will be hard to guess. This is exactly the intuitive property we require from a MAC. And indeed, we will prove in this section that a PRF is a secure MAC. While the claim makes intuitive sense, proving it formally is a little tedious. This is because in the MAC security game, the adversary can make many verification queries CHECKTAG\((m, t)\) before asking to see the correct MAC of \(m\). Dealing with this event is the source of all the technical difficulty in the proof.

We start with a technical claim that captures the idea that "if you can blindly guess at uniformly chosen values and can also ask to see the values, then it is hard to guess a random value before you have seen it."

Claim \(10.3\)

The following two libraries are indistinguishable:

Both libraries maintain an associative array \(T\) whose values are sampled uniformly the first time they are needed. Calling programs can try to guess these values via the GUESS subroutine, or simply learn them via REVEAL. Note that the calling program can call GUESS\((m, \cdot)\) both before and after calling REVEAL\((m)\).

Intuitively, since the values in \(T\) are \(\lambda\) bits long, it should be hard to guess \(T[m]\) before calling REVEAL\((m)\). That is exactly what we formalize in \(\mathcal{L}_{\text{guess-R}}\). In fact, this library doesn’t bother to even choose \(T[m]\) until REVEAL\((m)\) is called. All calls to GUESS\((m, \cdot)\) made before the first call to REVEAL\((m)\) will return false.
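The two libraries described above can be sketched in Python. This is only an illustration under stated assumptions: the class names `GuessL` and `GuessR`, the constant `LAMBDA = 128`, and the use of a dictionary for \(T\) are our own choices, not part of the formal definitions.

```python
import secrets

LAMBDA = 128  # assumed security parameter, in bits


class GuessL:
    """Sketch of L_guess-L: GUESS samples T[m] on demand when it is undefined."""

    def __init__(self):
        self.T = {}

    def _sample(self, m):
        # Sample T[m] uniformly the first time it is needed.
        if m not in self.T:
            self.T[m] = secrets.randbits(LAMBDA)
        return self.T[m]

    def guess(self, m, g):
        return g == self._sample(m)

    def reveal(self, m):
        return self._sample(m)


class GuessR:
    """Sketch of L_guess-R: GUESS never samples; an undefined T[m] gives False."""

    def __init__(self):
        self.T = {}

    def guess(self, m, g):
        # Short-circuits to False while T[m] is still undefined.
        return m in self.T and g == self.T[m]

    def reveal(self, m):
        if m not in self.T:
            self.T[m] = secrets.randbits(LAMBDA)
        return self.T[m]
```

The only difference is in `guess`: `GuessL` commits to a value of \(T[m]\) as soon as someone guesses at it, while `GuessR` postpones sampling until `reveal` is called.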

Proof

Let \(q\) be the number of queries that the calling program makes to GUESS. We will show that the libraries are indistinguishable with a hybrid sequence of the form:

\[\mathcal{L}_{\text{guess-L}} \equiv \mathcal{L}_{\text{hyb-}0} \approx \mathcal{L}_{\text{hyb-}1} \approx \cdots \approx \mathcal{L}_{\text{hyb-}q} \equiv \mathcal{L}_{\text{guess-R}}\]

The \(h\)th hybrid library in the sequence is defined as:

This hybrid library behaves like \(\mathcal{L}_{\text{guess-R}}\) for the first \(h\) queries to GUESS, in the sense that it will always just return false when \(T[m]\) is undefined. After \(h\) queries, it will behave like \(\mathcal{L}_{\text{guess-L}}\) by actually sampling \(T[m]\) in these cases.
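A minimal Python sketch of the \(h\)th hybrid, as we read the description above, may help make the role of the counter concrete. The class name `Hybrid`, the field names, and `LAMBDA = 128` are illustrative assumptions.

```python
import secrets

LAMBDA = 128  # assumed security parameter, in bits


class Hybrid:
    """Sketch of L_hyb-h: GUESS may sample T[m] only after the first h GUESS queries."""

    def __init__(self, h):
        self.h = h
        self.count = 0  # number of GUESS queries made so far
        self.T = {}

    def guess(self, m, g):
        self.count += 1
        # The "count > h" clause: sample only once more than h guesses were made.
        if m not in self.T and self.count > self.h:
            self.T[m] = secrets.randbits(LAMBDA)
        # Still returns False when T[m] is undefined.
        return m in self.T and g == self.T[m]

    def reveal(self, m):
        if m not in self.T:
            self.T[m] = secrets.randbits(LAMBDA)
        return self.T[m]
```

With `h = 0` the extra clause never blocks sampling, and with `h` at least the total number of GUESS queries the sampling branch is never taken, which mirrors the two endpoints of the hybrid sequence.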

In \(\mathcal{L}_{\text{hyb-}0}\), the clause "count \(> 0\)" is always true so this clause can be removed from the if-condition. This modification results in \(\mathcal{L}_{\text{guess-L}}\), so we have \(\mathcal{L}_{\text{guess-L}} \equiv \mathcal{L}_{\text{hyb-}0}\).

In \(\mathcal{L}_{\text{hyb-}q}\), the clause "count \(> q\)" in the if-statement is always false since the calling program makes only \(q\) queries. Removing the unreachable if-statement results in \(\mathcal{L}_{\text{guess-R}}\), so we have \(\mathcal{L}_{\text{guess-R}} \equiv \mathcal{L}_{\text{hyb-}q}\).

It remains to show that \(\mathcal{L}_{\text {hyb- } h} \approx \mathcal{L}_{\text {hyb- }(h+1)}\) for all \(h\). We can do so by rewriting these two libraries as follows:

The library on the left is equivalent to \(\mathcal{L}_{\text{hyb-}h}\) since the only change is the highlighted lines, which don’t actually affect anything. In the library on the right, if \(T[m]\) is undefined during the first \(h+1\) calls to GUESS, the subroutine will return false (either by avoiding the if-statement altogether or by triggering the highlighted lines). This matches the behavior of \(\mathcal{L}_{\text{hyb-}(h+1)}\), except that the library shown above samples the value \(T[m]\), which in \(\mathcal{L}_{\text{hyb-}(h+1)}\) would not be sampled until the next call of the form GUESS\((m, \cdot)\) or REVEAL\((m)\). But the method of sampling is the same; only the timing is different. This difference has no effect on the calling program.

So the two libraries above are indeed equivalent to \(\mathcal{L}_{\text{hyb-}h}\) and \(\mathcal{L}_{\text{hyb-}(h+1)}\). They differ only in code that is reachable when \(\text{bad} = 1\). From Lemma 4.8, we know that these two libraries are indistinguishable if \(\operatorname{Pr}[\text{bad}=1]\) is negligible. In these libraries there is only one chance to set \(\text{bad} = 1\), and that is by guessing/predicting uniform \(T[m]\) on the \((h+1)\)th call to GUESS. This happens with probability \(1 / 2^{\lambda}\), which is indeed negligible.

This shows that \(\mathcal{L}_{\text {hyb- } h} \approx \mathcal{L}_{\mathrm{hyb}-(h+1)}\), and completes the proof.
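As a rough tally of the whole hybrid sequence (our own summary, assuming that Lemma 4.8 bounds the distinguishing advantage at each step by \(\operatorname{Pr}[\text{bad}=1]\), and writing \(\mathcal{A} \diamond \mathcal{L} \Rightarrow 1\) for the event that calling program \(\mathcal{A}\) outputs 1 when linked to \(\mathcal{L}\)), the triangle inequality over the \(q\) steps gives:

\[\Big| \operatorname{Pr}[\mathcal{A} \diamond \mathcal{L}_{\text{guess-L}} \Rightarrow 1] - \operatorname{Pr}[\mathcal{A} \diamond \mathcal{L}_{\text{guess-R}} \Rightarrow 1] \Big| \leqslant \sum_{h=0}^{q-1} \frac{1}{2^{\lambda}} = \frac{q}{2^{\lambda}}\]

Since a polynomial-time calling program makes only polynomially many queries, \(q / 2^{\lambda}\) is negligible.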

We now return to the problem of proving that a PRF is a MAC.

Claim 10.4

Let \(F\) be a secure PRF with input length \(\textit{in}\) and output length \(\textit{out} = \lambda\). Then the scheme \(\operatorname{MAC}(k, m)=F(k, m)\) is a secure MAC for message space \(\{0,1\}^{\textit{in}}\).
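To make the scheme concrete, here is a minimal Python sketch. It assumes HMAC-SHA256 as a stand-in for the PRF \(F\); that choice, the helper names `prf`, `mac`, and `verify`, and the 32-byte key are our assumptions, not part of the claim. Note that HMAC accepts messages of any length, whereas the claim is stated for a fixed input length.

```python
import hashlib
import hmac
import secrets


def prf(k: bytes, m: bytes) -> bytes:
    # Stand-in for F(k, m). Assumption: HMAC-SHA256 behaves as a PRF (out = 256 bits).
    return hmac.new(k, m, hashlib.sha256).digest()


def mac(k: bytes, m: bytes) -> bytes:
    # MAC(k, m) = F(k, m): the tag is simply the PRF output.
    return prf(k, m)


def verify(k: bytes, m: bytes, t: bytes) -> bool:
    # Recompute the tag and compare in constant time.
    return hmac.compare_digest(mac(k, m), t)


k = secrets.token_bytes(32)
m = b"sixteen byte msg"
t = mac(k, m)
```

Verification just recomputes the tag, so `verify(k, m, t)` accepts exactly the tag that `mac(k, m)` would produce.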

Proof

We show that \(\mathcal{L}_{\text {mac-real }}^{F} \approx \mathcal{L}_{\text {mac-fake }}^{F}\), using a standard sequence of hybrids.

	The starting point is the \(\mathcal{L}_{\text {mac-real }}\) library, with the details of this MAC scheme filled in.
	We have factored out the PRF operations in terms of the library \(\mathcal{L}_{\text {prf-real }}\) from the PRF security definition.
	We have applied the PRF-security of \(F\) and replaced \(\mathcal{L}_{\text {prf-real }}\) with \(\mathcal{L}_{\text {prf-rand }}\).
	We can express the previous hybrid in terms of the \(\mathcal{L}_{\text {guess-L}}\) library from Claim \(10.3\). The change has no effect on the calling program.
	We have applied Claim \(10.3\) to replace \(\mathcal{L}_{\text{guess-L}}\) with \(\mathcal{L}_{\text{guess-R}}\). This involves simply removing the if-statement from GUESS. As a result, GUESS\((m, g)\) will return false if \(T[m]\) is undefined.
	Extra bookkeeping information is added, but not used anywhere. There is no effect on the calling program.

Consider the hybrid experiment above, and suppose the calling program makes a call to CHECKTAG \((m, t)\). There are two cases:

Case 1: there was a previous call to GETTAG\((m)\). In this case, the value \(T[m]\) is defined in \(\mathcal{L}_{\text{guess-R}}\) and \((m, T[m])\) already exists in \(\mathcal{T}\). The result of GUESS\((m,t)\) (and hence, of CHECKTAG\((m, t)\)) will therefore be \(t\overset{?}{=}T[m]\).
Case 2: there was no previous call to GETTAG\((m)\). Then there is no value of the form \((m,\bigstar)\) in \(\mathcal{T}\). Furthermore, \(T[m]\) is undefined in \(\mathcal{L}_{\text{guess-R}}\), so the call to GUESS\((m,t)\) will return false, and so will the call to CHECKTAG\((m, t)\) that we consider.

In both cases, the result of CHECKTAG \((m, t)\) is true if and only if \((m, t) \in \mathcal{T}\).

	We have modified CHECKTAG according to the discussion above.
	In the previous hybrid, the GUESS subroutine is never called. Removing that unused subroutine and renaming REVEAL to LOOKUP results in the \(\mathcal{L}_{\text{prf-rand}}\) library from the PRF security definition.
	We have applied the PRF security of \(F\) again, replacing \(\mathcal{L}_{\text{prf-rand}}\) with \(\mathcal{L}_{\text{prf-real}}\).

Inlining \(\mathcal{L}_{\text {prf-real }}\) in the final hybrid, we see that the result is exactly \(\mathcal{L}_{\text {mac-fake }}^{F}\). Hence, we have shown that \(\mathcal{L}_{\text {mac-real }}^{F} \approx \mathcal{L}_{\text {mac-fake }}^{F}\), which completes the proof.
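The two endpoints of the proof can be sketched in Python as follows. This is our reading of the libraries based on the proof above: the real library verifies by recomputing the tag, and the fake one accepts only pairs \((m, t)\) recorded in the set \(\mathcal{T}\). HMAC-SHA256 as a stand-in for \(F\), and the names `MacReal`, `MacFake`, and `issued`, are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def prf(k: bytes, m: bytes) -> bytes:
    # Stand-in for F(k, m). Assumption: HMAC-SHA256 behaves as a PRF.
    return hmac.new(k, m, hashlib.sha256).digest()


class MacReal:
    """Sketch of L_mac-real: CHECKTAG recomputes the tag."""

    def __init__(self):
        self.k = secrets.token_bytes(32)

    def gettag(self, m: bytes) -> bytes:
        return prf(self.k, m)

    def checktag(self, m: bytes, t: bytes) -> bool:
        return hmac.compare_digest(prf(self.k, m), t)


class MacFake:
    """Sketch of L_mac-fake: CHECKTAG accepts only pairs handed out by GETTAG."""

    def __init__(self):
        self.k = secrets.token_bytes(32)
        self.issued = set()  # plays the role of the set T of (m, t) pairs

    def gettag(self, m: bytes) -> bytes:
        t = prf(self.k, m)
        self.issued.add((m, t))
        return t

    def checktag(self, m: bytes, t: bytes) -> bool:
        return (m, t) in self.issued
```

A calling program could tell the two apart only by submitting a valid tag for a message it never passed to `gettag`, which is exactly a forgery.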

Discussion

If PRFs are MACs, why do we even need a definition for MACs? The simplest answer to this question is that the concepts of PRF and MAC are indeed different:

Not every PRF is a MAC. Only sufficiently long random values are hard to guess, so only PRFs with long outputs \((\textit{out} \geqslant \lambda)\) are MACs. It is perfectly reasonable to consider a PRF with short outputs.
Not every MAC is a PRF. Just like not every encryption scheme has pseudorandom ciphertexts, not every MAC scheme has pseudorandom tags. Imagine taking a secure MAC scheme and modifying it as \(\operatorname{MAC}^{\prime}(k, m)=\operatorname{MAC}(k, m) \| 0^{\lambda}\). Adding 0s to every tag prevents the tags from looking pseudorandom, but does not make the tags any easier to guess. Something doesn’t have to be uniformly random in order to be hard to guess.
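Both points can be illustrated with a short Python sketch. The assumptions here are ours: HMAC-SHA256 stands in for a secure MAC/PRF, \(\lambda\) is taken to be 128 bits, and the 8-bit PRF is obtained by truncating the HMAC output to one byte.

```python
import hashlib
import hmac
import secrets

LAMBDA_BYTES = 16  # assumed lambda = 128 bits


def mac(k: bytes, m: bytes) -> bytes:
    # Stand-in for a secure MAC. Assumption: HMAC-SHA256.
    return hmac.new(k, m, hashlib.sha256).digest()


def mac_padded(k: bytes, m: bytes) -> bytes:
    # MAC'(k, m) = MAC(k, m) || 0^lambda
    return mac(k, m) + bytes(LAMBDA_BYTES)


def short_prf(k: bytes, m: bytes) -> bytes:
    # A PRF with out = 8 bits, obtained here by truncation.
    return mac(k, m)[:1]


k = secrets.token_bytes(32)
m = b"hello"

# MAC' tags always end in zeros, so they do not look uniformly random,
# yet guessing one still requires guessing the MAC(k, m) part.
tag = mac_padded(k, m)
ends_in_zeros = tag.endswith(bytes(LAMBDA_BYTES))

# With 8-bit outputs, trying every candidate tag is feasible: exactly one
# of the 256 candidates passes the check, so forging is easy.
forgeries = [
    bytes([g]) for g in range(256)
    if hmac.compare_digest(short_prf(k, m), bytes([g]))
]
```

Here `forgeries` simulates an adversary submitting all 256 candidates to a verification oracle. The short-output function may still be a perfectly good PRF, but it is useless as a MAC.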

It is true that in the vast majority of cases we will encounter MAC schemes with random tags, and PRFs with long outputs \((\textit{out} \geqslant \lambda)\). But it is good practice to know whether you really need something that is pseudorandom or hard to guess.