6.2: PRFs vs PRGs; Variable-Hybrid Proofs

Last updated
Save as PDF

Page ID: 7349

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

In this section we show that a PRG can be used to construct a PRF, and vice-versa. The construction of a PRG from PRF is practical, and is one of the more common ways to obtain a PRG in practice. The construction of a PRF from PRG is more of theoretical interest and does not reflect how PRFs are designed in practice.

Constructing a PRG from a PRF

As promised, a PRF can be used to construct a PRG. The construction is quite natural. For simplicity, suppose we have a PRF \(F:\{0,1\}^{\lambda} \times\{0,1\}^{\lambda} \rightarrow\{0,1\}^{\lambda}(\) i.e., in \(=\) out \(=\lambda)\). We can build a length-doubling PRG in the following way:

Construction \(6.2\) (Counter PRG)

There is nothing particularly special about the inputs \(0 \cdots 00\) and \(0 \cdots 01\) to \(F\). All that matters is that they are distinct. The construction can be extended to easily give more than 2 blocks of output, by treating the input to \(F\) as a simple counter (hence the name of this construction).

The guarantee of a PRF is that when its seed is chosen uniformly and it is invoked on distinct inputs, its outputs look independently uniform. In particular, its output on inputs \(0 \cdots 00\) and \(0 \cdots 01\) are indistinguishable from uniform. Hence, concatenating them gives a string which is indistinguishable from a uniform \(2 \lambda\)-bit string.

That really is all there is to the security of this construction, but unfortunately there is a slight technical issue which makes the security proof more complicated than you might guess. We will have to introduce a new technique of variable hybrids to cope with it.

Claim 6.3

If \(F\) is a secure PRF, then the counter PRG construction \(G\) above is a secure PRG.

Proof

In order to prove that \(G\) is a secure PRG, we must prove that the following libraries are indistinguishable:

During the proof, we are allowed to use the fact that \(F\) is a secure PRF. That is, we can use the fact that the following two libraries are indistinguishable:

The inconvenience in the proof stems from a mismatch of the \(s\) variable in \(\mathcal{L}_{\text {prg-real and }}\) the \(k\) variable in \(\mathcal{L}_{\text {prf-real. }}\) In \(\mathcal{L}_{\text {prg-real, }} s\) is local to the QUERY subroutine. Over the course of an execution, \(s\) will take on many values. Since \(s\) is used as the PRF seed, we must write the calls to \(F\) in terms of the LOOKUP subroutine of \(\mathcal{L}_{\text {prf-real. }}\). But in \(\mathcal{L}_{\text {prf-real }}\) the PRF seed is fixed for the entire execution. In other words, we can only use \(\mathcal{L}_{\text {prf-real }}\) to deal with a single PRF seed at a time, but \(\mathcal{L}_{\text {prg-real }}\) deals with many PRG seeds at a time.

To address this, we will have to apply the security of \(F\) (i.e., replace \(\mathcal{L}_{\text {prf-real }}\) with \(\mathcal{L}_{\text {prf-rand }}\) ) many times during the proof - in fact, once for every call to QUERY made by the calling program. Previous security proofs had a fixed number of hybrid steps (e.g., the proof of Claim \(5.5\) used 7 hybrid libraries to show \(\mathcal{L}_{\text {prg-real }} \approx \mathcal{L}_{\text {hyb-1 }} \approx \cdots \approx \mathcal{L}_{\text {hyb-7 }} \approx\) \(\mathcal{L}_{\text {prg-rand }}\) ). This proof will have a variable number of hybrids that depends on the calling program. Specifically, we will prove

\[\mathcal{L}_{\text {prg-real }}^{G} \approx \mathcal{L}_{\text {hyb-1 }} \approx \ldots \approx \mathcal{L}_{\text {hyb- } q} \approx \mathcal{L}_{\text {prg-rand }}^{G}\]

where \(q\) is the number of times the calling program calls QUERY.

Don’t be overwhelmed by all these hybrids. They all follow a simple pattern. In fact, the \(i\) th hybrid looks like this:

In other words, the hybrid libraries all differ in the value \(i\) that is inserted into the code above. If you’re familiar with C compilers, think of this as adding "#define i 427 " to the top of the code above, to obtain \(\mathcal{L}_{\text {hyb-427 }}\)

First note what happens for extreme choices of \(i\) :

In \(\mathcal{L}_{\text {hyb- } 0}\), the if-branch is never taken (count \(\leqslant 0\) is never true). This library behaves exactly like \(\mathcal{L}_{\mathrm{prg} \text {-real }}^{G}\) by giving PRG outputs on every call to QUERY.
If \(q\) is the total number of times that the calling program calls QUERY, then in \(\mathcal{L}_{\text {hyb- } q \text {, }}\), the if-branch is always taken (count \(\leqslant q\) is always true). This library behaves exactly like \(\mathcal{L}_{\text {prg-rand }}^{G}\) by giving truly uniform output on every call to QUERY.

In general, \(\mathcal{L}_{\mathrm{hyb}-i}\) will respond to the first \(i\) calls to QUERY by giving truly random output. It will respond to all further calls by giving outputs of our PRG construction.

We have argued that \(\mathcal{L}_{\text {prg-real }}^{G} \equiv \mathcal{L}_{\text {hyb-0 }}\) and \(\mathcal{L}_{\text {prg-rand }}^{G} \equiv \mathcal{L}_{\text {hyb-q }}\). To complete the proof, we must show that \(\mathcal{L}_{\mathrm{hyb}-(i-1)} \approx \mathcal{L}_{\mathrm{hyb}-i}\) for all \(i\). The main reason for going to all this trouble of defining so many hybrid libraries is that \(\mathcal{L}_{\text {hyb- }(i-1)}\) and \(\mathcal{L}_{\text {hyb- } i}\) are completely identical except in how they respond to the \(i\) th call to QUERY. This difference involves a single call to the PRG (and hence a single PRF seed), which allows us to apply the security of the PRF.

In more detail, let \(i\) be arbitrary, and consider the following sequence of steps starting with \(\mathcal{L}_{\text {hyb-( }(i-1)}\):

	We have taken \(\mathcal{L}_{\text {hyb- }(i-1)}\) and simply expanded the else-branch (count \(\geqslant i\) ) into two subcases \((\) count \(=i\) and count \(>i\) ). However, both cases lead to the same block of code (apart from a change to a local variable’s name), so the change has no effect on the calling program.
	We have factored out the calls to \(F\) that use seed \(s^{*}\) (corresponding to the count \(=i\) case) in terms of \(\mathcal{L}_{\text {prf-real. This }}\) change no effect on the calling program.
	From the fact that \(F\) is a secure PFR, we can replace \(\mathcal{L}^{F}_{\text {prf-real}}\) with \(\mathcal{L}^{F}_{\text {prf-rand}}\), and the overall change is indistinguishable.
	Since count \(=i\) happens only once, only two calls to LOOKUP will be made across the entire lifetime of the library, and they are on distinct inputs. Therefore, the if-branch in LOOKUP will always be taken, and \(T\) is never needed (it is only needed to "remember" values and give the same answer when the same \(x\) is used twice as argument to LOOKUP). Simplifying the library therefore has no effect on the calling program:
	Inlining the subroutine has no effect on the calling program. The resulting library responds with uniformly random output to the first \(i\) calls to QUERY, and responds with outputs of our PRG \(G\) to the others. Hence, this library has identical behavior to \(\mathcal{L}_{\text {hyb-i }}\)

We showed that \(\mathcal{L}_{\mathrm{hyb}-(i-1)} \approx \mathcal{L}_{\mathrm{hyb}-i}\), and therefore: \[\mathcal{L}_{\text {prg-real }}^{G} \equiv \mathcal{L}_{\text {hyb-0 }} \approx \mathcal{L}_{\text {hyb-1 }} \approx \ldots \approx \mathcal{L}_{\text {hyb-q }} \equiv \mathcal{L}_{\text {prg-rand }}^{G}\] This shows that \(\mathcal{L}_{\mathrm{prg}-\mathrm{real}}^{G} \approx \mathcal{L}_{\mathrm{prg} \text {-rand }}^{G}\), so \(G\) is a secure \(\mathrm{PRG}\)

\(\star\) A Theoretical Construction of a PRF from a PRG

We have already seen that it is possible to feed the output of a PRG back into the PRG again, to extend its stretch (Claim 5.7). This is done by making a long chain (like a linked list) of PRGs. The trick to constructing a PRF from a PRG is to chain PRGs together in a binary tree (similar to Exercise \(5.8(\mathrm{a}))\). The leaves of the tree correspond to final outputs of the PRF. If we want a PRF with an exponentially large domain (e.g., in \(=\lambda)\), the binary tree itself is exponentially large! However, it is still possible to compute any individual leaf efficiently by simply traversing the tree from root to leaf. This tree traversal itself is the PRF algorithm. This construction of a PRF is due to Goldreich, Goldwasser, and Micali, in the paper that defined the concept of a PRF.

Imagine a complete binary tree of height in (in will be the input length of the PRF). Every node in this tree has a position which can be written as a binary string. Think of a node’s position as the directions to get there starting at the root, where a 0 means "go left" and 1 means "go right." For example, the root has position \(\epsilon\) (the empty string), the right child of the root has position 1 , etc.

The PRF construction works by assigning a label to every node in the tree, using the a length-doubling PRG \(G:\{0,1\}^{\lambda} \rightarrow\{0,1\}^{2 \lambda}\). For convenience, we will write \(G_{L}(k)\) and \(G_{R}(k)\) to denote the first \(\lambda\) bits and last \(\lambda\) bits of \(G(k)\), respectively. Labels in the tree are \(\lambda\)-bit strings, computed according to the following two rules:

The root node’s label is the PRF seed.
If the node at position \(p\) has label \(v\), then its left child (at position \(p \| \theta\) ) gets label \(G_{L}(v)\), and its right child (at position \(p \| 1\) ) gets label \(G_{R}(v)\).

In the picture above, a node’s label is the string being sent on its incoming edge. The tree has \(2^{\text {in }}\) leaves, whose positions are the strings \(\{0,1\}^{i n}\). We define \(F(k, x)\) to be the label of node/leaf \(x\). To compute this label, we can traverse the tree from root to leaf, taking left and right turns at each node according to the bits of \(x\) and computing the labels along that path according to the labeling rule. In the picture above, the highlighted path corresponds to the computation of \(F(k, 1001 \cdots)\).

It is important to remember that the binary tree is a useful conceptual tool, but it is exponentially large in general. Running the PRF on some input does not involve computing labels for the entire tree, only along a single path from root to leaf.

Construction \(6.4\) (GGM PRF)

Claim 6.5

If \(G\) is a secure PRG, then Construction \(6.4\) is a secure PRF.

Proof

We prove the claim using a sequence of hybrids. The number of hybrids in this case depends on the input-length parameter in. The hybrids are defined as follows:

The hybrids differ only in their hard-coded value of \(d\). We will show that

\[\mathcal{L}_{\text {prf-real }}^{F} \equiv \mathcal{L}_{\text {hyb-0 }} \approx \mathcal{L}_{\text {hyb-1 }} \approx \cdots \approx \mathcal{L}_{\text {hyb-in }} \equiv \mathcal{L}_{\text {prf-rand }}^{F}\]

We first start by understanding the behavior of \(\mathcal{L}_{\text {hyb- } d}\) for extreme choices of \(d\). Simplifications to the code are shown on the right.

In \(\mathcal{L}_{\text {hyb-0 }}\), we always have \(p=\epsilon\), so the only entry of \(T\) that is accessed is \(T[\epsilon]\). Then renaming \(T[\epsilon]\) to \(k\), we see that \(\mathcal{L}_{\text {hyb- } 0} \equiv\) \(\mathcal{L}_{\text {prf-real }}^{F}\). The only difference is when the PRF seed \(k(T[\epsilon])\) is sampled: eagerly at initialization time in \(\mathcal{L}_{\text {prf-real }}^{F}\) vs. at the last possible minute in \(\mathcal{L}_{\text {hyb- } 0}\).

In \(\mathcal{L}_{\text {hyb-in, we always have } p=}\) \(x\) and the body of the forloop is always unreachable. In that case, it is easy to see that \(\mathcal{L}_{\text {hyb-in }}\) has identical behavior to \(\mathcal{L}_{\text {prf-rand }}^{F}\)

The general pattern is that \(\mathcal{L}_{\text {hyb- } d}\) "chops off" the top \(d\) levels of the conceptual binary tree. When computing the output for some string \(x\), we don’t start traversing the tree from the root but rather \(d\) levels down the tree, at the node whose position is the \(d\)-bit prefix of \(x\) (called \(p\) in the library). We initialize the label of this node as a uniform value (unless it has already been defined), and then continue the traversal to the leaf \(x\).

To finish the proof, we show that \(\mathcal{L}_{\text {hyb- }(d-1)}\) and \(\mathcal{L}_{\text {hyb- } d}\) are indistinguishable:

	The library that is shown here is different from \(\mathcal{L}_{\text {hyb- }(d-1)}\) in the highlighted parts. However, these dif ferences have no effect on the calling program. The library here advances \(d-1\) levels down the tree (to the node at location \(p\) ), initializes that node’s label as a uniform value, then computes the labels for both its children, and finally continues computing labels toward the leaf. The only significant difference from \(\left.\mathcal{L}_{\mathrm{hyb}-(} d-1\right)\) is that it computes the labels of both of \(p\) ’s children, even though only one is on the path to \(x\). Since it computes the label correctly, though, it makes no difference when (or if) this extra label is computed.
	We have factored out the body of the if-statement in terms of \(\mathcal{L}_{\mathrm{prg}-\mathrm{real}}^{G}\) since it involves an call to \(G\) on uniform input. Importantly, the seed to \(G\) (called \(T[p]\) in the previous hybrid) was not used anywhere else - it was a string of length \(d-1\) while the library only reads \(T\left[p^{\prime}\right]\) for \(p^{\prime}\) of length \(d\). The change has no effect on the calling program.
	We have applied the security of \(G\) and replaced \(\mathcal{L}_{\text {prg-real with }}\) \(\mathcal{L}_{\text {prg-rand}}\). The change is indistinguishable. Hence, \(F\) is a secure PRF.
	We have inlined \(\mathcal{L}_{\text {prg-rand}}\) and split the sampling of \(2\lambda \) bits into two separate statements sampling \(\lambda\) bits each. In this library, we advance \(d\) levels down the tree, assign a uniform label to a node (and its sibling), and then proceed to the leaf applying \(G\) as usual. The only difference between this library and \(\mathcal{L}_{\text {hyb-d}}\) is that we sample the label of a node that is not on our direct path. But since we sample it uniformly, it doesn’t matter when (or if) that extra value is sampled. Hence, this library has identical behavior to \(\mathcal{L}_{\text {hyb-d}}\).

We showed that \(\mathcal{L}_{\text {hyb-}(d-1)}\approx\mathcal{L}_{\text {hyb-}d}\). Putting everything together, we have:

Hence, \(F\) is a secure PRF.