5.1: Definitions

Last updated
Save as PDF

Page ID: 7341

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

A pseudorandom generator (PRG) is a deterministic function \(G\) whose outputs are longer than its inputs. When the input to \(G\) is chosen uniformly at random, it induces a certain distribution over the possible output. As discussed above, this output distribution cannot be uniform. However, the distribution is pseudorandom if it is indistinguishable from the uniform distribution. More formally:

Definition 5.1 (PRG security)

Let \(G:\{0,1\}^{\lambda} \rightarrow\{0,1\}^{\lambda+\ell}\) be a deterministic function with \(\ell>0\). We say that \(G\) is a secure pseudorandom generator \((P R G)\) if \(\mathcal{L}_{\mathrm{prg} \text {-real }}^{G} \approx \mathcal{L}_{\mathrm{prg} \text {-rand }}^{G}\), where:

The value \(\ell\) is called the stretch of the PRG. The input to the PRG is typically called a seed.

Below is an illustration of the distributions sampled by these libraries, for a length-doubling \((\ell=\lambda)\) PRG (not drawn to scale) :

\(\mathcal{L}_{\text {prg-real samples from distribution of red dots, by first sampling a uniform element of }}\) \(\{0,1\}^{\lambda}\) and performing the action of \(G\) on that value to get a red result in \(\{0,1\}^{2 \lambda}\). The other library \(\mathcal{L}_{\text {prg-rand }}\) directly samples the uniform distribution on \(\{\theta, 1\}^{2 \lambda}\) (in green above).

To understand PRGs, you must simultaneously appreciate two ways to compare the PRG’s output distribution with the uniform distribution:

From a relative perspective, the PRG’s output distribution is tiny. Out of the \(2^{2 \lambda}\) strings in \(\{0,1\}^{2 \lambda}\), only \(2^{\lambda}\) are possible outputs of \(G\). These strings make up a \(2^{\lambda} / 2^{2 \lambda}=1 / 2^{\lambda}\) fraction of \(\{0,1\}^{2 \lambda}-\) a negligible fraction!
From an absolute perspective, the PRG’s output distribution is huge. There are \(2^{\lambda}\) possible outputs of \(G\), which is an exponential amount!

The illustration above only captures the relative perspective (comparing the red dots to the entire extent of \(\{0,1\}^{2 \lambda}\) ), so it can lead to some misunderstanding. Just looking at this picture, it is hard to imagine how the two distributions could be indistinguishable. How could a calling program not notice whether it’s seeing the whole set or just a negligible fraction of the whole set? Well, if you run in polynomial-time in \(\lambda\), then \(2^{\lambda}\) and \(2^{2 \lambda}\) are both so enormous that it doesn’t really matter that one is vastly bigger than the other. The relative sizes of the distribution don’t really help distinguish, since it is not a viable strategy for the distinguisher to "measure" the size of the distribution it’s sampling. Consider: there are about \(2^{75}\) molecules in a teaspoon of water, and about \(2^{2 \cdot 75}\) molecules of water in Earth’s oceans. Suppose you dump a teaspoon of water into the ocean and let things mix for a few thousand years. Even though the teaspoon accounts for only \(1 / 2^{75}\) of the ocean’s contents, that doesn’t make it easy to keep track of all \(2^{75}\) water molecules that originated in the teaspoon! If you are small enough to see individual water molecules, then a teaspoon of water looks as big as the ocean.

Discussion & Pitfalls

Do not confuse the interface of a PRG (it takes in a seed as input) with the interface of the security libraries \(\mathcal{L}_{\text {prg- } \star}\) (their QUERY subroutine doesn’t take any input)! A PRG is indeed an algorithm into which you can feed any string you like. However, security is only guaranteed when the PRG is being used exactly as described in the security libraries - in particular, when the seed is chosen uniformly/secretly and not used for anything else.
Nothing prevents a user from putting an adversarially-chosen \(s\) into a PRG, or revealing a PRG seed to an adversary, etc. You just get no security guarantee from doing it, since it’s not the situation reflected in the PRG security libraries.
It doesn’t really make sense to say that "0010110110 is a random string" or "0000000001 is a pseudorandom string." Randomness and pseudorandomness are properties of the process used to generate a string, not properties of the individual strings themselves. When we have a value \(z=G(s)\) where \(G\) is a PRG and \(s\) is chosen uniformly, you could say that \(z\) was "chosen pseudorandomly" You could say that the output of some process is a "pseudorandom distribution."But it is slightly sloppy (although common) to say that a string \(z\) "is pseudorandom".
There are common statistical tests you can run, which check whether some data has various properties that you would expect from a uniform distribution. \({ }^{1}\) For example, are there roughly an equal number of \(\odot\) s and 1 s? Does the substring 01010 occur with roughly the frequency I would expect? If I interpret the string as a series of points in the unit square \([0,1)^{2}\), is it true that roughly \(\pi / 4\) of them are within Euclidean distance 1 of the origin?
The definition of pseudorandomness is kind of a "master" definition that encompasses all of these statistical tests and more. After all, what is a statistical test, but a polynomial-time procedure that obtains samples from a distribution and outputs a yes/no decision? Pseudorandomness means that every statistical test that "passes" uniform data will also "pass" pseudorandomly generated data.