0.5: Probability

    Definition \(0.7\) (Distribution)

    A (discrete) probability distribution over a set \(X\) of outcomes is usually written as a function "Pr" that associates each outcome \(x \in X\) with a probability \(\operatorname{Pr}[x]\). We often say that the distribution assigns probability \(\operatorname{Pr}[x]\) to outcome \(x\).

    For each outcome \(x \in X\), the probability distribution must satisfy the condition \(0 \leqslant \operatorname{Pr}[x] \leqslant 1\). Additionally, the sum of all probabilities \(\sum_{x \in X} \operatorname{Pr}[x]\) must equal \(1\).

    Definition \(0.8\) (Uniform)

    A special distribution is the uniform distribution over a finite set \(X\), in which every \(x \in X\) is assigned probability \(\operatorname{Pr}[x]=1 /|X|\).

    We also extend the notation Pr to events, which are collections of outcomes. If you want to be formal, an event \(A\) is any subset of the possible outcomes, and its probability is defined to be \(\operatorname{Pr}[A]=\sum_{x \in A} \operatorname{Pr}[x]\). We always simplify the notation slightly, so instead of writing \(\operatorname{Pr}[\{x \mid x\) satisfies some condition \(\}]\), we write \(\operatorname{Pr}[\) condition].
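
    To make these definitions concrete, here is a minimal sketch in Python (the representation and the names, such as pr and biased_coin, are my own and not from the text): a discrete distribution as a dictionary mapping outcomes to probabilities, and an event's probability computed as the sum over the outcomes it contains.

        from fractions import Fraction

        def pr(dist, event):
            # Pr[A] is the sum of Pr[x] over all outcomes x in the event A
            return sum(p for x, p in dist.items() if x in event)

        # a (non-uniform) distribution over the outcomes of a biased coin
        biased_coin = {"heads": Fraction(2, 3), "tails": Fraction(1, 3)}
        assert sum(biased_coin.values()) == 1      # probabilities sum to 1
        print(pr(biased_coin, {"heads"}))          # 2/3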

    Example

    A 6-sided die has faces numbered \(\{1,2, \ldots, 6\}\). Tossing the die (at least for a mathematician) induces a uniform distribution over the choice of face. Then \(\operatorname{Pr}[3\) is rolled \(]=1 / 6\), \(\operatorname{Pr}[\) an odd number is rolled \(]=1 / 2\), and \(\operatorname{Pr}[\) a prime is rolled \(]=1 / 2\).
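
    The same dictionary representation can check the numbers in this example; the snippet below is only an illustrative sketch.

        from fractions import Fraction

        # uniform distribution over the 6 faces: every face gets probability 1/6
        die = {face: Fraction(1, 6) for face in range(1, 7)}

        print(sum(p for x, p in die.items() if x == 3))          # 1/6
        print(sum(p for x, p in die.items() if x % 2 == 1))      # 1/2 (odd faces)
        print(sum(p for x, p in die.items() if x in {2, 3, 5}))  # 1/2 (prime faces)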

    Tips & Tricks

    Knowing one of the probabilities \(\operatorname{Pr}[A]\) and \(\operatorname{Pr}[\neg A]\) (which is "the probability that A doesn’t happen") tells you exactly what the other probability is, via the relationship \[\operatorname{Pr}[A]=1-\operatorname{Pr}[\neg A]\notag\] This is one of the most basic facts about probability, but it can be surprisingly useful since one of \(\operatorname{Pr}[A]\) and \(\operatorname{Pr}[\neg A]\) is often much easier to calculate than the other. If you get stuck trying to come up with an expression for \(\operatorname{Pr}[A]\), try working out an expression for \(\operatorname{Pr}[\neg A]\) instead.

    Example

    I roll a six-sided die, six times. What is the probability that there is some repeated value? Let’s think about all the ways of getting a repeated value. Well, two of the rolls could be 1, or three of the rolls could be 1, or all of them could be 1, or two of them could be 1 and the rest could be 2, etc. Oh no, am I double-counting repeated 2s and repeated 1s? Uhh...

    An easier way to attack the problem is to realize that the probability we care about is actually \(1-\operatorname{Pr}[\) all 6 rolls are distinct \(]\). This complementary event (all 6 rolls distinct) happens exactly when the sequence of dice rolls spells out a permutation of \(\{1, \ldots, 6\}\). There are \(6! = 720\) such permutations, out of \(6^{6}=46656\) total possible outcomes. Hence, the answer to the question is \[1-\frac{6 !}{6^{6}}=1-\frac{720}{46656}=\frac{45936}{46656} \approx 0.9846 \notag\]
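
    A quick script (my own check, not part of the text) confirms both the exact value and, by simulation, the complement trick:

        from fractions import Fraction
        from math import factorial
        import random

        exact = 1 - Fraction(factorial(6), 6**6)
        print(exact, float(exact))      # 319/324 (= 45936/46656), roughly 0.9846

        # Monte Carlo estimate: roll 6 dice and check whether any value repeats
        trials = 100_000
        repeats = sum(len({random.randrange(1, 7) for _ in range(6)}) < 6
                      for _ in range(trials))
        print(repeats / trials)         # should land close to 0.9846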

    Another trick is one I like to call setting breakpoints on the universe. Imagine stopping the universe at a point where some random choices have happened, and others have not yet happened. This is best illustrated by example:

    Example

    A classic question asks: when rolling two 6-sided dice, what is the probability that the dice match? Here is a standard (and totally correct) way to answer the question:

    When rolling two 6-sided dice, there are \(6^{2}=36\) total outcomes (a pair of numbers), so each has probability 1/36 under a uniform distribution. There are 6 outcomes that make the dice match: both dice 1, both dice 2, both dice 3, and so on. Therefore, the probability of rolling matching dice is \(6 / 36=1 / 6\).
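
    The counting argument is easy to verify by brute force; the following is just an illustrative sketch:

        from fractions import Fraction
        from itertools import product

        outcomes = list(product(range(1, 7), repeat=2))      # all 36 equally likely pairs
        matches = [(a, b) for a, b in outcomes if a == b]    # the 6 matching pairs
        print(Fraction(len(matches), len(outcomes)))         # 1/6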

    A different way to arrive at the answer goes like this:

    Imagine I roll the dice one after another, and I pause the universe (set a breakpoint) after rolling the first die but before rolling the second one. The universe has already decided the result of the first die, so let’s call that value \(d\). The dice will match only if the second roll comes up \(d\). Rolling \(d\) on the second die (indeed, rolling any particular value) happens with probability \(1/6\).

    This technique of setting breakpoints is simple but powerful and frequently useful. Some other closely related tricks are: (1) postponing a random choice until the last possible moment, just before its result is used for the first time, and (2) switching the relative order of independent random choices.
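
    To illustrate the breakpoint view (my own sketch, with a hypothetical helper name): fix the first die at any value \(d\) and sample only the second die. Whatever \(d\) turned out to be, the match frequency is about 1/6, which is exactly the reasoning in the example above.

        import random

        def match_freq_given_first(d, trials=100_000):
            # breakpoint: the first die already landed on d; only the second die is random
            return sum(random.randrange(1, 7) == d for _ in range(trials)) / trials

        for d in range(1, 7):
            print(d, round(match_freq_given_first(d), 3))    # each is close to 1/6, about 0.167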

    Precise Terminology

    It is tempting in this course to say things like "\(x\) is a random string." But a statement like this is sloppy on several accounts.

    First, is 42 a random number? Is "heads" a random coin? What is even being asked by these questions? Being "random" is not a property of an outcome (like a number or a side of a coin) but a property of the process that generates an outcome.\({ }^{3}\) Instead of saying "\(x\) is a random string," it’s much more precise to say "\(x\) was chosen randomly."

    Second, usually when we use the word "random" we don’t mean any old probability distribution. We usually mean to refer to the uniform distribution. Instead of saying "\(x\) was chosen randomly," it’s much more precise to say "\(x\) was chosen uniformly" (assuming that really is what you mean).

    Every cryptographer I know (yes, even your dear author) says things like "\(x\) is a random string" all the time to mean "\(x\) was chosen uniformly [from some set of strings]." Usually the meaning is clear from context, at least to the other cryptographers in the room. But all of us could benefit by having better habits about this sloppy language. Students especially will benefit by internalizing the fact that randomness is a property of the process, not of the individual outcome.


    \({ }^{3}\) There is something called Kolmogorov complexity that can actually give coherent meaning to statements like " \(x\) is a random string." But Kolmogorov complexity has no relevance to this book. The statement " \(x\) is a random string" remains meaningless with respect to the usual probability-distribution sense of the word "random."


    This page titled 0.5: Probability is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Mike Rosulek (Open Oregon State).
