
17.5: The Law of Total Probability


    Breaking a probability calculation into cases simplifies many problems. The idea is to calculate the probability of an event \(A\) by splitting into two cases based on whether or not another event \(E\) occurs. That is, calculate the probability of \(A \cap E\) and of \(A \cap \overline{E}\). By the Sum Rule, the sum of these probabilities equals \(\text{Pr}[A]\). Expressing the intersection probabilities as conditional probabilities yields:

    Rule 17.5.1 (Law of Total Probability: single event).

    \[\nonumber \text{Pr}[A] = \text{Pr}[A \mid E] \cdot \text{Pr}[E] + \text{Pr}[A \mid \overline{E}] \cdot \text{Pr}[\overline{E}].\]

    For example, suppose we conduct the following experiment. First, we flip a fair coin. If heads comes up, then we roll one die and take the result. If tails comes up, then we roll two dice and take the sum of the two results. What is the probability that this process yields a 2? Let \(E\) be the event that the coin comes up heads, and let \(A\) be the event that we get a 2 overall. Assuming that the coin is fair, \(\text{Pr}[E] = \text{Pr}[\overline{E}] = 1/2.\) There are now two cases. If we flip heads, then we roll a 2 on a single die with probability \(\text{Pr}[A \mid E] = 1/6\). On the other hand, if we flip tails, then we get a sum of 2 on two dice with probability \(\text{Pr}[A \mid \overline{E}] = 1/36\). Therefore, the probability that the whole process yields a 2 is

    \[\nonumber \text{Pr}[A] = \frac{1}{2} \cdot \frac{1}{6} + \frac{1}{2} \cdot \frac{1}{36} = \frac{7}{72}.\]
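    As a sanity check (not part of the original text), here is a short Python sketch that recomputes \(\text{Pr}[A]\) exactly via the Law of Total Probability and compares it against a Monte Carlo simulation of the coin-and-dice experiment:

```python
import random
from fractions import Fraction

# Exact computation via the Law of Total Probability (single event E = "heads").
pr_E = Fraction(1, 2)               # Pr[E]: fair coin lands heads
pr_A_given_E = Fraction(1, 6)       # Pr[A | E]: one die shows 2
pr_A_given_notE = Fraction(1, 36)   # Pr[A | not E]: two dice sum to 2 (both show 1)
pr_A = pr_A_given_E * pr_E + pr_A_given_notE * (1 - pr_E)
print(pr_A)                         # 7/72

# Monte Carlo sanity check of the same experiment.
trials = 1_000_000
hits = 0
for _ in range(trials):
    if random.random() < 0.5:       # heads: roll one die
        hits += random.randint(1, 6) == 2
    else:                           # tails: roll two dice and sum them
        hits += random.randint(1, 6) + random.randint(1, 6) == 2
print(hits / trials, float(pr_A))   # both should be close to 0.0972
```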

    This rule extends to any set of disjoint events that make up the entire sample space. For example,

    Rule (Law of Total Probability: 3-events). If \(E_1, E_2\), and \(E_3\) are disjoint and \(\text{Pr}[E_1 \cup E_2 \cup E_3] = 1\), then

    \[\nonumber \text{Pr}[A] = \text{Pr}[A \mid E_1] \cdot \text{Pr}[E_1] + \text{Pr}[A \mid E_2] \cdot \text{Pr}[E_2] + \text{Pr}[A \mid E_3] \cdot \text{Pr}[E_3].\]

    This in turn leads to a three-event version of Bayes’ Rule in which the probability of event \(E_1\) given \(A\) is calculated from the “inverse” conditional probabilities of \(A\) given \(E_1, E_2,\) and \(E_3\):

    Rule (Bayes’ Rule: 3-events).

    \[\nonumber \text{Pr}[E_1 \mid A] = \frac{\text{Pr}[A \mid E_1] \cdot \text{Pr}[E_1]}{\text{Pr}[A \mid E_1] \cdot \text{Pr}[E_1] + \text{Pr}[A \mid E_2] \cdot \text{Pr}[E_2] + \text{Pr}[A \mid E_3] \cdot \text{Pr}[E_3]}.\]

    The generalization of these rules to \(n\) disjoint events is a routine exercise (Problems 17.3 and 17.4).
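    As an illustration of the \(n\)-event versions (this code is not from the text, and the priors and likelihoods below are made-up numbers for three disjoint events), here is a small Python sketch using exact rational arithmetic:

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """Pr[A] = sum of Pr[A | E_i] * Pr[E_i] over disjoint events E_i covering the space."""
    assert sum(priors) == 1, "the events E_i must partition the sample space"
    return sum(lik * pri for lik, pri in zip(likelihoods, priors))

def bayes(priors, likelihoods, i):
    """Pr[E_i | A] = Pr[A | E_i] * Pr[E_i] / Pr[A]."""
    return likelihoods[i] * priors[i] / total_probability(priors, likelihoods)

# Hypothetical priors Pr[E_i] and likelihoods Pr[A | E_i] for three disjoint events:
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(1, 4)]
print(total_probability(priors, likelihoods))   # Pr[A] = 31/120
print(bayes(priors, likelihoods, 0))            # Pr[E_1 | A] = 6/31
```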

    Conditioning on a Single Event

    The probability rules that we derived in Section 16.5.2 extend to probabilities conditioned on the same event. For example, the Inclusion-Exclusion formula for two sets holds when all probabilities are conditioned on an event \(C\):

    \[\nonumber \text{Pr}[A \cup B \mid C] = \text{Pr}[A \mid C] + \text{Pr}[B \mid C] - \text{Pr}[A \cap B \mid C].\]

    This is easy to verify by plugging in Definition 17.2.1 of conditional probability.\(^2\)
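    For instance, a sketch of that verification: expanding each term with Definition 17.2.1 and applying the unconditional Inclusion-Exclusion formula to \((A \cap C) \cup (B \cap C)\) gives

    \[\nonumber \begin{aligned} \text{Pr}[A \cup B \mid C] &= \frac{\text{Pr}[(A \cup B) \cap C]}{\text{Pr}[C]} = \frac{\text{Pr}[(A \cap C) \cup (B \cap C)]}{\text{Pr}[C]} \\ &= \frac{\text{Pr}[A \cap C] + \text{Pr}[B \cap C] - \text{Pr}[A \cap B \cap C]}{\text{Pr}[C]} \\ &= \text{Pr}[A \mid C] + \text{Pr}[B \mid C] - \text{Pr}[A \cap B \mid C]. \end{aligned}\]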

    It is important not to mix up events before and after the conditioning bar. For example, the following is not a valid identity:

    False Claim.

    \[\label{17.5.1} \text{Pr}[A \mid B \cup C] = \text{Pr}[A \mid B] + \text{Pr}[A \mid C] - \text{Pr}[A \mid B \cap C].\]

    A simple counterexample is to let \(B\) and \(C\) be events over a uniform space with most of their outcomes in \(A\), but overlapping only in an outcome outside \(A\). This ensures that \(\text{Pr}[A \mid B]\) and \(\text{Pr}[A \mid C]\) are both close to 1. For example,

    \[\begin{aligned} B &::= [0..9], \\ C &::= [10..18] \cup \{0\}, \\ A &::= [1..18], \end{aligned}\]

    so

    \[\nonumber \text{Pr}[A \mid B] = \frac{9}{10} = \text{Pr}[A \mid C].\]

    Also, since 0 is the only outcome in \(B \cap C\) and \(0 \notin A\), we have

    \[\nonumber \text{Pr}[A \mid B \cap C] = 0,\]

    so the right-hand side of (\ref{17.5.1}) is 1.8, while the left-hand side is a probability, which can be at most 1; in fact, it is \(18/19\).
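    A quick Python check of this counterexample (illustrative code, not part of the text), computing each conditional probability on the uniform space \([0..18]\) as \(|X \cap Y| / |Y|\):

```python
from fractions import Fraction

# Events from the counterexample, over the uniform sample space [0..18].
B = set(range(0, 10))            # [0..9]
C = set(range(10, 19)) | {0}     # [10..18] together with 0
A = set(range(1, 19))            # [1..18]

def pr_given(X, Y):
    """Pr[X | Y] on a uniform space: |X intersect Y| / |Y|."""
    return Fraction(len(X & Y), len(Y))

print(pr_given(A, B))       # 9/10
print(pr_given(A, C))       # 9/10
print(pr_given(A, B & C))   # 0
print(pr_given(A, B | C))   # 18/19, not 9/10 + 9/10 - 0 = 1.8
```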

    \(^2\)Problem 17.14 explains why this and similar conditional identities follow on general principles from the corresponding unconditional identities.


    This page titled 17.5: The Law of Total Probability is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Eric Lehman, F. Thomson Leighton, & Albert R. Meyer (MIT OpenCourseWare).
