
16.5: Set Theory and Probability

    • Eric Lehman, F. Thomson Leighton, & Albert R. Meyer
    • Google and Massachusetts Institute of Technology via MIT OpenCourseWare

    Let’s abstract what we’ve just done into a general mathematical definition of sample spaces and probability.

    Probability Spaces

    Definition \(\PageIndex{1}\)

    A countable sample space \(S\) is a nonempty countable set.\(^4\) An element \(\omega \in S\) is called an outcome. A subset of \(S\) is called an event.

    Definition \(\PageIndex{2}\)

    A probability function on a sample space \(S\) is a total function \(\text{Pr} : S \rightarrow \mathbb{R}\) such that

    • \(\text{Pr}[\omega] \geq 0\) for all \(\omega \in S\), and
    • \(\sum_{\omega \in S} \text{Pr}[\omega] = 1\).

    A sample space together with a probability function is called a probability space. For any event \(E \subseteq S\), the probability of \(E\) is defined to be the sum of the probabilities of the outcomes in \(E\):

    \[\nonumber \text{Pr}[E] ::= \sum_{\omega \in E} \text{Pr}[\omega].\]
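    This definition translates directly into code. Here is a minimal sketch (not part of the text's formal development) representing a finite probability space as a dict from outcomes to probabilities, with event probability computed as a sum, using a fair die as the sample space:

    ```python
    from fractions import Fraction

    # A probability space for one fair die: a dict mapping each
    # outcome in S = {1, ..., 6} to its probability.
    Pr = {omega: Fraction(1, 6) for omega in range(1, 7)}

    def pr_event(E, Pr):
        """Pr[E] ::= the sum of the probabilities of the outcomes in E."""
        return sum(Pr[omega] for omega in E)

    # Check that Pr is a probability function: probabilities are
    # nonnegative and sum to 1.
    assert all(p >= 0 for p in Pr.values())
    assert sum(Pr.values()) == 1

    evens = {2, 4, 6}
    print(pr_event(evens, Pr))  # 1/2
    ```

    Using exact `Fraction` arithmetic (rather than floats) keeps the check `sum(Pr.values()) == 1` free of rounding error.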

    In the previous examples there were only finitely many possible outcomes, but we’ll quickly come to examples that have a countably infinite number of outcomes.

    The study of probability is closely tied to set theory because any set can be a sample space and any subset can be an event. General probability theory deals with uncountable sets like the set of real numbers, but we won’t need these, and sticking to countable sets lets us define the probability of events using sums instead of integrals. It also lets us avoid some distracting technical problems in set theory like the Banach-Tarski “paradox” mentioned in Chapter 7.

    Probability Rules from Set Theory

    Most of the rules and identities that we have developed for finite sets extend very naturally to probability.

    An immediate consequence of the definition of event probability is that for disjoint events \(E\) and \(F\),

    \[\nonumber \text{Pr}[E \cup F] = \text{Pr}[E] + \text{Pr}[F].\]

    This generalizes to a countable number of events:

    Rule 16.5.3 (Sum Rule). If \(E_0, E_1, \ldots, E_n, \ldots\) are pairwise disjoint events, then

    \[\nonumber \text{Pr}\left[\bigcup_{n \in \mathbb{N}} E_n\right] = \sum_{n \in \mathbb{N}} \text{Pr}[E_n].\]

    The Sum Rule lets us analyze a complicated event by breaking it down into simpler cases. For example, if the probability that a randomly chosen MIT student is native to the United States is 60%, to Canada is 5%, and to Mexico is 5%, then the probability that a random MIT student is native to one of these three countries is 70%.
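    The MIT-student example can be checked mechanically. In this sketch, the singleton events for the three countries are pairwise disjoint, so the Sum Rule says the probability of their union is just the sum of the pieces (the fourth outcome, "other", is hypothetical padding so the probabilities sum to 1):

    ```python
    from fractions import Fraction

    # Outcome probabilities from the example: 60%, 5%, 5%, and the rest.
    Pr = {"US": Fraction(60, 100), "Canada": Fraction(5, 100),
          "Mexico": Fraction(5, 100), "other": Fraction(30, 100)}

    def pr_event(E):
        return sum(Pr[omega] for omega in E)

    # {"US"}, {"Canada"}, {"Mexico"} are pairwise disjoint events, so
    # the probability of their union equals the sum of their probabilities.
    union = {"US", "Canada", "Mexico"}
    assert pr_event(union) == (pr_event({"US"}) + pr_event({"Canada"})
                               + pr_event({"Mexico"}))
    print(pr_event(union))  # 7/10
    ```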

    Another consequence of the Sum Rule is that \(\text{Pr}[A] + \text{Pr}[\overline{A}] = 1\), which follows because \(\text{Pr}[S] = 1\) and \(S\) is the union of the disjoint sets \(A\) and \(\overline{A}\). This equation often comes up in the form:

    \[\nonumber \text{Pr}[\overline{A}] = 1 - \text{Pr}[A]. \qquad \text{(Complement Rule)}\]

    Sometimes the easiest way to compute the probability of an event is to compute the probability of its complement and then apply this formula.

    Some further basic facts about probability parallel facts about cardinalities of finite sets. In particular:

    \[\begin{aligned} \text{Pr}[B-A] &= \text{Pr}[B]-\text{Pr}[A \cap B], & \text{(Difference Rule)} \\ \text{Pr}[A \cup B] &= \text{Pr}[A]+\text{Pr}[B]-\text{Pr}[A \cap B], & \text{(Inclusion-Exclusion)} \\ \text{Pr}[A \cup B] &\leq \text{Pr}[A]+\text{Pr}[B], & \text{(Boole's Inequality)} \\ \text{If } A \subseteq B, &\text{then } \text{Pr}[A] \leq \text{Pr}[B]. & \text{(Monotonicity Rule)} \end{aligned}\]

    The Difference Rule follows from the Sum Rule because \(B\) is the union of the disjoint sets \(B - A\) and \(A \cap B\). Inclusion-Exclusion then follows from the Sum and Difference Rules, because \(A \cup B\) is the union of the disjoint sets \(A\) and \(B - A\). Boole’s inequality is an immediate consequence of Inclusion-Exclusion since probabilities are nonnegative. Monotonicity follows from the definition of event probability and the fact that outcome probabilities are nonnegative.
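    All four rules can be verified numerically on a small uniform space. The following sketch checks them on a fair six-sided die, with \(A\) the low rolls and \(B\) the even rolls (these particular events are our choice, not the text's):

    ```python
    from fractions import Fraction

    # Uniform space: a fair six-sided die.
    S = set(range(1, 7))

    def Pr(E):
        return Fraction(len(E & S), len(S))

    A = {1, 2, 3}   # low rolls
    B = {2, 4, 6}   # even rolls

    assert Pr(B - A) == Pr(B) - Pr(A & B)           # Difference Rule
    assert Pr(A | B) == Pr(A) + Pr(B) - Pr(A & B)   # Inclusion-Exclusion
    assert Pr(A | B) <= Pr(A) + Pr(B)               # Boole's Inequality
    assert Pr({2}) <= Pr(B)                         # Monotonicity ({2} is a subset of B)
    print(Pr(A | B))  # 5/6
    ```

    Python's set operators `-`, `&`, and `|` line up exactly with the set difference, intersection, and union in the rules above.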

    The two-event Inclusion-Exclusion equation above generalizes to any finite set of events in the same way as the corresponding Inclusion-Exclusion rule for \(n\) sets. Boole’s inequality also generalizes to both finite and countably infinite sets of events:

    Rule 16.5.4 (Union Bound).

    \[\text{Pr}[E_1 \cup \cdots \cup E_n \cup \cdots] \leq \text{Pr}[E_1] + \cdots + \text{Pr}[E_n] + \cdots.\]

    The Union Bound is useful in many calculations. For example, suppose that \(E_i\) is the event that the \(i\)-th critical component among \(n\) components in a spacecraft fails. Then \(E_1 \cup \cdots \cup E_n\) is the event that some critical component fails. If \(\sum_{i=1}^{n} \text{Pr}[E_i]\) is small, then the Union Bound can provide a reassuringly small upper bound on this overall probability of critical failure.
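    To make the spacecraft example concrete, suppose (hypothetically) that each of \(n = 10\) critical components fails with probability \(10^{-4}\). The Union Bound needs no assumptions about how the failure events interact; for comparison, the sketch below also computes the exact failure probability in the special case that the failures are independent:

    ```python
    # Hypothetical numbers: 10 critical components, each failing
    # with probability 1e-4.
    n, p = 10, 1e-4

    # Union Bound: Pr[some component fails] <= sum of the Pr[E_i],
    # regardless of any dependence among the events.
    union_bound = n * p

    # Exact value in the special case of independent failures:
    # 1 - Pr[no component fails].
    exact_if_independent = 1 - (1 - p) ** n

    assert exact_if_independent <= union_bound
    print(union_bound, exact_if_independent)
    ```

    The bound (about \(10^{-3}\)) is only slightly larger than the independent-case exact value, which is typical when the individual probabilities are small.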

    Uniform Probability Spaces

    Definition \(\PageIndex{5}\)

    A finite probability space, \(S\), is said to be uniform if \(\text{Pr}[\omega]\) is the same for every outcome \(\omega \in S\).

    As we saw in the strange dice problem, uniform sample spaces are particularly easy to work with. That’s because for any event \(E \subseteq S\),

    \[\label{16.5.2} \text{Pr}[E] = \dfrac{|E|}{|S|}.\]

    This means that once we know the cardinality of \(E\) and \(S\), we can immediately obtain \(\text{Pr}[E]\). That’s great news because we developed lots of tools for computing the cardinality of a set in Part III.

    For example, suppose that you select five cards at random from a standard deck of 52 cards. What is the probability of having a full house? Normally, this question would take some effort to answer. But from the analysis in Section 14.7.2, we know that

    \[\nonumber |S| = {52 \choose 5}\]

    and

    \[\nonumber |E| = 13 \cdot {4 \choose 3} \cdot 12 \cdot {4 \choose 2}\]

    where \(E\) is the event that we have a full house. Since every five-card hand is equally likely, we can apply equation (\ref{16.5.2}) to find that

    \[\begin{aligned} \text{Pr}[E] &= \dfrac{13 \cdot 12 \cdot {4 \choose 3} \cdot {4 \choose 2}}{52 \choose 5} \\ &= \dfrac{13 \cdot 12 \cdot 4 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2}{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48} = \dfrac{18}{12495} \\ &\approx \dfrac{1}{694}. \end{aligned}\]
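    The whole calculation can be reproduced in a few lines of exact arithmetic using `math.comb` for the binomial coefficients:

    ```python
    from fractions import Fraction
    from math import comb

    # |S| = number of five-card hands; |E| = number of full houses
    # (13 choices of the triple's rank, C(4,3) suits for it, then 12
    # remaining ranks for the pair and C(4,2) suits for it).
    size_S = comb(52, 5)
    size_E = 13 * comb(4, 3) * 12 * comb(4, 2)

    p = Fraction(size_E, size_S)
    print(p, float(p))  # 6/4165, roughly 1/694
    ```

    Note that `Fraction` reduces \(18/12495\) to lowest terms, \(6/4165\); both equal the same probability, about \(0.00144\).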

    Infinite Probability Spaces

    Infinite probability spaces are fairly common. For example, two players take turns flipping a fair coin. Whoever flips heads first is declared the winner. What is the probability that the first player wins? A tree diagram for this problem is shown in Figure 16.10.

    Figure 16.10 The tree diagram for the game where players take turns flipping a fair coin. The first player to flip heads wins.

    The event that the first player wins contains an infinite number of outcomes, but we can still sum their probabilities:

    \[\begin{aligned} \text{Pr}[\text{first player wins}] &=\frac{1}{2} + \frac{1}{8} + \frac{1}{32} + \frac{1}{128} + \cdots \\ &= \frac{1}{2} \sum_{n=0}^{\infty} \left( \frac{1}{4}\right)^n \\ &= \frac{1}{2} \left( \frac{1}{1 - 1/4} \right) = \frac{2}{3}. \end{aligned}\]

    Similarly, we can compute the probability that the second player wins:

    \[\nonumber \text{Pr}[\text{second player wins}] =\frac{1}{4} + \frac{1}{16} + \frac{1}{64} + \frac{1}{256} + \cdots = \frac{1}{3}.\]

    In this case, the sample space is the infinite set

    \[\nonumber S ::= \{T^n H \mid n \in \mathbb{N}\},\]

    where \(T^n\) stands for a length \(n\) string of \(T\)’s. The probability function is

    \[\nonumber \text{Pr}[T^n H]::= \frac{1}{2^{n+1}}.\]

    To verify that this is a probability space, we just have to check that all the probabilities are nonnegative and that they sum to 1. The given probabilities are all nonnegative, and applying the formula for the sum of a geometric series, we find that

    \[\nonumber \sum_{n \in \mathbb{N}}\text{Pr}[T^n H] = \sum_{n \in \mathbb{N}}\frac{1}{2^{n+1}} = 1.\]
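    A quick sanity check on these sums: the first player wins on the outcomes \(T^n H\) with \(n\) even, the second player on those with \(n\) odd. Summing the first hundred outcome probabilities exactly (the tail beyond that is at most \(2^{-100}\)) recovers the values computed above:

    ```python
    from fractions import Fraction

    # Pr[T^n H] = 1/2^(n+1); the first player wins when n is even,
    # the second player when n is odd.
    first = sum(Fraction(1, 2 ** (n + 1)) for n in range(0, 100, 2))
    second = sum(Fraction(1, 2 ** (n + 1)) for n in range(1, 100, 2))
    total = first + second

    print(float(first))   # approaches 2/3
    print(float(second))  # approaches 1/3
    print(float(total))   # approaches 1
    ```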

    Notice that this model has no outcome corresponding to the possibility that both players keep flipping tails forever. (In the diagram, flipping forever corresponds to following the infinite path in the tree without ever reaching a leaf/outcome.) If leaving this possibility out of the model bothers you, you’re welcome to fix it by adding another outcome, \(\omega_{\text{forever}}\), to indicate that this is what happened. Of course, since the probabilities of the other outcomes already sum to 1, you have to define the probability of \(\omega_{\text{forever}}\) to be 0. Outcomes with probability zero have no impact on our calculations, so there’s no harm in adding one if it makes you happier. On the other hand, in countable probability spaces it isn’t necessary to have outcomes with probability zero, and we will generally ignore them.

    \(^4\)Yes, sample spaces can be infinite. If you did not read Chapter 7, don’t worry—countable just means that you can list the elements of the sample space as \(\omega_0, \omega_1, \omega_2, \ldots\).


    This page titled 16.5: Set Theory and Probability is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Eric Lehman, F. Thomson Leighton, & Albert R. Meyer (MIT OpenCourseWare).
