18.5: Linearity of Expectation


    Expected values obey a simple, very helpful rule called Linearity of Expectation. Its simplest form says that the expected value of a sum of random variables is the sum of the expected values of the variables.

    Theorem \(\PageIndex{1}\)

    For any random variables \(R_1\) and \(R_2\),

    \[\nonumber \text{Ex}[R_1 + R_2] = \text{Ex}[R_1] + \text{Ex}[R_2].\]

    Proof

    Let \(T ::= R_1 + R_2\). The proof follows straightforwardly by rearranging terms in equation (18.4.1) in the definition of expectation:

    \[\begin{aligned} \text{Ex}[T] &::= \sum_{\omega \in S} T(\omega) \cdot \text{Pr}[\omega] \\ &= \sum_{\omega \in S} (R_1(\omega) + R_2(\omega)) \cdot \text{Pr}[\omega] & (\text{def of } T) \\ &= \sum_{\omega \in S} R_1(\omega) \cdot \text{Pr}[\omega] + \sum_{\omega \in S} R_2(\omega) \cdot \text{Pr}[\omega] & (\text{rearranging terms}) \\ &= \text{Ex}[R_1] + \text{Ex}[R_2]. & (\text{by }(18.4.1)) \quad \blacksquare \end{aligned}\]

    A small extension of this proof, which we leave to the reader, implies

    Theorem \(\PageIndex{2}\)

    For any random variables \(R_1, R_2\) and constants \(a_1, a_2 \in \mathbb{R}\),

    \[\nonumber \text{Ex}[a_1 R_1 + a_2 R_2] = a_1 \text{Ex}[R_1] + a_2 \text{Ex}[R_2].\]

    In other words, expectation is a linear function. A routine induction extends the result to more than two variables:

    Corollary 18.5.3 (Linearity of Expectation). For any random variables \(R_1, \ldots, R_k\) and constants \(a_1, \ldots, a_k \in \mathbb{R}\),

    \[\nonumber \text{Ex} \left[ \sum_{i = 1}^{k} a_i R_i\right] = \sum_{i = 1}^{k} a_i \text{Ex}[R_i].\]

    The great thing about linearity of expectation is that no independence is required. This is really useful, because dealing with independence is a pain, and we often need to work with random variables that are not known to be independent.

    As an example, let’s compute the expected value of the sum of two fair dice.

    Expected Value of Two Dice

    What is the expected value of the sum of two fair dice?

    Let the random variable \(R_1\) be the number on the first die, and let \(R_2\) be the number on the second die. We observed earlier that the expected value of one die is 3.5. We can find the expected value of the sum using linearity of expectation:

    \[\nonumber \text{Ex}[R_1 + R_2] = \text{Ex}[R_1] + \text{Ex}[R_2] = 3.5 + 3.5 = 7.\]

    Assuming that the dice were independent, we could use a tree diagram to prove that this expected sum is 7, but this would be a bother since there are 36 cases. And without assuming independence, it’s not apparent how to apply the tree diagram approach at all. But notice that we did not have to assume that the two dice were independent. The expected sum of two dice is 7—even if they are controlled to act together in some way—as long as each individual controlled die remains fair.
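
    A short simulation can make this dependence-free behavior concrete. The Python sketch below is our own illustration (the trial count and the “controlled dice” scenario, in which the second die simply copies the first, are arbitrary choices, not part of the text); it shows the average sum hovering around 7 whether the dice are independent or perfectly correlated.

    ```python
    import random

    # Monte Carlo sketch: the expected sum of two fair dice is 7 even when the
    # dice are perfectly correlated, as long as each die on its own is fair.
    TRIALS = 100_000

    # Independent dice.
    indep = sum(random.randint(1, 6) + random.randint(1, 6) for _ in range(TRIALS)) / TRIALS

    # "Controlled" dice: the second die always copies the first.  Each die is
    # individually fair, but the pair is as dependent as possible.
    dep = sum(2 * random.randint(1, 6) for _ in range(TRIALS)) / TRIALS

    print(f"independent dice: average sum ~ {indep:.2f}")  # close to 7.00
    print(f"correlated dice:  average sum ~ {dep:.2f}")    # also close to 7.00
    ```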

    Sums of Indicator Random Variables

    Linearity of expectation is especially useful when you have a sum of indicator random variables. As an example, suppose there is a dinner party where \(n\) men check their hats. The hats are mixed up during dinner, so that afterward each man receives a random hat. In particular, each man gets his own hat with probability \(1/n\). What is the expected number of men who get their own hat?

    Letting \(G\) be the number of men that get their own hat, we want to find the expectation of \(G\). But all we know about \(G\) is that the probability that a man gets his own hat back is \(1/n\). There are many different probability distributions of hat permutations with this property, so we don’t know enough about the distribution of \(G\) to calculate its expectation directly using equation (18.4.1) or (18.4.2). But linearity of expectation lets us sidestep this issue.

    We’ll use a standard, useful trick to apply linearity, namely, we’ll express \(G\) as a sum of indicator variables. In particular, let \(G_i\) be an indicator for the event that the \(i\)th man gets his own hat. That is, \(G_i = 1\) if the \(i\)th man gets his own hat, and \(G_i = 0\) otherwise. The number of men that get their own hat is then the sum of these indicator random variables:

    \[\label{18.5.1} G = G_1 + G_2 + \cdots + G_n.\]

    These indicator variables are not mutually independent. For example, if \(n - 1\) men all get their own hats, then the last man is certain to receive his own hat. But again, we don’t need to worry about this dependence, since linearity holds regardless.

    Since \(G_i\) is an indicator random variable, we know from Lemma 18.4.2 that

    \[\label{18.5.2} \text{Ex}[G_i] = \text{Pr}[G_i = 1] = 1/n.\]

    By Linearity of Expectation and equation (\ref{18.5.1}), this means that

    \[\begin{aligned} \text{Ex}[G] &= \text{Ex}[G_1 + G_2 + \cdots + G_n] \\ &= \text{Ex}[G_1] + \text{Ex}[G_2] + \cdots + \text{Ex}[G_n] \\ &= \overbrace{\frac{1}{n} + \frac{1}{n} + \cdots + \frac{1}{n}}^{n} \\ &= 1. \end{aligned}\]

    So even though we don’t know much about how hats are scrambled, we’ve figured out that on average, just one man gets his own hat back, regardless of the number of men with hats!
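
    The same conclusion is easy to check empirically. The Python sketch below is our own illustration (the function name fixed_points and the trial counts are arbitrary); it hands the hats back according to a uniformly random permutation and confirms that the average number of men who recover their own hat stays near 1 regardless of \(n\).

    ```python
    import random

    # Monte Carlo sketch of the hat-check problem: return the n hats in a
    # uniformly random order and count how many men get their own hat back.
    def fixed_points(n):
        hats = list(range(n))
        random.shuffle(hats)
        return sum(1 for man, hat in enumerate(hats) if man == hat)

    TRIALS = 50_000
    for n in (5, 20, 100):
        avg = sum(fixed_points(n) for _ in range(TRIALS)) / TRIALS
        print(f"n = {n:3d}: average number of own hats ~ {avg:.3f}")  # close to 1.0 for every n
    ```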

    More generally, Linearity of Expectation provides a very good method for computing the expected number of events that will happen.

    Theorem \(\PageIndex{4}\)

    Given any collection of events \(A_1, A_2, \ldots, A_n\), the expected number of events that will occur is

    \[\nonumber \sum_{i = 1}^{n} \text{Pr}[A_i].\]

    For example, \(A_i\) could be the event that the \(i\)th man gets the right hat back. But in general, it could be any subset of the sample space, and we are asking for the expected number of events that will contain a random sample point.

    Proof

    Define \(R_i\) to be the indicator random variable for \(A_i\), where \(R_i(\omega) = 1\) if \(\omega \in A_i\) and \(R_i(\omega) = 0\) if \(\omega \notin A_i\). Let \(R = R_1 + R_2 + \cdots + R_n\). Then

    \[\begin{aligned} \text{Ex}[R] &= \sum_{i = 1}^{n} \text{Ex}[R_i] & (\text{by Linearity of Expectation}) \\ &= \sum_{i = 1}^{n} \text{Pr}[R_i = 1] & (\text{by Lemma 18.4.2}) \\ &= \sum_{i = 1}^{n} \text{Pr}[A_i]. & (\text{def of indicator variable}) \end{aligned}\]

    So whenever you are asked for the expected number of events that occur, all you have to do is sum the probabilities that each event occurs. Independence is not needed.
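
    Here is a small worked example of our own (not from the text) with overlapping, non-independent events: draw \(x\) uniformly from \(\{1, \ldots, 100\}\) and let \(A_d\) be the event that \(d\) divides \(x\), for \(d \in \{2, 3, 5\}\). The Python check below computes both sides of Theorem 18.5.4 exactly over this small sample space.

    ```python
    from fractions import Fraction

    # Exact check of Theorem 18.5.4 on a toy sample space: x is uniform on
    # {1, ..., 100} and A_d is the event "d divides x" for d in {2, 3, 5}.
    SPACE = range(1, 101)
    DIVISORS = (2, 3, 5)

    # Left side: the expected number of events that occur, computed directly.
    expected_count = Fraction(
        sum(sum(x % d == 0 for d in DIVISORS) for x in SPACE), len(SPACE)
    )

    # Right side: the sum of the individual event probabilities.
    prob_sum = sum(Fraction(sum(x % d == 0 for x in SPACE), len(SPACE)) for d in DIVISORS)

    print(expected_count, prob_sum, expected_count == prob_sum)  # 103/100 103/100 True
    ```

    The events overlap heavily (for example, 30 lies in all three), yet the expected count is still just \(\frac{50}{100} + \frac{33}{100} + \frac{20}{100}\).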

    Expectation of a Binomial Distribution

    Suppose that we independently flip \(n\) biased coins, each with probability \(p\) of coming up heads. What is the expected number of heads?

    Let \(J\) be the random variable denoting the number of heads. Then \(J\) has a binomial distribution with parameters \(n, p\), and

    \[\nonumber \text{Pr}[J = k] = {n \choose k} p^k (1-p)^{n-k}.\]

    Applying equation (18.4.2), this means that

    \[\label{18.5.3} \text{Ex}[J] = \sum_{k = 0}^{n} k \text{Pr}[J = k] = \sum_{k = 0}^{n} k {n \choose k} p^k (1-p)^{n-k}.\]

    This sum looks a tad nasty, but linearity of expectation leads to an easy derivation of a simple closed form. We just express \(J\) as a sum of indicator random variables, which is easy. Namely, let \(J_i\) be the indicator random variable for the \(i\)th coin coming up heads, that is,

    \[\nonumber J_i::=\left\{\begin{array}{ll} 1 & \text { if the } i\text{th coin is heads} \\ 0 & \text { if the } i\text{th coin is tails.} \end{array}\right.\]

    Then the number of heads is simply

    \[\nonumber J = J_1 + J_2 + \cdots + J_n.\]

    By Theorem 18.5.4,

    \[\label{18.5.4} \text{Ex}[J] = \sum_{i = 1}^{n} \text{Pr}[J_i = 1] = pn.\]

    That really was easy. If we flip \(n\) mutually independent coins, we expect to get \(pn\) heads. Hence the expected value of a binomial distribution with parameters \(n\) and \(p\) is simply \(pn\).

    But what if the coins are not mutually independent? It doesn’t matter—the answer is still \(pn\) because Linearity of Expectation and Theorem 18.5.4 do not assume any independence.

    If you are not yet convinced that Linearity of Expectation and Theorem 18.5.4 are powerful tools, consider this: without even trying, we have used them to prove a complicated looking identity, namely,

    \[\label{18.5.5} \sum_{k = 0}^{n} k {n \choose k} p^k (1 - p)^{n-k} = pn,\]

    which follows by combining equations (\ref{18.5.3}) and (\ref{18.5.4}) (see also Exercise 18.26).
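
    The identity is also easy to spot-check numerically. The Python sketch below is a sanity check of our own, with arbitrarily chosen values of \(n\) and \(p\); it evaluates the left side of (\ref{18.5.5}) term by term and compares it with \(pn\).

    ```python
    from math import comb

    # Numerical spot-check of the identity sum_k k*C(n,k)*p^k*(1-p)^(n-k) = p*n.
    for n, p in [(10, 0.3), (25, 0.5), (40, 0.9)]:
        lhs = sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))
        print(f"n={n}, p={p}: sum = {lhs:.6f}, p*n = {p * n:.6f}")
    ```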

    The next section has an even more convincing illustration of the power of linearity to solve a challenging problem.

    The Coupon Collector Problem

    Every time we purchase a kid’s meal at Taco Bell, we are graciously presented with a miniature “Racin’ Rocket” car together with a launching device which enables us to project our new vehicle across any tabletop or smooth floor at high velocity. Truly, our delight knows no bounds.

    There are \(n\) different colored Racin’ Rocket cars. The color of car awarded to us by the kind server at the Taco Bell register appears to be selected uniformly and independently at random. What is the expected number of kid’s meals that we must purchase in order to acquire at least one of each color of Racin’ Rocket car?

    The same mathematical question shows up in many guises: for example, what is the expected number of people you must poll in order to find at least one person with each possible birthday? The general question is commonly called the coupon collector problem after yet another interpretation.

    A clever application of linearity of expectation leads to a simple solution to the coupon collector problem. Suppose there are five different colors of Racin’ Rocket cars, and we receive this sequence:

    \[\nonumber \text{blue} \quad \text{green} \quad \text{green} \quad \text{red} \quad \text{blue} \quad \text{orange} \quad \text{blue} \quad \text{orange} \quad \text{gray}.\]

    Let’s partition the sequence into 5 segments:

    \[\nonumber \underbrace{\text{blue}}_{X_0} \quad \underbrace{\text{green}}_{X_1} \quad \underbrace{\text{green} \quad \text{red}}_{X_2} \quad \underbrace{\text{blue} \quad \text{orange}}_{X_3} \quad \underbrace{\text{blue} \quad \text{orange} \quad \text{gray}}_{X_4}.\]

    The rule is that a segment ends whenever we get a new kind of car. For example, the middle segment ends when we get a red car for the first time. In this way, we can break the problem of collecting every type of car into stages. Then we can analyze each stage individually and assemble the results using linearity of expectation.

    In the general case there are \(n\) colors of Racin’ Rockets that we’re collecting. Let \(X_k\) be the length of the \(k\)th segment. The total number of kid’s meals we must purchase to get all \(n\) Racin’ Rockets is the sum of the lengths of all these segments:

    \[\nonumber T = X_0 + X_1 + \cdots + X_{n-1}.\]

    Now let’s focus our attention on \(X_k\), the length of the \(k\)th segment. At the beginning of segment \(k\), we have \(k\) different types of car, and the segment ends when we acquire a new type. When we own \(k\) types, each kid’s meal contains a type that we already have with probability \(k/n\). Therefore, each meal contains a new type of car with probability \(1 - k/n = (n - k)/n\). Thus, the expected number of meals until we get a new kind of car is \(n / (n - k)\) by the Mean Time to Failure rule. This means that

    \[\nonumber \text{Ex}[X_k] = \frac{n}{n - k}.\]
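
    A quick simulation supports this stage-by-stage analysis. The Python sketch below is our own illustration (the function name stage_length, the choice \(n = 5\), and the trial count are arbitrary); it counts meals until a new color appears when \(k\) colors are already owned and compares the average with \(n/(n-k)\).

    ```python
    import random

    # Monte Carlo sketch of one coupon-collector stage: with k of the n colors
    # already owned, count meals until a new color shows up.  The average should
    # be close to n / (n - k), as the Mean Time to Failure rule predicts.
    def stage_length(n, k):
        meals = 0
        while True:
            meals += 1
            if random.randrange(n) >= k:   # indices k..n-1 stand for the n-k unseen colors
                return meals

    n, TRIALS = 5, 100_000
    for k in range(n):
        avg = sum(stage_length(n, k) for _ in range(TRIALS)) / TRIALS
        print(f"k = {k}: average ~ {avg:.2f}, n/(n-k) = {n / (n - k):.2f}")
    ```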

    Linearity of expectation, together with this observation, solves the coupon collector problem:

    \[\begin{align} \nonumber \text{Ex}[T] &= \text{Ex}[X_0 + X_1 + \cdots + X_{n-1}] \\ \nonumber &= \text{Ex}[X_0] + \text{Ex}[X_1] + \cdots + \text{Ex}[X_{n-1}] \\ \nonumber &= \frac{n}{n - 0} + \frac{n}{n - 1} + \cdots + \frac{n}{3} + \frac{n}{2} + \frac{n}{1} \\ \nonumber &= n \left( \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n-1} + \frac{1}{n} \right) \\ \label{18.5.6} &= n H_n \\ \nonumber &\sim n \ln n. \end{align}\]

    Cool! It’s those Harmonic Numbers again.

    We can use equation (\ref{18.5.6}) to answer some concrete questions. For example, the expected number of die rolls required to see every number from 1 to 6 is:

    \[\nonumber 6H_6 = 14.7 \ldots\]

    And the expected number of people you must poll to find at least one person with each possible birthday is:

    \[\nonumber 365H_{365} = 2364.6 \ldots\]
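
    Both figures, and the formula \(nH_n\) itself, can be reproduced with a few lines of Python. The sketch below is our own (the helper names n_harmonic and collect_all and the trial count are arbitrary); it computes \(nH_n\) exactly and compares it with a simulated die-roll collection.

    ```python
    import random
    from fractions import Fraction

    # n * H_n, computed exactly with rational arithmetic.
    def n_harmonic(n):
        return n * sum(Fraction(1, k) for k in range(1, n + 1))

    # One simulated coupon-collector run: purchases until all n types are seen.
    def collect_all(n):
        seen, purchases = set(), 0
        while len(seen) < n:
            seen.add(random.randrange(n))
            purchases += 1
        return purchases

    print(float(n_harmonic(6)))    # 14.7      (die-roll example)
    print(float(n_harmonic(365)))  # ~ 2364.65 (birthday example)

    TRIALS = 20_000
    avg = sum(collect_all(6) for _ in range(TRIALS)) / TRIALS
    print(f"simulated average for n = 6: {avg:.2f}")  # close to 14.7
    ```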

    Infinite Sums

    Linearity of expectation also works for an infinite number of random variables provided that the variables satisfy an absolute convergence criterion.

    Theorem \(\PageIndex{5}\)

    (Linearity of Expectation). Let \(R_0, R_1, \ldots\) be random variables such that

    \[\nonumber \sum_{i = 0}^{\infty} \text{Ex}[|R_i|]\]

    converges. Then

    \[\nonumber \text{Ex}\left[ \sum_{i = 0}^{\infty} R_i \right] = \sum_{i = 0}^{\infty} \text{Ex}[R_i].\]

    Proof

    Let \(T ::= \sum_{i = 0}^{\infty} R_i.\)

    We leave it to the reader to verify that, under the given convergence hypothesis, all the sums in the following derivation are absolutely convergent, which justifies rearranging them as follows:

    \[\begin{aligned} \nonumber \sum_{i = 0}^{\infty} \text{Ex}[R_i] &= \sum_{i = 0}^{\infty} \sum_{s \in S} R_i(s) \cdot \text{Pr}[s] & (\text{Def. 18.4.1}) \\ &= \sum_{s \in S} \sum_{i = 0}^{\infty} R_i(s) \cdot \text{Pr}[s] & (\text{exchanging order of summation}) \\ &= \sum_{s \in S} \left[ \sum_{i = 0}^{\infty} R_i(s) \right] \cdot \text{Pr}[s] & (\text{factoring out }\text{Pr}[s]) \\ &= \sum_{s \in S} T(s) \cdot \text{Pr}[s] & (\text{Def. of T}) \\ &= \text{Ex}[T] & (\text{Def. 18.4.1}) \\ &= \text{Ex} \left[ \sum_{i = 0}^{\infty} R_i \right]. & (\text{Def. of T}) \\ & & \quad \blacksquare \end{aligned}\]

    Gambling Paradox

    One of the simplest casino bets is on “red” or “black” at the roulette table. In each play at roulette, a small ball is set spinning around a roulette wheel until it lands in a red, black, or green colored slot. The payoff for a bet on red or black matches the bet; for example, if you bet $10 on red and the ball lands in a red slot, you get back your original $10 bet plus another matching $10.

    The casino gets its advantage from the green slots, which make the probability of both red and black each less than \(1/2\). In the US, a roulette wheel has 2 green slots among 18 black and 18 red slots, so the probability of red is \(18/38 \approx 0.473\). In Europe, where roulette wheels have only 1 green slot, the odds for red are a little better—that is, \(18/37 \approx 0.486\)—but still less than even.

    Of course you can’t expect to win playing roulette, even if you had the good fortune to gamble against a fair roulette wheel. To prove this, note that with a fair wheel, you are equally likely to win or lose each bet, so your expected win on any spin is zero. Therefore if you keep betting, your expected win is the sum of your expected wins on each bet: still zero.

    Even so, gamblers regularly try to develop betting strategies to win at roulette despite the bad odds. A well known strategy of this kind is bet doubling, where you bet, say, $10 on red and keep doubling the bet until a red comes up. This means you stop playing if red comes up on the first spin, and you leave the casino with a $10 profit. If red does not come up, you bet $20 on the second spin. Now if the second spin comes up red, you get your $20 bet plus $20 back and again walk away with a net profit of $20 − $10 = $10. If red does not come up on the second spin, you next bet $40 and walk away with a net win of $40 − $20 − $10 = $10 if red comes up on the third spin, and so on.

    Since we’ve reasoned that you can’t even win against a fair wheel, this strategy against an unfair wheel shouldn’t work. But wait a minute! There is a 0.486 probability of red appearing on each spin of the wheel, so the mean time until a red occurs is less than three. What’s more, red will come up eventually with probability one, and as soon as it does, you leave the casino $10 ahead. In other words, by bet doubling you are certain to win $10, and so your expectation is $10, not zero!

    Something’s wrong here.

    Solution to the Paradox

    The argument claiming the expectation is zero against a fair wheel is flawed by an implicit, invalid use of linearity of expectation for an infinite sum.

    To explain this carefully, let \(B_n\) be the number of dollars you win on your \(n\)th bet, where \(B_n\) is defined to be zero if red comes up before the \(n\)th spin of the wheel. Now the dollar amount you win in any gambling session is

    \[\nonumber \sum_{n = 1}^{\infty} B_n,\]

    and your expected win is

    \[\label{18.5.7} \text{Ex} \left[ \sum_{n = 1}^{\infty} B_n \right].\]

    Moreover, since we’re assuming the wheel is fair, it’s true that \(\text{Ex}[B_n] = 0\), so

    \[\label{18.5.8} \sum_{n = 1}^{\infty} \text{Ex} [B_n] = \sum_{n = 1}^{\infty} 0 = 0.\]

    The flaw in the argument that you can’t win is the implicit appeal to linearity of expectation to conclude that the expectation (\ref{18.5.7}) equals the sum of expectations in (\ref{18.5.8}). This is a case where linearity of expectation fails to hold—even though the expectation (\ref{18.5.7}) is 10 and the sum (\ref{18.5.8}) of expectations converges. The problem is that the expectation of the sum of the absolute values of the bets diverges, so the condition required for infinite linearity fails. In particular, under bet doubling your \(n\)th bet is \(10 \cdot 2^{n-1}\) dollars while the probability that you will make an \(n\)th bet is \(2^{-n}\). So

    \[\nonumber \text{Ex}[|B_n|] = 10 \cdot 2^{n-1}2^{-n} = 20.\]

    Therefore the sum

    \[\nonumber \sum_{n = 1}^{\infty} \text{Ex}[|B_n|] = 20 + 20 + 20 + \cdots \]

    diverges rapidly.

    So the presumption that you can’t beat a fair game, and the argument we offered to support this presumption, are mistaken: by bet doubling, you can be sure to walk away a winner. Probability theory has led to an apparently absurd conclusion. But probability theory shouldn’t be rejected because it leads to this absurd conclusion.

    If you only had a finite amount of money to bet with—say enough money to make \(k\) bets before going bankrupt—then it would be correct to calculate your expectation by summing \(B_1 + B_2 + \cdots + B_k\), and your expectation would be zero for the fair wheel and negative against an unfair wheel. In other words, in order to follow the bet doubling strategy, you need to have an infinite bankroll. So it’s absurd to assume you could actually follow a bet doubling strategy, and it’s entirely reasonable that an absurd assumption leads to an absurd conclusion.
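
    The finite-bankroll version is easy to simulate. The Python sketch below is our own illustration (the cap of \(k\) bets, the $10 base bet, and the function name session are assumptions made for the example); capped at \(k\) bets on a fair wheel, bet doubling wins $10 with probability \(1 - 2^{-k}\) but occasionally forfeits the entire bankroll, and the average winnings come out to roughly zero.

    ```python
    import random

    # Bet doubling on a fair wheel (win each spin with probability 1/2), capped
    # at k bets, i.e. a bankroll of 10*(2^k - 1) dollars.  You win $10 with
    # probability 1 - 2**-k and lose the whole bankroll with probability 2**-k,
    # so the expected net win is exactly zero; the paradox needs an infinite bankroll.
    def session(k, base=10):
        bet, spent = base, 0
        for _ in range(k):
            spent += bet
            if random.random() < 0.5:    # red comes up: collect twice the bet
                return 2 * bet - spent   # always a net win of `base`
            bet *= 2
        return -spent                    # red never came up within k bets

    TRIALS, k = 200_000, 6
    avg = sum(session(k) for _ in range(TRIALS)) / TRIALS
    print(f"average net win with at most {k} bets: {avg:.2f}")  # near 0, up to sampling noise
    ```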

    Expectations of Products

    While the expectation of a sum is the sum of the expectations, the same is usually not true for products. For example, suppose that we roll a fair 6-sided die and denote the outcome with the random variable \(R\). Does \(\text{Ex}[R \cdot R] = \text{Ex}[R] \cdot \text{Ex}[R]\)?

    We know that \(\text{Ex}[R] = 3\frac{1}{2}\) and thus \(\text{Ex}[R]^2 = 12\frac{1}{4}\). Let’s compute \(\text{Ex}[R \cdot R]\) to see if we get the same result.

    \[\begin{aligned} \text{Ex}[R \cdot R] &= \sum_{\omega \in S}R^2 (\omega) \cdot \text{Pr}[\omega] = \sum_{i = 1}^{6} i^2 \cdot \text{Pr}[R = i] \\ &= \frac{1^2}{6} + \frac{2^2}{6} + \frac{3^2}{6} + \frac{4^2}{6} + \frac{5^2}{6} + \frac{6^2}{6} = 15\tfrac{1}{6} \neq 12\tfrac{1}{4}. \end{aligned}\]

    That is,

    \[\nonumber \text{Ex}[R \cdot R] \neq \text{Ex}[R] \cdot \text{Ex}[R].\]

    So the expectation of a product is not always equal to the product of the expectations.
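
    The gap is easy to compute exactly. The short Python check below (a sketch of ours, using exact rational arithmetic) reproduces both numbers for a fair die.

    ```python
    from fractions import Fraction

    # Exact computation for a fair die: Ex[R*R] versus Ex[R]*Ex[R].
    faces = range(1, 7)
    p = Fraction(1, 6)

    ex_r = sum(i * p for i in faces)              # 7/2
    ex_r_squared = sum(i * i * p for i in faces)  # 91/6

    print(ex_r_squared, ex_r * ex_r)                # 91/6 versus 49/4
    print(float(ex_r_squared), float(ex_r * ex_r))  # 15.1666... versus 12.25
    ```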

    There is, however, a special case in which such a relationship does hold: namely, when the random variables in the product are independent.

    Theorem \(\PageIndex{6}\)

    For any two independent random variables \(R_1, R_2,\)

    \[\nonumber \text{Ex}[R_1 \cdot R_2] = \text{Ex}[R_1] \cdot \text{Ex}[R_2].\]

    The proof follows by rearrangement of terms in the sum that defines \(\text{Ex}[R_1 \cdot R_2]\). Details appear in Problem 18.25.

    Theorem 18.5.6 extends routinely to a collection of mutually independent variables.

    Corollary 18.5.7 (Expectation of Independent Product). If random variables \(R_1, R_2, \ldots, R_k\) are mutually independent, then

    \[\nonumber \text{Ex}\left[ \prod_{i = 1}^{k} R_i\right] = \prod_{i = 1}^{k} \text{Ex}[R_i].\]
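
    As a last sanity check (our own illustration, not from the text), the Python sketch below estimates the expected product of two fair dice both when they are independent and when the second die simply copies the first; only the independent case matches \(3.5 \cdot 3.5 = 12.25\).

    ```python
    import random

    # Monte Carlo sketch of Theorem 18.5.6 for two fair dice.
    TRIALS = 200_000

    # Independent dice: Ex[R1 * R2] should be close to 3.5 * 3.5 = 12.25.
    indep = sum(random.randint(1, 6) * random.randint(1, 6) for _ in range(TRIALS)) / TRIALS

    # Fully dependent dice (R2 = R1): Ex[R1 * R2] = Ex[R1^2] = 91/6 instead.
    dep = sum(r * r for r in (random.randint(1, 6) for _ in range(TRIALS))) / TRIALS

    print(f"independent product: ~ {indep:.2f} (exact 12.25)")
    print(f"dependent product:   ~ {dep:.2f} (exact {91 / 6:.2f})")
    ```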


    This page titled 18.5: Linearity of Expectation is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Eric Lehman, F. Thomson Leighton, & Albert R. Meyer (MIT OpenCourseWare).
