
19.1: Markov’s Theorem


    Markov’s theorem gives a generally coarse estimate of the probability that a random variable takes a value much larger than its mean. It is an almost trivial result by itself, but it actually leads fairly directly to much stronger results.

    The idea behind Markov’s Theorem can be explained by considering the quantity known as intelligence quotient, IQ, which remains in wide use despite doubts about its legitimacy. IQ was devised so that its average measurement would be 100. This immediately implies that at most \(1/3\) of the population can have an IQ of 300 or more, because if more than a third of the population had an IQ of 300 or more, then the average would have to be greater than \(1/3 \cdot 300 = 100\). So, the probability that a randomly chosen person has an IQ of 300 or more is at most \(1/3\). By the same logic, we can also conclude that at most \(2/3\) of the population can have an IQ of 150 or more.

    Of course, these are not very strong conclusions. No IQ of over 300 has ever been recorded; and while many IQ’s of over 150 have been recorded, the fraction of the population that actually has an IQ that high is very much smaller than \(2/3\). But though these conclusions are weak, we reached them using just the fact that the average IQ is 100—along with another fact we took for granted, that IQ is never negative. Using only these facts, we can’t derive smaller fractions, because there are nonnegative random variables with mean 100 that achieve these fractions. For example, if we choose a random variable equal to 300 with probability \(1/3\) and 0 with probability \(2/3\), then its mean is 100, and the probability of a value of 300 or more really is \(1/3\).
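    To see concretely that no smaller fraction can follow from the mean alone, here is a minimal simulation sketch (Python, purely illustrative) of the extremal distribution just described: a nonnegative variable equal to 300 with probability \(1/3\) and 0 otherwise.

    ```python
    import random

    # Illustrative extremal distribution from the text: a nonnegative variable equal
    # to 300 with probability 1/3 and 0 with probability 2/3.  Its mean is 100 and
    # Pr[value >= 300] really is 1/3, so no smaller bound follows from the mean alone.
    trials = 1_000_000
    samples = [300 if random.random() < 1 / 3 else 0 for _ in range(trials)]

    mean = sum(samples) / trials
    frac_at_least_300 = sum(s >= 300 for s in samples) / trials

    print(f"empirical mean:              {mean:.1f}")               # close to 100
    print(f"fraction with value >= 300:  {frac_at_least_300:.3f}")  # close to 1/3
    ```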

    Theorem \(\PageIndex{1}\)

    (Markov’s Theorem). If \(R\) is a nonnegative random variable, then for all \(x>0\)

    \[\label{19.1.1} \text{Pr}[R \geq x] \leq \frac{\text{Ex}[R]}{x}.\]

    Proof

    Let \(y\) vary over the range of \(R\). Then for any \(x > 0\)

    \[\begin{align} \nonumber \text{Ex}[R] &::= \sum_{y} y \text{Pr}[R = y] \\ \nonumber &\geq \sum_{y \geq x} y \text{Pr}[R = y] \geq \sum_{y \geq x} x \text{Pr}[R = y] = x \sum_{y \geq x} \text{Pr}[R = y] \\ \label{19.1.2} &= x \text{Pr}[R \geq x], \end{align}\]

    where the first inequality follows from the fact that \(R \geq 0\).

    Dividing the first and last expressions in (\ref{19.1.2}) by x gives the desired result. \(\quad \blacksquare\)
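    As a quick numerical sanity check of the theorem, the following sketch compares the empirical value of \(\text{Pr}[R \geq x]\) with the bound \(\text{Ex}[R]/x\). The choice of random variable (a fair six-sided die roll) is ours, not from the text.

    ```python
    import random

    # A quick numerical check of Markov's bound (the die-roll variable is our own
    # choice).  R is a fair six-sided die roll, so Ex[R] = 3.5.
    trials = 1_000_000
    x = 5
    hits = sum(random.randint(1, 6) >= x for _ in range(trials))

    print(f"Pr[R >= {x}] (empirical):  {hits / trials:.3f}")  # about 0.333
    print(f"Markov bound Ex[R]/x:      {3.5 / x:.3f}")        # 0.700
    ```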

    Our focus is deviation from the mean, so it’s useful to rephrase Markov’s Theorem this way:

    Corollary 19.1.2. If \(R\) is a nonnegative random variable, then for all \(c \geq 1\)

    \[\label{19.1.3} \text{Pr}[R \geq c \cdot \text{Ex}[R]] \leq \frac{1}{c}.\]

    This Corollary follows immediately from Markov’s Theorem (\ref{19.1.1}) by letting \(x\) be \(c \cdot \text{Ex}[R]\).
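    Here is a small illustrative check of the corollary; the random variable is again our own choice, not from the text. \(R\) is the number of heads in 10 fair coin flips, so \(\text{Ex}[R] = 5\), and for several values of \(c\) the empirical \(\text{Pr}[R \geq c \cdot \text{Ex}[R]]\) is compared with the bound \(1/c\).

    ```python
    import random

    # A quick look at the corollary (illustrative choice of random variable):
    # R = number of heads in 10 fair coin flips, so Ex[R] = 5.
    trials = 500_000
    samples = [sum(random.random() < 0.5 for _ in range(10)) for _ in range(trials)]

    for c in (1.2, 1.6, 2.0):
        empirical = sum(s >= c * 5 for s in samples) / trials
        print(f"c = {c}:  Pr[R >= {c * 5:.0f}] = {empirical:.4f}  <=  1/c = {1 / c:.4f}")
    ```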

    Applying Markov’s Theorem

    Let’s go back to the Hat-Check problem of Section 18.5.2. Now we ask what the probability is that \(x\) or more men get the right hat, that is, what the value of \(\text{Pr}[G \geq x]\) is.

    We can compute an upper bound with Markov’s Theorem. Since we know \(\text{Ex}[G] = 1\), Markov’s Theorem implies

    \[\nonumber \text{Pr}[G \geq x] \leq \frac{\text{Ex}[G]}{x} = \frac{1}{x}.\]

    For example, there is no better than a 20% chance that 5 men get the right hat, regardless of the number of people at the dinner party.
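    A rough simulation makes the gap between the bound and the true probability visible. The sketch below models the Hat-Check problem in the standard way, as the number of fixed points of a uniformly random permutation; the particular parameters are our own choice.

    ```python
    import random

    # G is the number of fixed points of a uniformly random permutation of n hats.
    def right_hats(n):
        perm = list(range(n))
        random.shuffle(perm)
        return sum(perm[i] == i for i in range(n))

    n, trials, x = 20, 200_000, 5
    count = sum(right_hats(n) >= x for _ in range(trials))

    print(f"Pr[G >= {x}] (empirical):  {count / trials:.5f}")  # far below the bound
    print(f"Markov bound 1/x:          {1 / x:.5f}")           # 0.20000
    ```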

    The Chinese Appetizer problem is similar to the Hat-Check problem. In this case, \(n\) people are eating different appetizers arranged on a circular, rotating Chinese banquet tray. Someone then spins the tray so that each person receives a random appetizer. What is the probability that everyone gets the same appetizer as before?

    There are \(n\) equally likely orientations for the tray after it stops spinning. Everyone gets the right appetizer in just one of these \(n\) orientations. Therefore, the correct answer is \(1/n\).

    But what probability do we get from Markov’s Theorem? Let the random variable, \(R\), be the number of people that get the right appetizer. Then of course \(\text{Ex}[R] = 1\), so applying Markov’s Theorem, we find:

    \[\nonumber \text{Pr}[R \geq n] \leq \frac{\text{Ex}[R]}{n} = \frac{1}{n}.\]

    So for the Chinese appetizer problem, Markov’s Theorem is precisely right!

    Unfortunately, Markov’s Theorem is not always so accurate. For example, it gives the same \(1/n\) upper limit for the probability that everyone gets their own hat back in the Hat-Check problem, where the probability is actually \(1/(n!)\). So for Hat-Check, Markov’s Theorem gives a probability bound that is way too large.
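    The contrast is easy to see numerically. The sketch below evaluates, for a concrete \(n\), the Markov bound \(1/n\) alongside the exact probabilities from the text: \(1/n\) for the Chinese Appetizer problem and \(1/n!\) for everyone getting their own hat back.

    ```python
    from math import factorial

    # "Everyone gets the right item": Markov gives the bound 1/n in both problems,
    # but the exact probabilities differ enormously.
    n = 10
    markov_bound = 1 / n
    appetizer_exact = 1 / n              # one of the n rotations is correct
    hat_check_exact = 1 / factorial(n)   # one of the n! permutations is the identity

    print(f"Markov bound for Pr[R >= {n}]:  {markov_bound:.6f}")
    print(f"Chinese Appetizer exact:        {appetizer_exact:.6f}")  # bound is tight
    print(f"Hat-Check exact:                {hat_check_exact:.2e}")  # bound is far too large
    ```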

    Markov’s Theorem for Bounded Variables

    Suppose we learn that the average IQ among MIT students is 150 (which is not true, by the way). What can we say about the probability that an MIT student has an IQ of more than 200? Markov’s theorem immediately tells us that no more than \(150/200\) or \(3/4\) of the students can have such a high IQ. Here, we simply applied Markov’s Theorem to the random variable, \(R\), equal to the IQ of a random MIT student to conclude:

    \[\nonumber \text{Pr}[R > 200] \leq \frac{\text{Ex}[R]}{200} = \frac{150}{200} = \frac{3}{4}.\]

    But let’s observe an additional fact (which may be true): no MIT student has an IQ less than 100. This means that if we let \(T ::= R - 100\), then \(T\) is nonnegative and \(\text{Ex}[T] = 50\), so we can apply Markov’s Theorem to \(T\) and conclude:

    \[\nonumber \text{Pr}[R > 200] = \text{Pr}[T > 100] \leq \frac{\text{Ex}[T]}{100} = \frac{50}{100} = \frac{1}{2}.\]

    So only half, not \(3/4\), of the students can be as amazing as they think they are. A bit of a relief!

    In fact, we can get better bounds applying Markov’s Theorem to \(R - b\) instead of \(R\) for any lower bound \(b\) on \(R\) (see Problem 19.3). Similarly, if we have any upper bound, \(u\), on a random variable, \(S\), then \(u - S\) will be a nonnegative random variable, and applying Markov’s Theorem to \(u - S\) will allow us to bound the probability that \(S\) is much less than its expectation.
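    The following sketch packages the two shifted bounds just described as small helper functions. The function names and the second example’s numbers are our own; only the formulas come from the discussion above.

    ```python
    def markov_with_lower_bound(mean, b, x):
        """Bound Pr[R >= x] when R >= b and x > b, by applying Markov to R - b."""
        return (mean - b) / (x - b)

    def markov_with_upper_bound(mean, u, x):
        """Bound Pr[S <= x] when S <= u and x < u, by applying Markov to u - S."""
        return (u - mean) / (u - x)

    # The MIT IQ example: mean 150, lower bound 100, threshold 200 gives 1/2.
    print(markov_with_lower_bound(150, 100, 200))   # 0.5

    # A hypothetical low-side example: if scores on a 100-point exam average 80,
    # at most 2/3 of the scores can be 70 or below.
    print(markov_with_upper_bound(80, 100, 70))     # 0.666...
    ```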


    This page titled 19.1: Markov’s Theorem is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Eric Lehman, F. Thomson Leighton, & Albert R. Meyer (MIT OpenCourseWare).
