# 19.4: Estimation by Random Sampling

- Page ID
- 48440

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)

( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\id}{\mathrm{id}}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\kernel}{\mathrm{null}\,}\)

\( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\)

\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\)

\( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)

\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)

\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)

\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vectorC}[1]{\textbf{#1}} \)

\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)

\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)

\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)Democratic politicians were astonished in 2010 when their early polls of sample voters showed Republican Scott Brown was favored by a majority of voters and so would win the special election to fill the Senate seat that the late Democrat Teddy Kennedy had occupied for over 40 years. Based on their poll results, they mounted an intense, but ultimately unsuccessful, effort to save the seat for their party.

## Voter Poll

Suppose at some time before the election that \(p\) was the fraction of voters favoring Scott Brown. We want to estimate this unknown fraction \(p\). Suppose we have some random process for selecting voters from registration lists that selects each voter with equal probability. We can define an indicator variable, \(K\), by the rule that \(K = 1\) if the random voter most prefers Brown, and \(K = 0\) otherwise.

Now to estimate \(p\), we take a large number, \(n\), of random choices of voters^{3} and count the fraction who favor Brown. That is, we define variables \(K_1, K_2, \ldots\), where \(K_i\) is interpreted to be the indicator variable for the event that the \(i\)th chosen voter prefers Brown. Since our choices are made independently, the \(K_i\)’s are independent. So formally, we model our estimation process by assuming we have mutually independent indicator variables \(K_1, K_2, \ldots\), each with the same probability, \(p\), of being equal to 1. Now let \(S_n\) be their sum, that is,

\[\label{19.4.1} S_n ::= \sum_{i=1}^{n} K_i.\]

The variable \(S_n/n\) describes the fraction of sampled voters who favor Scott Brown. Most people intuitively, and correctly, expect this sample fraction to give a useful approximation to the unknown fraction, \(p\).

So we will use the sample value, \(S_n/n\), as our *statistical estimate* of \(p\). We know that \(S_n\) has a binomial distribution with parameters \(n\) and \(p\); we can choose \(n\), but \(p\) is unknown.

**How Large a Sample?**

Suppose we want our estimate to be within 0.04 of the fraction, \(p\), at least 95% of the time. This means we want

\[\label{19.4.2} \text{Pr}\left[\left| \frac{S_n}{n} - p \right| \leq 0.04 \right] \geq 0.95.\]

So we’d better determine the number, \(n\) of times we must poll voters so that inequality (\ref{19.4.2}) will hold. Chebyshev's Theorem offers a simple way to determine such a \(n\).

Sn is binomially distributed. Equation (19.3.10), combined with the fact that \(p(1 - p)\) is maximized when \(p = 1 - p\), that is, when \(p = 1/2\) (check for yourself!), gives

\[\label{19.4.3} \text{Var}[S_n] = n(p(1-p)) \leq n \cdot \frac{1}{4} = \frac{n}{4}.\]

Next, we bound the variance of \(S_n/n\):

\[\begin{align} \nonumber \text{Var}\left[ \frac{S_n}{n}\right] &= \left( \frac{1}{n} ^2 \right) \text{Var}[S_n] & (\text{(Square Multiple Rule for Variance (19.3.4)}) \\ \nonumber &\leq \frac{1}{n} ^2 \frac{n}{4} & (\text{by (19.4.3)}) \\ &= \label{19.4.4} \frac{1}{4n} \end{align}\]

Using Chebyshev’s bound and (\ref{19.4.4}) we have:

\[\label{19.4.5} \text{Pr}\left[\left| \frac{S_n}{n} - p \right| \geq 0.04 \right] \leq \frac{\text{Var}[S_n / n]}{(0.04)^2} \leq \frac{1}{4n(0.04)^2} = \frac{156.25}{n}\]

To make our our estimate with 95% confidence, we want the righthand side of (\ref{19.4.5}) to be at most \(1/20\). So we choose \(n\) so that

\[\nonumber \frac{156.25}{n} \leq \frac{1}{20},\]

that is,

\[\nonumber n \geq 3,125.\]

Section 19.6.2 describes how to get tighter estimates of the tails of binomial distributions that lead to a bound on \(n\) that is about four times smaller than the one above. But working through this example using only the variance illustrates an approach to estimation that is applicable to arbitrary random variables, not just binomial variables.

## Matching Birthdays

There are important cases where the relevant distributions are not binomial because the mutual independence properties of the voter preference example do not hold. In these cases, estimation methods based on Chebyshev’s Theorem may be the best approach. Birthday Matching is an example. We already saw in Section 16.4 that in a class of 95 students, it is virtually certain that at least one pair of students will have the same birthday, which suggests that several pairs of students are likely to have the same birthday. How many matched birthdays should we expect?

As before, suppose there are \(n\) students and \(d\) days in the year, and let \(M\) be the number of pairs of students with matching birthdays. Now it will be easy to calculate the expected number of pairs of students with matching birthdays. Then we can take the same approach as we did in estimating voter preferences to get an estimate of the probability of getting a number of pairs close to the expected number.

Unlike the situation with voter preferences, having matching birthdays for different pairs of students are not mutually independent events. Knowing Alice’s birthday matches Bob’s tells us nothing about who Carol matches, and knowing Alice has the same birthday as Carol tells us nothing about who Bob matches. But if Alice matches Bob and Alice matches Carol, it’s certain that Bob and Carol match as well! The events that various pairs of students have matching birthdays are not mutually independent, and indeed not even three-way independent. The best we can say is that they are pairwise independent. This will allow us to apply the same reasoning to Birthday Matching as we did for voter preference. Namely, let \(B_1, B_2, \ldots, B_n\) be the birthdays of \(n\) independently chosen people, and let \(E_{i,j}\) be the indicator variable for the event that the \(i\)th and \(j\)th people chosen have the same birthdays, that is, the event \([B_i = B_j]\). So in our probability model, the \(B_i\)’s are mutually independent variables, and the \(E_{i,j}\) ’s are pairwise independent. Also, the expectations of \(E_{i,j}\) for \(i \neq j\) equals the probability that \(B_i = B_j\), namely, \(1/d\).

Now, \(M\), the number of matching pairs of birthdays among the \(n\) choices, is simply the sum of the \(E_{i,j}\) ’s:

\[\label{19.4.6} M ::= \sum_{1 \leq i < j \leq n} E_{i,j}.\]

So by linearity of expectation

\[\nonumber \text{Ex}[M] = \text{Ex}\left[ \sum_{1 \leq i < j \leq n} E_{i,j} \right] = \sum_{1 \leq i < j \leq n}\text{Ex}E_{i,j} = {n \choose 2} \cdot \frac{1}{d}.\]

Similarly,

\[\begin{aligned} \text{Var}[M] &= \text{Var} \left[\sum_{1 \leq i < j \leq n} E_{i,j} \right] \\ &= \sum_{1 \leq i < j \leq n}\text{Var}E_{i,j} & (\text{(Theorem 19.3.8}) \\ &= {n \choose 2} \cdot \frac{1}{d} \left( 1 - \frac{1}{d} \right). & (\text{(Corollary 19.3.2}) \end{aligned}\]

In particular, for a class of \(n = 95\) students with \(d = 365\) possible birthdays, we have \(\text{Ex}[M] \approx 12.23\) and \(\text{Var}[M] \approx 12.23(1 - 1/365) < 12.2\). So by Chebyshev’s Theorem

\[\nonumber \text{Pr}[|M - \text{Ex}[M]| \geq x] < \frac{12.2}{x^2}.\]

Letting \(x = 7\), we conclude that there is a better than 75% chance that in a class of 95 students, the number of pairs of students with the same birthday will be within 7 of 12.23, that is, between 6 and 19.

## Pairwise Independent Sampling

The reasoning we used above to analyze voter polling and matching birthdays is very similar. We summarize it in slightly more general form with a basic result called the Pairwise Independent Sampling Theorem. In particular, we do not need to restrict ourselves to sums of zero-one valued variables, or to variables with the same distribution. For simplicity, we state the Theorem for pairwise independent variables with possibly different distributions but with the same mean and variance.

Theorem \(\PageIndex{1}\)

(Pairwise Independent Sampling). *Let \(G_1, \ldots, G_n\) be pairwise independent variables with the same mean, \(\mu\), and deviation, \(\sigma\). Define*

\[\label{19.4.7} S_n ::= \sum_{i = 1}^{n} G_i.\]

*Then*

\[\nonumber \text{Pr}\left[\left| \frac{S_n}{n} - \mu \right| \geq x \right] \leq \frac{1}{n} \left( \frac{\sigma}{x}^2 \right).\]

**Proof**-
We observe first that the expectation of \(S_n / n\) is \(\mu\):

\[\begin{aligned} \text{Ex}\left[\frac{S_n}{n} \right] &= \text{Ex}\left[ \frac{\sum_{i = 1}^{n} G_i}{n} \right] & (\text{def of }S_n) \\ &= \frac{\sum_{i = 1}^{n} \text{Ex}[G_i]}{n} & (\text{linearity of expectation}) \\ &= \frac{\sum_{i = 1}^{n}\mu}{n} \\ &= \frac{n \mu}{n} = \mu. \end{aligned}\]

The second important property of \(S_n / n\) is that its variance is the variance of \(G_i\) divided by \(n\):

\[\begin{align} \nonumber \text{Var}\left[\frac{S_n}{n} \right] &= \left(\frac{1}{n}\right)^2 \text{Var}[S_n] & (\text{Square Multiple Rule for Variance (19.3.4)}) \\ \nonumber &= \frac{1}{n^2} \text{Var}\left[\sum_{i = 1}^{n}G_i \right] & (\text{def of } S_n) \\ \nonumber &= \frac{1}{n^2} \sum_{i = 1}^{n} \text{Var}[G_i] & (\text{pairwise independent additivity}) \\ \label{19.4.8} &= \frac{1}{n^2} \cdot n \sigma ^2 = \frac{\sigma^2}{n}. \end{align}\]

This is enough to apply Chebyshev’s Theorem and conclude:

\[\begin{aligned} \text{Pr}\left[\left| \frac{S_n}{n} - \mu \right| \geq x \right] &\leq \frac{\text{Var}[S_n / n]}{x^2}. & (\text{Chebyshev’s bound}) \\ &= \frac{\sigma ^2 / n}{x^2} & (\text{by (19.4.8)}) \\ &= \frac{1}{n} \left( \frac{\sigma}{x}\right)^2. \\ & & \quad \blacksquare \end{aligned}\]

The Pairwise Independent Sampling Theorem provides a quantitative general statement about how the average of independent samples of a random variable approaches the mean. In particular, it proves what is known as the Law of Large Numbers^{4}: by choosing a large enough sample size, we can get arbitrarily accurate estimates of the mean with confidence arbitrarily close to 100%.

**Corollary 19.4.2. ***[Weak Law of Large Numbers] Let \(G_1, \ldots, G_n\) be pairwise independent variables with the same mean, \(\mu\), and the same finite deviation, and let*

\[\nonumber S_n ::= \frac{\sum_{i = 1}^{n}G_i}{n}.\]

*Then for every \(\epsilon > 0\),*

\[\nonumber \lim_{n \rightarrow \infty} \text{Pr}[|S_n - \mu| \leq \epsilon] = 1.\]

^{3}We’re choosing a random voter \(n\) times *with replacement*. We don’t remove a chosen voter from the set of voters eligible to be chosen later; so we might choose the same voter more than once! We would get a slightly better estimate if we required \(n\) *different* people to be chosen, but doing so complicates both the selection process and its analysis for little gain.

^{4}This is the *Weak* Law of Large Numbers. As you might suppose, there is also a Strong Law, but it’s outside the scope of 6.042.