# 19.3: Properties of Variance

• Eric Lehman, F. Thomson Leighton, & Albert R. Meyer
• Google and Massachusetts Institute of Technology via MIT OpenCourseWare


Variance is the average of the square of the distance from the mean. For this reason, variance is sometimes called the “mean square deviation.” Then we take its square root to get the standard deviation—which in turn is called “root mean square deviation.”

But why bother squaring? Why not study the actual distance from the mean, namely, the absolute value of $$R - \text{Ex}[R]$$, instead of its root mean square? The answer is that variance and standard deviation have useful properties that make them much more important in probability theory than average absolute deviation. In this section, we’ll describe some of those properties. In the next section, we’ll see why these properties are important.

## Formula for Variance

Applying linearity of expectation to the formula for variance yields a convenient alternative formula.

Lemma 19.3.1.

$\nonumber \text{Var}[R] = \text{Ex}[R^2] - \text{Ex}^2[R],$

for any random variable, $$R$$.

Here we use the notation $$\text{Ex}^2 [R]$$ as shorthand for $$(\text{Ex}[R])^2$$.

Proof. Let $$\mu = \text{Ex}[R].$$ Then

\begin{aligned} \text{Var}[R] &= \text{Ex}[(R - \text{Ex}[R])^2] & (\text{Def 19.2.2 of variance}) \\ &= \text{Ex}[(R - \mu)^2] & (\text{def of }\mu) \\ &= \text{Ex}[R^2 - 2\mu R + \mu^2] \\ &= \text{Ex}[R^2] - 2\mu \text{Ex}[R] + \mu^2 & (\text{linearity of expectation}) \\ &= \text{Ex}[R^2] - 2\mu^2 + \mu^2 & (\text{def of } \mu) \\ &= \text{Ex}[R^2] - \mu^2 \\ &= \text{Ex}[R^2] - \text{Ex}^2 [R]. & (\text{def of } \mu) \\ & & \quad \blacksquare \end{aligned}
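As a quick sanity check of Lemma 19.3.1, here is a minimal Python sketch that computes the variance of a finite random variable both from the definition and from the formula $$\text{Ex}[R^2] - \text{Ex}^2[R]$$. It assumes a distribution is given as a dictionary mapping each value to its probability; the helper names `expect`, `var_by_definition`, and `var_by_lemma` are hypothetical, and exact `Fraction` arithmetic is used so the two results can be compared for equality.

```python
from fractions import Fraction

def expect(dist, f=lambda x: x):
    """Ex[f(R)] for a finite distribution given as {value: probability}."""
    return sum(p * f(x) for x, p in dist.items())

def var_by_definition(dist):
    """Var[R] = Ex[(R - Ex[R])^2]  (Definition 19.2.2)."""
    mu = expect(dist)
    return expect(dist, lambda x: (x - mu) ** 2)

def var_by_lemma(dist):
    """Var[R] = Ex[R^2] - Ex^2[R]  (Lemma 19.3.1)."""
    return expect(dist, lambda x: x * x) - expect(dist) ** 2

# Example: the roll of a fair six-sided die.
die = {k: Fraction(1, 6) for k in range(1, 7)}
print(var_by_definition(die), var_by_lemma(die))  # 35/12 35/12
```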

A simple and very useful formula for the variance of an indicator variable is an immediate consequence.

Corollary 19.3.2. If $$B$$ is a Bernoulli variable where $$p ::= \text{Pr}[B = 1]$$, then

$\label{19.3.1} \text{Var}[B] = p - p^2 = p(1-p).$

Proof. By Lemma 18.4.2, $$\text{Ex}[B] = p$$. But $$B$$ only takes values 0 and 1, so $$B^2 = B$$ and equation (\ref{19.3.1}) follows immediately from Lemma 19.3.1. $$\quad \blacksquare$$
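The corollary can be checked numerically with a short sketch. The hypothetical helper `bernoulli_variance` builds the distribution $$\{0: 1-p,\ 1: p\}$$, computes $$\text{Ex}[B]$$ and $$\text{Ex}[B^2]$$ directly, and applies Lemma 19.3.1; exact fractions are an assumption made only so the comparison with $$p(1-p)$$ is exact.

```python
from fractions import Fraction

def bernoulli_variance(p):
    """Var[B] for B in {0, 1} with Pr[B = 1] = p, via Lemma 19.3.1."""
    dist = {0: 1 - p, 1: p}
    ex_b = sum(q * x for x, q in dist.items())         # equals p
    ex_b_sq = sum(q * x * x for x, q in dist.items())  # also p, since B^2 = B
    return ex_b_sq - ex_b ** 2

for p in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)):
    assert bernoulli_variance(p) == p * (1 - p)
print(bernoulli_variance(Fraction(1, 4)))  # 3/16
```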

## Variance of Time to Failure

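According to Section 18.4.6, the mean time to failure is $$1/p$$ for a process that fails during any given hour with probability $$p$$. That is, if $$C$$ is the number of hours until the first failure, then $$\text{Ex}[C] = 1/p$$. What about the variance of $$C$$?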

By Lemma 19.3.1,

$\label{19.3.2} \text{Var}[C] = \text{Ex}[C^2] - (1/p)^2$

so all we need is a formula for $$\text{Ex}[C^2]$$.

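Reasoning about $$C$$ using conditional expectation worked nicely in Section 18.4.6 to find the mean time to failure, and a similar approach works for $$C^2$$. Namely, the expected value of $$C^2$$ is the probability, $$p$$, of failure in the first hour times $$1^2$$, plus the probability, $$(1-p)$$, of non-failure in the first hour times the expected value of $$(C+1)^2$$. The second term arises because, after a first hour without failure, the process starts over, so the remaining time to failure has the same distribution as $$C$$. Expanding $$(C+1)^2$$ by linearity of expectation and using $$\text{Ex}[C] = 1/p$$, we get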

\begin{aligned} \text{Ex}[C^2] &= p \cdot 1^2 + (1-p) \text{Ex}[(C+1)^2] \\ &= p + (1-p) \left( \text{Ex}[C^2] + \frac{2}{p} + 1 \right) \\ &= p + (1-p)\text{Ex}[C^2] + (1-p)\left(\frac{2}{p} + 1 \right), \quad \text{so} \end{aligned}

\begin{aligned} p\text{Ex}[C^2] &= p + (1-p) \left(\frac{2}{p} + 1 \right) \\ &= \frac{p^2 + (1-p)(2+p)}{p} \quad \text{and} \end{aligned}

$\nonumber \text{Ex}[C^2] = \frac{2-p}{p^2}.$

Combining this with (\ref{19.3.2}) proves

Lemma 19.3.3. If failures occur with probability $$p$$ independently at each step, and $$C$$ is the number of steps until the first failure,² then

$\text{Var}[C] = \frac{1-p}{p^2}.$
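To see Lemma 19.3.3 and the intermediate result $$\text{Ex}[C^2] = (2-p)/p^2$$ agree with a direct computation, the following sketch sums the series for the geometric distribution $$\text{Pr}[C = k] = (1-p)^{k-1}p$$, $$k \ge 1$$. The infinite sums are truncated after a fixed number of terms, an approximation chosen for illustration (the neglected tail is negligible for the $$p$$ used here), and `geometric_moments` is a hypothetical helper name.

```python
def geometric_moments(p, terms=10_000):
    """Approximate Ex[C] and Ex[C^2] for Pr[C = k] = (1 - p)**(k - 1) * p, k >= 1,
    by truncating both infinite sums after `terms` terms."""
    ex_c = ex_c_sq = 0.0
    for k in range(1, terms + 1):
        prob = (1 - p) ** (k - 1) * p
        ex_c += k * prob
        ex_c_sq += k * k * prob
    return ex_c, ex_c_sq

p = 0.2
ex_c, ex_c_sq = geometric_moments(p)
print(ex_c, 1 / p)                            # both approximately 5
print(ex_c_sq, (2 - p) / p ** 2)              # both approximately 45
print(ex_c_sq - ex_c ** 2, (1 - p) / p ** 2)  # both approximately 20
```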

## Dealing with Constants

It helps to know how to calculate the variance of $$aR + b$$:

Theorem 19.3.4 (Square Multiple Rule for Variance). Let $$R$$ be a random variable and $$a$$ a constant. Then

$\text{Var}[aR] = a^2 \text{Var}[R].$

Proof

Beginning with the definition of variance and repeatedly applying linearity of expectation, we have:

\begin{aligned} \text{Var}[aR] &::= \text{Ex}[(aR - \text{Ex}[aR])^2] \\ &= \text{Ex}[(aR)^2 - 2aR \text{Ex}[aR] + \text{Ex}^2 [aR]] \\ &= \text{Ex}[(aR)^2] - \text{Ex}[2aR \text{Ex}[aR]] + \text{Ex}^2 [aR] \\ &= a^2\text{Ex}[R^2] - 2 \text{Ex}[aR] \text{Ex}[aR] + \text{Ex}^2 [aR] \\ &= a^2\text{Ex}[R^2] - a^2 \text{Ex}^2 [R] \\ &= a^2 (\text{Ex}[R^2] - \text{Ex}^2 [R]) \\ &= a^2 \text{Var}[R] & (\text{Lemma 19.3.1}) \\ & & \quad \blacksquare \end{aligned}
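A small exact check of the Square Multiple Rule, sketched under the same assumption as before that a finite distribution is a dictionary from values to probabilities. The helpers `variance` and `scale` are hypothetical; `scale` requires $$a \neq 0$$ so that distinct values of $$R$$ remain distinct values of $$aR$$.

```python
from fractions import Fraction

def variance(dist):
    """Var[R] = Ex[R^2] - Ex^2[R] for a finite distribution {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    return sum(p * x * x for x, p in dist.items()) - mean ** 2

def scale(dist, a):
    """Distribution of aR (assumes a != 0, so no two values collide)."""
    return {a * x: p for x, p in dist.items()}

die = {k: Fraction(1, 6) for k in range(1, 7)}
for a in (2, -3, Fraction(1, 2)):
    assert variance(scale(die, a)) == a ** 2 * variance(die)
print(variance(scale(die, -3)))  # 105/4
```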

It’s even simpler to prove that adding a constant does not change the variance, as the reader can verify:

Theorem 19.3.5.

Let $$R$$ be a random variable and $$b$$ a constant. Then

$\text{Var}[R + b] = \text{Var}[R].$

Since the standard deviation is the square root of the variance, Theorems 19.3.4 and 19.3.5 together imply that the standard deviation of $$aR + b$$ is simply $$|a|$$ times the standard deviation of $$R$$:

Corollary 19.3.6.

$\nonumber \sigma_{(aR + b)} = |a| \sigma_R.$
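The following sketch checks Theorem 19.3.5 exactly and Corollary 19.3.6 up to floating-point rounding (the square root forces a conversion to `float`). The helpers `variance`, `affine`, and `std_dev` are hypothetical names, and `affine` assumes $$a \neq 0$$.

```python
import math
from fractions import Fraction

def variance(dist):
    """Var[R] = Ex[R^2] - Ex^2[R] for a finite distribution {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    return sum(p * x * x for x, p in dist.items()) - mean ** 2

def affine(dist, a, b):
    """Distribution of aR + b (assumes a != 0)."""
    return {a * x + b: p for x, p in dist.items()}

def std_dev(dist):
    return math.sqrt(variance(dist))

die = {k: Fraction(1, 6) for k in range(1, 7)}
assert variance(affine(die, 1, 100)) == variance(die)                # Theorem 19.3.5
assert math.isclose(std_dev(affine(die, -4, 7)), 4 * std_dev(die))   # Corollary 19.3.6
print(variance(affine(die, 1, 100)))  # 35/12
```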

## Variance of a Sum

In general, the variance of a sum is not equal to the sum of the variances, but variances do add for independent variables. In fact, mutual independence is not necessary: pairwise independence will do. This is useful to know because there are some important situations, such as Birthday Matching in Section 16.4, that involve variables that are pairwise independent but not mutually independent.

Theorem 19.3.7.

If $$R$$ and $$S$$ are independent random variables, then

$\label{19.3.6} \text{Var}[R + S] = \text{Var}[R] + \text{Var}[S].$

Proof

We may assume that $$\text{Ex}[R] = 0$$, since we could always replace $$R$$ by $$R - \text{Ex}[R]$$ in equation (\ref{19.3.6}); likewise for $$S$$. This substitution preserves the independence of the variables, and by Theorem 19.3.5, does not change the variances.

But for any variable $$T$$ with expectation zero, we have $$\text{Var}[T] = \text{Ex}[T^2]$$, so we need only prove

$\label{19.3.7} \text{Ex}[(R + S)^2] = \text{Ex}[R^2] + \text{Ex}[S^2].$

But (\ref{19.3.7}) follows from linearity of expectation and the fact that

$\label{19.3.8} \text{Ex}[RS] = \text{Ex}[R]\text{Ex}[S]$

since $$R$$ and $$S$$ are independent:

\begin{aligned} \text{Ex}[(R + S)^2] &= \text{Ex}[R^2 + 2RS + S^2] \\ &= \text{Ex}[R^2] + 2\text{Ex}[RS] + \text{Ex}[S^2] \\ &= \text{Ex}[R^2] + 2\text{Ex}[R]\text{Ex}[S] + \text{Ex}[S^2] & (\text{by (19.3.8)}) \\ &= \text{Ex}[R^2] + 2 \cdot 0 \cdot 0 + \text{Ex}[S^2] \\ &= \text{Ex}[R^2] + \text{Ex}[S^2] \\ & & \quad \blacksquare \end{aligned}
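Theorem 19.3.7 can be checked on small examples by building the distribution of $$R + S$$ from the product rule $$\text{Pr}[R = r \text{ and } S = s] = \text{Pr}[R = r]\,\text{Pr}[S = s]$$ that independence provides. This is a minimal sketch; `sum_of_independent` and `variance` are hypothetical helper names, and the die and coin are illustrative choices.

```python
from fractions import Fraction
from itertools import product

def variance(dist):
    """Var[R] = Ex[R^2] - Ex^2[R] for a finite distribution {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    return sum(p * x * x for x, p in dist.items()) - mean ** 2

def sum_of_independent(dist_r, dist_s):
    """Distribution of R + S when R and S are independent."""
    total = {}
    for (r, pr), (s, ps) in product(dist_r.items(), dist_s.items()):
        total[r + s] = total.get(r + s, 0) + pr * ps
    return total

die = {k: Fraction(1, 6) for k in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
assert variance(sum_of_independent(die, die)) == 2 * variance(die)
assert variance(sum_of_independent(die, coin)) == variance(die) + variance(coin)
print(variance(sum_of_independent(die, die)))  # 35/6
```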

It’s easy to see that additivity of variance does not generally hold for variables that are not independent. For example, if $$R = S$$, then equation (\ref{19.3.6}) becomes $$\text{Var}[R + R] = \text{Var}[R] + \text{Var}[R]$$. By the Square Multiple Rule, Theorem 19.3.4, this holds iff $$4 \text{Var}[R] = 2 \text{Var}[R]$$, which implies that $$\text{Var}[R] = 0$$. So equation (\ref{19.3.6}) fails when $$R = S$$ and $$R$$ has nonzero variance.
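The counterexample with $$R = S$$ can be made concrete. When $$S = R$$, the sum $$R + S$$ always equals $$2R$$, so its distribution is that of $$2R$$; the sketch below (with a hypothetical `variance` helper and a fair die as $$R$$) shows that its variance is $$4\,\text{Var}[R]$$ rather than $$2\,\text{Var}[R]$$.

```python
from fractions import Fraction

def variance(dist):
    """Var[R] = Ex[R^2] - Ex^2[R] for a finite distribution {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    return sum(p * x * x for x, p in dist.items()) - mean ** 2

die = {k: Fraction(1, 6) for k in range(1, 7)}
# With S = R, every outcome gives R + S = 2R.
r_plus_r = {2 * x: p for x, p in die.items()}
print(variance(r_plus_r), 2 * variance(die))  # 35/3 35/6
```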

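The proof of Theorem 19.3.7 carries over to the sum of any finite number of variables: expanding $$(R_1 + \cdots + R_n)^2$$ produces cross terms $$\text{Ex}[R_i R_j]$$ with $$i \neq j$$, each involving only two of the variables, so pairwise independence is enough to make them vanish. So we have: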

Theorem 19.3.8 (Pairwise Independent Additivity of Variance). If $$R_1, R_2,\ldots, R_n$$ are pairwise independent random variables, then

$\text{Var}[R_1 + R_2 + \cdots + R_n] = \text{Var}[R_1] + \text{Var}[R_2] + \cdots + \text{Var}[R_n].$
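A standard illustration of Theorem 19.3.8 uses two fair coin flips $$X$$ and $$Y$$ together with $$Z = X \oplus Y$$ (exclusive or). These three indicator variables are pairwise independent but not mutually independent, and the variance of their sum still equals the sum of their variances. The sketch below enumerates the four equally likely outcomes; the XOR construction and the helper names `expect`, `variance`, and `pairwise_independent` are illustrative choices, not taken from the text.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair flips x, y, each outcome with probability 1/4.
# Each outcome records the triple (X, Y, Z) with Z = X XOR Y.
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
prob = Fraction(1, len(outcomes))

def expect(f):
    return sum(prob * f(w) for w in outcomes)

def variance(f):
    return expect(lambda w: f(w) ** 2) - expect(f) ** 2

def pairwise_independent(i, j):
    """Pr[W_i = a and W_j = b] == Pr[W_i = a] * Pr[W_j = b] for all bits a, b."""
    return all(
        expect(lambda w: w[i] == a and w[j] == b)
        == expect(lambda w: w[i] == a) * expect(lambda w: w[j] == b)
        for a, b in product((0, 1), repeat=2)
    )

assert all(pairwise_independent(i, j) for i, j in [(0, 1), (0, 2), (1, 2)])
# Not mutually independent: Pr[X = Y = Z = 1] is 0, not 1/8.
assert expect(lambda w: w == (1, 1, 1)) == 0
total_var = variance(lambda w: w[0] + w[1] + w[2])
print(total_var)  # 3/4
```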

Now we have a simple way of computing the variance of a variable, $$J$$, that has an $$(n, p)$$-binomial distribution. We know that $$J = \sum_{k=1}^{n} I_k$$ where the $$I_k$$ are mutually independent indicator variables with $$\text{Pr}[I_k = 1] = p$$. The variance of each $$I_k$$ is $$p(1-p)$$ by Corollary 19.3.2, so by additivity of variance for independent variables (Theorem 19.3.8), we have

Lemma 19.3.9 (Variance of the Binomial Distribution). If $$J$$ has the $$(n, p)$$-binomial distribution, then

$\text{Var}[J] = n \text{Var}[I_k] = np(1-p).$
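Lemma 19.3.9 can be confirmed by computing the variance straight from the binomial probabilities $$\text{Pr}[J = k] = \binom{n}{k} p^k (1-p)^{n-k}$$. This sketch uses Python's `math.comb` and exact fractions; `binomial_pmf` and `variance` are hypothetical helper names, and $$n = 10$$, $$p = 3/10$$ are arbitrary illustrative values.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """{k: Pr[J = k]} for the (n, p)-binomial distribution."""
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

def variance(dist):
    """Var[J] = Ex[J^2] - Ex^2[J] for a finite distribution {value: probability}."""
    mean = sum(q * x for x, q in dist.items())
    return sum(q * x * x for x, q in dist.items()) - mean ** 2

n, p = 10, Fraction(3, 10)
print(variance(binomial_pmf(n, p)), n * p * (1 - p))  # 21/10 21/10
```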

² That is, $$C$$ has the geometric distribution with parameter $$p$$ according to Definition 18.4.6.

This page titled 19.3: Properties of Variance is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Eric Lehman, F. Thomson Leighton, & Albert R. Meyer (MIT OpenCourseWare).