17.8: Mutual Independence

    • Eric Lehman, F. Thomson Leighton, & Albert R. Meyer
    • Google and Massachusetts Institute of Technology via MIT OpenCourseWare

    We have defined what it means for two events to be independent. What if there are more than two events? For example, how can we say that the flips of \(n\) coins are all independent of one another? A set of events is said to be mutually independent if the probability of each event in the set is the same no matter which of the other events has occurred. This is equivalent to saying that for any selection of two or more of the events, the probability that all the selected events occur equals the product of the probabilities of the selected events.

    For example, four events \(E_1, E_2, E_3, E_4\) are mutually independent if and only if all of the following equations hold:

    \[\begin{aligned}
    \text{Pr}\left[E_{1} \cap E_{2}\right] &=\text{Pr}\left[E_{1}\right] \cdot \text{Pr}\left[E_{2}\right] \\
    \text{Pr}\left[E_{1} \cap E_{3}\right] &=\text{Pr}\left[E_{1}\right] \cdot \text{Pr}\left[E_{3}\right] \\
    \text{Pr}\left[E_{1} \cap E_{4}\right] &=\text{Pr}\left[E_{1}\right] \cdot \text{Pr}\left[E_{4}\right] \\
    \text{Pr}\left[E_{2} \cap E_{3}\right] &=\text{Pr}\left[E_{2}\right] \cdot \text{Pr}\left[E_{3}\right] \\
    \text{Pr}\left[E_{2} \cap E_{4}\right] &=\text{Pr}\left[E_{2}\right] \cdot \text{Pr}\left[E_{4}\right] \\
    \text{Pr}\left[E_{3} \cap E_{4}\right] &=\text{Pr}\left[E_{3}\right] \cdot \text{Pr}\left[E_{4}\right] \\
    \text{Pr}\left[E_{1} \cap E_{2} \cap E_{3}\right] &=\text{Pr}\left[E_{1}\right] \cdot \text{Pr}\left[E_{2}\right] \cdot \text{Pr}\left[E_{3}\right] \\
    \text{Pr}\left[E_{1} \cap E_{2} \cap E_{4}\right] &=\text{Pr}\left[E_{1}\right] \cdot \text{Pr}\left[E_{2}\right] \cdot \text{Pr}\left[E_{4}\right] \\
    \text{Pr}\left[E_{1} \cap E_{3} \cap E_{4}\right] &=\text{Pr}\left[E_{1}\right] \cdot \text{Pr}\left[E_{3}\right] \cdot \text{Pr}\left[E_{4}\right] \\
    \text{Pr}\left[E_{2} \cap E_{3} \cap E_{4}\right] &=\text{Pr}\left[E_{2}\right] \cdot \text{Pr}\left[E_{3}\right] \cdot \text{Pr}\left[E_{4}\right] \\
    \text{Pr}\left[E_{1} \cap E_{2} \cap E_{3} \cap E_{4}\right] &=\text{Pr}\left[E_{1}\right] \cdot \text{Pr}\left[E_{2}\right] \cdot \text{Pr}\left[E_{3}\right] \cdot \text{Pr}\left[E_{4}\right]
    \end{aligned}\]

    The generalization to mutual independence of \(n\) events should now be clear.
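The product conditions above can be checked mechanically. Here is a minimal sketch of such a checker; the function `mutually_independent` and the two-coin example are illustrative names introduced here, not part of the text:

```python
from itertools import combinations
from fractions import Fraction

def mutually_independent(sample_space, events):
    """Check the product rule for every subset of two or more events.

    sample_space: dict mapping each outcome to its probability.
    events: list of sets of outcomes.
    """
    def pr(event):
        return sum(sample_space[w] for w in event)

    for k in range(2, len(events) + 1):
        for subset in combinations(events, k):
            intersection = set.intersection(*subset)
            prod = Fraction(1)
            for e in subset:
                prod *= pr(e)
            if pr(intersection) != prod:
                return False
    return True

# Two mutually independent fair coin flips as a sanity check.
space = {w: Fraction(1, 4) for w in ("HH", "HT", "TH", "TT")}
E1 = {"HH", "HT"}   # first coin is heads
E2 = {"HH", "TH"}   # second coin is heads
print(mutually_independent(space, [E1, E2]))  # True
```

For \(n\) events the checker tests all \(2^n - n - 1\) equalities, which is exactly why verifying mutual independence gets expensive as \(n\) grows.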

    DNA Testing

    Assumptions about independence are routinely made in practice. Frequently, such assumptions are quite reasonable. Sometimes, however, the reasonableness of an independence assumption is not so clear, and the consequences of a faulty assumption can be severe.

    Let’s return to the O. J. Simpson murder trial. The following expert testimony was given on May 15, 1995:

    Mr. Clarke: When you make these estimations of frequency—and I believe you touched a little bit on a concept called independence?

    Dr. Cotton: Yes, I did.

    Mr. Clarke: And what is that again?

    Dr. Cotton: It means whether or not you inherit one allele that you have is not— does not affect the second allele that you might get. That is, if you inherit a band at 5,000 base pairs, that doesn’t mean you’ll automatically or with some probability inherit one at 6,000. What you inherit from one parent is what you inherit from the other.

    Mr. Clarke: Why is that important?

    Dr. Cotton: Mathematically that’s important because if that were not the case, it would be improper to multiply the frequencies between the different genetic locations.

    Mr. Clarke: How do you—well, first of all, are these markers independent that you’ve described in your testing in this case?

    Presumably, this dialogue was as confusing to you as it was for the jury. Essentially, the jury was told that genetic markers in blood found at the crime scene matched Simpson’s. Furthermore, they were told that the probability that the markers would be found in a randomly-selected person was at most 1 in 170 million. This astronomical figure was derived from statistics such as:

    • 1 person in 100 has marker \(A\).
    • 1 person in 50 has marker \(B\).
    • 1 person in 40 has marker \(C\).
    • 1 person in 5 has marker \(D\).
    • 1 person in 170 has marker \(E\).

    Then these numbers were multiplied to give the probability that a randomly-selected person would have all five markers:

    \[\begin{aligned}
    \text{Pr}[A \cap B \cap C \cap D \cap E] &=\text{Pr}[A] \cdot \text{Pr}[B] \cdot \text{Pr}[C] \cdot \text{Pr}[D] \cdot \text{Pr}[E] \\
    &=\frac{1}{100} \cdot \frac{1}{50} \cdot \frac{1}{40} \cdot \frac{1}{5} \cdot \frac{1}{170}=\frac{1}{170,000,000}.
    \end{aligned}\]
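The multiplication can be verified with exact rational arithmetic; a quick sketch in Python:

```python
from fractions import Fraction
from math import prod

# The frequencies quoted for the five markers.
marker_freqs = [Fraction(1, 100), Fraction(1, 50), Fraction(1, 40),
                Fraction(1, 5), Fraction(1, 170)]

# Under the mutual-independence assumption, the match probability
# is the product of the individual frequencies.
match_prob = prod(marker_freqs)
print(match_prob)  # 1/170000000
```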

    The defense pointed out that this assumes that the markers appear mutually independently. Furthermore, all the statistics were based on just a few hundred blood samples.

    After the trial, the jury was widely mocked for failing to “understand” the DNA evidence. If you were a juror, would you accept the 1 in 170 million calculation?

    Pairwise Independence

    The definition of mutual independence seems awfully complicated—there are so many selections of events to consider! Here’s an example that illustrates the subtlety of independence when more than two events are involved. Suppose that we flip three fair, mutually-independent coins. Define the following events:

    • \(A_1\) is the event that coin 1 matches coin 2.
    • \(A_2\) is the event that coin 2 matches coin 3.
    • \(A_3\) is the event that coin 3 matches coin 1.

    Are \(A_1, A_2, A_3\) mutually independent?

    The sample space for this experiment is:

    \[\nonumber \{HHH,HHT,HTH,HTT,THH,THT,TTH,TTT\}.\]

    Every outcome has probability \((1/2)^3 = 1/8\) by our assumption that the coins are mutually independent.

    To see if events \(A_1, A_2,\) and \(A_3\) are mutually independent, we must check a sequence of equalities. It will be helpful first to compute the probability of each event \(A_i\):

    \[\begin{aligned}
    \text{Pr}\left[A_{1}\right] &=\text{Pr}[H H H]+\text{Pr}[H H T]+\text{Pr}[T T H]+\text{Pr}[T T T] \\
    &=\frac{1}{8}+\frac{1}{8}+\frac{1}{8}+\frac{1}{8}=\frac{1}{2}
    \end{aligned}\]

    By symmetry, \(\text{Pr}[A_2] =\text{Pr}[A_3] = 1/2\) as well. Now we can begin checking all the equalities required for mutual independence:

    \[\begin{aligned}
    \text{Pr}\left[A_{1} \cap A_{2}\right] &=\text{Pr}[H H H]+\text{Pr}[TTT] =\frac{1}{8}+\frac{1}{8} = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} \\ &= \text{Pr}[A_1] \text{Pr}[A_2].
    \end{aligned}\]

    By symmetry, \(\text{Pr}[A_1 \cap A_3] =\text{Pr}[A_1] \cdot \text{Pr}[A_3]\) and \(\text{Pr}[A_2 \cap A_3] =\text{Pr}[A_2] \cdot \text{Pr}[A_3]\) must hold also. Finally, we must check one last condition:

    \[\begin{aligned}
    \text{Pr}\left[A_{1} \cap A_{2} \cap A_{3} \right] &=\text{Pr}[H H H]+\text{Pr}[T T T] =\frac{1}{8}+\frac{1}{8} = \frac{1}{4} \\ &\color{red}\neq \color{black} \frac{1}{8} = \text{Pr}[A_1] \text{Pr}[A_2]\text{Pr}[A_3] .
    \end{aligned}\]

    The three events \(A_1, A_2,\) and \(A_3\) are not mutually independent even though any two of them are independent! This not-quite mutual independence seems weird at first, but it happens. It even generalizes:
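The case analysis above can be reproduced by enumerating the eight outcomes directly; a sketch in Python, with event names mirroring the text:

```python
from fractions import Fraction
from itertools import product

# The eight equally likely outcomes of three fair coin flips.
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]
pr_outcome = Fraction(1, 8)

A1 = {w for w in outcomes if w[0] == w[1]}  # coin 1 matches coin 2
A2 = {w for w in outcomes if w[1] == w[2]}  # coin 2 matches coin 3
A3 = {w for w in outcomes if w[2] == w[0]}  # coin 3 matches coin 1

def pr(event):
    return len(event) * pr_outcome

# Every pair satisfies the product rule...
assert pr(A1 & A2) == pr(A1) * pr(A2) == Fraction(1, 4)
assert pr(A1 & A3) == pr(A1) * pr(A3)
assert pr(A2 & A3) == pr(A2) * pr(A3)

# ...but the three-way product rule fails: 1/4 != 1/8.
assert pr(A1 & A2 & A3) == Fraction(1, 4)
assert pr(A1) * pr(A2) * pr(A3) == Fraction(1, 8)
```

The intuition behind the failure: any two of the matching events determine the third, so \(A_1 \cap A_2 \cap A_3 = A_1 \cap A_2\), which has probability \(1/4\), not \(1/8\).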

    Definition \(\PageIndex{1}\)

    A set \(A_1, A_2, \ldots\) of events is \(k\)-way independent iff every set of \(k\) of these events is mutually independent. The set is pairwise independent iff it is 2-way independent.

    So the events \(A_1, A_2, A_3\) above are pairwise independent, but not mutually independent. Pairwise independence is a much weaker property than mutual independence.

    For example, suppose that the prosecutors in the O. J. Simpson trial were wrong and markers \(A, B, C, D,\) and \(E\) appear only pairwise independently. Then the probability that a randomly-selected person has all five markers is no more than:

    \[\begin{aligned}
    \text{Pr}[A \cap B \cap C \cap D \cap E] &\leq \text{Pr}[A \cap E] = \text{Pr}[A] \text{Pr}[E] \\
    &=\frac{1}{100} \cdot \frac{1}{170} =\frac{1}{17,000}.
    \end{aligned}\]

    The first line uses the fact that the event \(A \cap B \cap C \cap D \cap E\) is a subset of the event \(A \cap E\), so its probability can be no larger. (We picked out the \(A\) and \(E\) markers because they’re the rarest.) We use pairwise independence on the second line. Now the probability of a random match is 1 in 17,000—a far cry from 1 in 170 million! And this is the strongest conclusion we can reach assuming only pairwise independence.
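The weaker bound works out as follows; a one-line check with exact rationals:

```python
from fractions import Fraction

# Under pairwise independence alone, the best available bound multiplies
# the frequencies of only the two rarest markers, A (1 in 100) and E (1 in 170).
bound = Fraction(1, 100) * Fraction(1, 170)
print(bound)  # 1/17000
```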

    On the other hand, the 1 in 17,000 bound that we get by assuming pairwise independence is a lot better than the bound that we would have if there were no independence at all. For example, if the markers are dependent, then it is possible that

    everyone with marker \(E\) has marker \(A\),

    everyone with marker \(A\) has marker \(B\),

    everyone with marker \(B\) has marker \(C\), and

    everyone with marker \(C\) has marker \(D\).

    In such a scenario, the probability of a match is

    \[\nonumber \text{Pr}[E] = \frac{1}{170}.\]

    So a stronger independence assumption leads to a smaller bound on the probability of a match. The trick is to figure out what independence assumption is reasonable. Assuming that the markers are mutually independent may well not be reasonable unless you have examined hundreds of millions of blood samples. Otherwise, how would you know that marker \(D\) does not show up more frequently whenever the other four markers are simultaneously present?


    This page titled 17.8: Mutual Independence is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Eric Lehman, F. Thomson Leighton, & Albert R. Meyer (MIT OpenCourseWare).
