
7.1: Probability as degree of belief - Bayesian probability


    The essential concept in using probability to simplify the world is that probability is a degree of belief. Therefore, a probability is based on our knowledge, and it changes when our knowledge changes.

    7.1.1 Is it my telephone number?

    Here is an example from soon after I had moved to England. I was talking to a friend on the phone, of the old-fashioned variety with wires connecting it to the wall. David needed to call me back. However, having just moved to the apartment, I was unsure of my phone number; plus, for anyone used to American phone numbers, British phone numbers have a strange and hard-to-remember format. I had a reasonably likely guess, which I gave David so that he could call me back. After I hung up, I tested my guess by picking up my phone and dialing my guess—and got a busy signal.

    Given this experimental evidence, how sure am I that the candidate number is my phone number? Quantitatively, what odds should I give?

    This question makes no sense if probability is seen as long-run frequency. In that view, the probability of a coin turning up heads is 1/2 because 1/2 is the limiting proportion of heads in an ever-longer series of tosses. However, for evaluating the plausibility of the phone number, this interpretation—called the frequentist interpretation—cannot apply, because there is no repeated experiment.

    The frequentist interpretation gets stuck because it places probability in the physical system itself. The alternative—that probability reflects the incompleteness of our knowledge—is known as the Bayesian interpretation of probability. It is the interpretation suited for mastering complexity. A book-length discussion and application of this fundamental point is Edwin Jaynes’s Probability Theory: The Logic of Science [26].

    The Bayesian interpretation is based on one simple idea: A probability reflects our degree of belief in a hypothesis. Probabilities are therefore subjective: Someone with different knowledge will have different probabilities. Thus, as we collect evidence, our degrees of belief change. Evidence changes probabilities.

    In the phone-number problem, what is the hypothesis and what is the evidence?

    The hypothesis—often denoted H—is the statement about the world whose credibility we would like to judge. Here,

    \[H \equiv \textrm{My phone-number guess is correct}.\]

    The evidence—often denoted E or D (for data)—is the information that we collect, obtain, or learn and then use to judge the hypothesis. It augments our knowledge. Here, E is the result of the experiment:

    \[E \equiv \textrm{Dialing my guess gave a busy signal}.\]

    Any hypothesis has an initial probability Pr (H). This probability is called the prior probability, because it is the probability prior to, or before, incorporating the evidence. After learning the evidence E, the hypothesis has a new probability Pr (H ∣ E): the probability of the hypothesis H given—that is, upon assuming—the evidence E. This probability is called the posterior probability, because it is the probability, or degree of belief, after including the evidence.

    The recipe for using evidence to update probabilities is Bayes’ theorem:

    \[\textrm{Pr} (H \vert E) \propto \textrm{Pr} (H) \times \textrm{Pr} (E \vert H).\]

    The new factor, the probability Pr (E ∣ H)—the probability of the evidence given the hypothesis—is called the likelihood. It measures how well the candidate theory (the hypothesis) explains the evidence. Bayes’ theorem then says that

    \[\underbrace{\textrm{posterior probability}}_{\textrm{Pr}(H \vert E)} \propto \underbrace{\textrm{prior probability}}_{\textrm{Pr}(H)} \times \underbrace{\textrm{explanatory power}}_{\textrm{Pr}(E \vert H)}.\]

    (The constant of proportionality is chosen so that the posterior probabilities for all the competing hypotheses add to 1.) Both probabilities on the right are necessary. Without the likelihood, we could not change our probabilities. Without the prior probability, we would always prefer the hypothesis with the maximum likelihood, no matter how contrived or post hoc.
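
    For concreteness, here is a minimal numerical sketch of this normalization, in Python. The prior of 1/2 and the likelihoods 1 and 0.05 are the values that the phone-number example below will justify; the code itself is only an illustration of the recipe.

    ```python
    # Minimal sketch of Bayes' theorem with explicit normalization.
    priors = {"H": 0.5, "not H": 0.5}        # Pr(H) and Pr(not H)
    likelihoods = {"H": 1.0, "not H": 0.05}  # Pr(E | H) and Pr(E | not H)

    # Unnormalized posteriors: prior times likelihood, as in the proportionality above
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}

    # The constant of proportionality makes the posteriors sum to 1
    total = sum(unnormalized.values())
    posteriors = {h: w / total for h, w in unnormalized.items()}

    print(posteriors)  # {'H': ~0.95, 'not H': ~0.05}: about 20-to-1 in favor of H
    ```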

    In a frequent use of Bayes’ theorem, there are only two hypotheses, H and its negation \(\bar{H}\). In this problem, \(\bar{H}\) is the statement that my guess is wrong. With only two hypotheses, a compact form of Bayes’ theorem uses odds instead of probabilities, thereby avoiding the constant of proportionality:

    \[\underbrace{\textrm{posterior odds}}_{O(H \vert E)} = \underbrace{\textrm{prior odds}}_{O(H)} \times \frac{\textrm{Pr}(E \vert H)}{\textrm{Pr}(E \vert \bar{H})}.\]

    The odds O are related to the probability p by \(O = p/(1-p)\). For example, a probability of p = 2/3 corresponds to an odds of 2—often written as 2:1 and read as “2-to-1 odds.”
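
    For converting in both directions, a short sketch of the relation and its inverse, p = O/(1 + O), may help (the functions here are only illustrative):

    ```python
    def prob_to_odds(p):
        """Convert a probability p to odds O = p / (1 - p)."""
        return p / (1 - p)

    def odds_to_prob(o):
        """Invert the relation: p = O / (1 + O)."""
        return o / (1 + o)

    print(prob_to_odds(2 / 3))  # ~2, i.e. 2:1 odds, as in the example above
    print(odds_to_prob(2))      # ~0.667, recovering p = 2/3
    ```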

    Exercise \(\PageIndex{1}\): Converting probabilities to odds

    Convert the following probabilities to odds: (a) 0.01, (b) 0.9, (c) 0.75, and (d) 0.3.

    Exercise \(\PageIndex{2}\): Converting odds to probabilities

    Convert the following odds to probabilities: (a) 3, (b) 1/3, (c) 1:9, and (d) 4-to-1.

    The ratio Pr (E ∣ H)/Pr (E ∣ \(\bar{H}\)) is called the likelihood ratio. Its numerator measures how well the hypothesis H explains the evidence E; its denominator measures how well the contrary hypothesis \(\bar{H}\) explains the same evidence. So their ratio measures the relative explanatory power of the two hypotheses. Bayes’ theorem, in the odds form, is simple:

    \[\textrm{updated odds} = \textrm{initial odds} \times \textrm{relative explanatory power}.\]

    Let’s use Bayes’ theorem to judge my phone-number guess. Before the experiment, I was not too sure of the phone number; Pr (H) is perhaps 1/2, making O(H) = 1. In the likelihood ratio, the numerator Pr (E ∣ H) is the probability of getting a busy signal assuming (“given”) that my guess is correct. Because I would be dialing my own phone using my phone, I would definitely get a busy signal. Thus, Pr (E ∣ H) = 1: The hypothesis of a correct guess (H) explains the data as well as possible.

    The trickier estimate is the denominator Pr (E ∣ \(\bar{H}\)): the probability of getting a busy signal assuming that my guess is incorrect. I’ll assume that my guess is still a valid phone number (I nowadays rarely get the recorded message saying that I have dialed an invalid number). Then I would be dialing a random person’s phone. Thus, Pr (E ∣ \(\bar{H}\)) is the probability that a random valid phone is busy. It is probably similar to the fraction of the day that my own phone is busy. In my household, the phone is in use for 0.5 hours in a 24-hour day, so the busy fraction could be 0.5/24, or roughly 0.02.

    However, that estimate uses an overly long time, 24 hours, for the denominator. If I do the experiment at 3 am and my guess is wrong, I would wake up an innocent bystander. Furthermore, I am not often on the phone at 3 am. A more reasonable denominator is 10 hours (9 am to 7 pm), making the busy fraction and the likelihood Pr (E ∣ \(\bar{H}\)) roughly 0.05. An incorrect guess (\(\bar{H}\)) is a lousy explanation for the data.

    The relative explanatory power of H and \(\bar{H}\), which is measured by the likelihood ratio, is roughly 20:

    \[\frac{\textrm{Pr}(E \vert H)}{\textrm{Pr} (E \vert \bar{H})} \sim \frac{1}{0.05} = 20.\]

    Because the prior odds were 1 to 1, the updated, posterior odds are 20 to 1:

    \[\underbrace{\textrm{posterior odds}}_{O(H \vert E) \sim 20} = \underbrace{\textrm{prior odds}}_{O(H) \sim 1} \times \underbrace{\textrm{likelihood ratio}}_{\textrm{Pr}(E \vert H) / \textrm{Pr}(E \vert \bar{H}) \sim 20} \sim 20.\]
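
    As a numerical check, here is a short sketch of the odds-form update with these estimates; the variable names are illustrative, and the conversion back to a probability uses p = O/(1 + O):

    ```python
    # Odds form of Bayes' theorem applied to the phone-number guess.
    prior_odds = 1.0          # Pr(H) = 1/2, so O(H) = 1
    pr_busy_if_correct = 1.0  # Pr(E | H): my own phone is surely busy when I dial it
    pr_busy_if_wrong = 0.05   # Pr(E | not H): a random phone is busy ~5% of the time

    posterior_odds = prior_odds * pr_busy_if_correct / pr_busy_if_wrong
    posterior_prob = posterior_odds / (1 + posterior_odds)

    print(posterior_odds)  # ~20: the guess is now 20 times more plausible than not
    print(posterior_prob)  # ~0.95: roughly 95 percent sure the guess is correct
    ```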

    Exercise \(\PageIndex{3}\): PKU testing

    In most American states and many countries, newborn babies are tested for the metabolic defect phenylketonuria (PKU). The prior odds of having PKU are about 1 in 10 000. The test gives a false-positive result 0.23 percent of the time; it gives a false-negative result 0.3 percent of the time. What are Pr (PKU ∣ positive test) and Pr (PKU ∣ negative test)?


