8.2.5: Maximum Entropy, Analytic Form


    Here we demonstrate the Principle of Maximum Entropy for the simple case in which there is one constraint and three variables. It will be possible to go through all the steps analytically.

    Suppose you have been hired by Carnivore Corporation, the parent company of Berger’s Burgers, to analyze their worldwide sales. You visit Berger’s Burgers restaurants all over the world, and determine that, on average, people are paying $1.75 for their meals. (As part of Carnivore’s commitment to global homogeneity, the price of each meal is exactly the same in every restaurant, after local currencies are converted to U.S. dollars.)

    After you return, your supervisors ask about the probabilities of a customer ordering each of the three value meals. In other words, they want to know \(p(B)\), \(p(C)\), and \(p(F)\). You are horrified to realize that you did not keep the original data, and there is no time to repeat your trip. You have to make the best estimate of the probabilities \(p(B)\), \(p(C)\), and \(p(F)\) consistent with the two things you do know:

    \(1 = p(B) + p(C) + p(F) \tag{8.17}\)

    \(\$1.75 = \$1.00\,p(B) + \$2.00\,p(C) + \$3.00\,p(F) \tag{8.18}\)

    Since you have three unknowns and only two equations, there is not enough information to solve for the unknowns.
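
    Even so, any proposed estimate can at least be checked against these two equations. Here is a minimal Python sketch (an illustration added to this discussion, not part of the original text) confirming that two quite different distributions both satisfy Equations 8.17 and 8.18:

        # Meal prices in dollars: burger, chicken, fish
        prices = (1.00, 2.00, 3.00)

        # Two candidate distributions (p(B), p(C), p(F)), chosen at the
        # extremes of the feasible range derived later in this section
        candidates = [
            (0.250, 0.750, 0.000),   # nobody orders fish
            (0.625, 0.000, 0.375),   # nobody orders chicken
        ]

        for dist in candidates:
            total = sum(dist)
            mean = sum(price * p for price, p in zip(prices, dist))
            print(dist, "sum =", total, "mean = $%.2f" % mean)  # both give $1.75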

    What should you do? There is a range of probability values consistent with what you know. However, different choices within that range leave you with different amounts of uncertainty \(S\):

    \(S = p(B) \log_2 \Big(\dfrac{1}{p(B)}\Big) + p(C) \log_2 \Big(\dfrac{1}{p(C)}\Big) + p(F) \log_2 \Big(\dfrac{1}{p(F)}\Big) \tag{8.19}\)

    If you choose one for which \(S\) is small, you are assuming something you do not know. For example, if your average had been $2.00 rather than $1.75, you could have met both of your constraints by assuming that everybody bought the chicken meal. Then your uncertainty would have been 0 bits. Or you could have assumed that half the orders were for burgers and half for fish, and the uncertainty would have been 1 bit. Neither of these assumptions seems particularly appropriate, because each goes beyond what you know. How can you find the probability distribution that introduces no assumptions beyond what you already know?
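
    To see those numbers concretely, here is a short Python sketch (an added illustration, not part of the original text) that computes \(S\) for the two distributions in the hypothetical $2.00 example:

        from math import log2

        def entropy(probs):
            # S = sum over outcomes of p * log2(1/p); outcomes with p = 0
            # contribute nothing and are skipped to avoid log2(1/0)
            return sum(p * log2(1 / p) for p in probs if p > 0)

        # Distributions ordered as (p(B), p(C), p(F)); both average $2.00
        print(entropy((0.0, 1.0, 0.0)))  # everybody buys chicken: 0 bits
        print(entropy((0.5, 0.0, 0.5)))  # half burgers, half fish: 1 bit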

    The Principle of Maximum Entropy is based on the reasonable assumption that you should select that probability distribution which leaves you the largest remaining uncertainty (i.e., the maximum entropy) consistent with your constraints. That way you have not introduced any additional assumptions into your calculations.

    For the simple case of three probabilities and two constraints, this is easy to do analytically. Working with the two constraints, we can express two of the unknown probabilities in terms of the third. In our case, we multiply Equation 8.17 by $1.00 and subtract it from Equation 8.18 to eliminate \(p(B)\); then we multiply Equation 8.17 by $2.00 and subtract it from Equation 8.18, thereby eliminating \(p(C)\):

    \(p(C) = 0.75 − 2p(F) \tag{8.20}\)

    \(p(B) = 0.25 + p(F) \tag{8.21}\)
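
    The same elimination can be checked symbolically. Here is a minimal sketch using Python's sympy library (the use of sympy is this note's choice, not part of the original derivation):

        import sympy as sp

        pB, pC, pF = sp.symbols('pB pC pF')

        # Equations 8.17 and 8.18
        constraints = [
            sp.Eq(pB + pC + pF, 1),
            sp.Eq(1.00 * pB + 2.00 * pC + 3.00 * pF, 1.75),
        ]

        # Solve for p(B) and p(C), leaving p(F) free
        solution = sp.solve(constraints, [pB, pC])
        print(solution)  # expect pB = 0.25 + pF and pC = 0.75 - 2*pF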

    Next, the possible range of values of the probabilities can be determined. Since each of the three lies between 0 and 1, it is easy to conclude from these results that

    \(0 \leq p(F) \leq 0.375 \tag{8.22}\)

    \(0 \leq p(C) \leq 0.75 \tag{8.23}\)

    \(0.25 \leq p(B) \leq 0.625 \tag{8.24}\)
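
    These bounds are easy to confirm numerically. A quick Python sketch (an added check, not part of the original text) scans candidate values of \(p(F)\) and keeps those for which Equations 8.20 and 8.21 give valid probabilities:

        # Keep the values of p(F) for which p(C) = 0.75 - 2 p(F) and
        # p(B) = 0.25 + p(F) both remain between 0 and 1
        feasible = [
            f / 1000
            for f in range(1001)
            if 0 <= 0.75 - 2 * (f / 1000) <= 1 and 0 <= 0.25 + f / 1000 <= 1
        ]
        print(min(feasible), max(feasible))  # 0.0 and 0.375, matching Equation 8.22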

    Next, these expressions can be substituted into the formula for entropy so that it is expressed in terms of the single probability \(p(F)\). Thus

    \(S = \Big(0.25 + p(F)\Big) \log_2 \Big(\dfrac{1}{0.25 + p(F)}\Big) + \Big(0.75 − 2p(F)\Big) \log_2 \Big(\dfrac{1}{0.75 − 2p(F)}\Big) + p(F) \log_2 \Big(\dfrac{1}{p(F)}\Big) \tag{8.25}\)


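    Before carrying out the maximization analytically, the location of the maximum can be previewed numerically. Here is a minimal Python sketch (an added preview, not part of the original text) that evaluates \(S\) on a fine grid over the allowed range of \(p(F)\):

        from math import log2

        def S(f):
            # Entropy from Equation 8.25, as a function of p(F) alone
            probs = (0.25 + f, 0.75 - 2 * f, f)
            return sum(p * log2(1 / p) for p in probs if p > 0)

        # Grid search over 0 <= p(F) <= 0.375
        best = max((f / 10000 for f in range(3751)), key=S)
        print(best, S(best))  # maximum near p(F) = 0.216, S about 1.517 bits
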
    This page titled 8.2.5: Maximum Entropy, Analytic Form is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Paul Penfield, Jr. (MIT OpenCourseWare) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.