
9.6.3: The Maximum Entropy


    It is easy to show that the entropy calculated from this probability distribution is at least as large as that for any probability distribution which leads to the same expected value of \(G\).

    Recall the Gibbs inequality, Equation 6.4, which will be rewritten here with \(p(A_i)\) and \(p'(A_i)\) interchanged (it is valid either way):

    \(\displaystyle \sum_{i} p'(A_i) \log_2 \Big(\dfrac{1}{p'(A_i)}\Big) \leq \displaystyle \sum_{i} p'(A_i) \log_2 \Big(\dfrac{1}{p(A_i)}\Big) \tag{9.16}\)

    where \(p'(A_i)\) is any probability distribution and \(p(A_i)\) is any other probability distribution. The inequality is an equality if and only if the two probability distributions are the same.
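    The Gibbs inequality is easy to verify numerically. The Python sketch below is purely illustrative (the random setup and function names are not from the text): it draws many pairs of distributions and checks that the entropy of \(p'(A_i)\) never exceeds the cross term on the right-hand side of Equation 9.16.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy_bits(p):
    """sum_i p_i log2(1/p_i), the entropy of p in bits."""
    return -np.sum(p * np.log2(p))

def cross_term_bits(p_prime, p):
    """sum_i p'_i log2(1/p_i), the right-hand side of Equation 9.16."""
    return -np.sum(p_prime * np.log2(p))

for _ in range(1000):
    n = rng.integers(2, 10)
    p_prime = rng.dirichlet(np.ones(n))  # a random distribution p'
    p = rng.dirichlet(np.ones(n))        # another random distribution p
    # Gibbs inequality: the entropy of p' never exceeds the cross term with p
    assert entropy_bits(p_prime) <= cross_term_bits(p_prime, p) + 1e-12
```

    Equality is approached only when the two random distributions happen to coincide, consistent with the equality condition stated above.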

    The Gibbs inequality can be used to prove that the probability distribution of Equation 9.12 has the maximum entropy. Suppose there is another probability distribution \(p'(A_i)\) that leads to an expected value \(G'\) and an entropy \(S'\), i.e.,

    \(\begin{align*} 1 &= \displaystyle \sum_{i} p'(A_i) \tag{9.17} \\ G' &= \displaystyle \sum_{i} p'(A_i)g(A_i) \tag{9.18} \\ S' &= \displaystyle \sum_{i} p'(A_i) \log_2 \Big(\dfrac{1}{p'(A_i)}\Big) \tag{9.19} \end{align*}\)

    Then it is easy to show that, for any value of \(\beta\), if \(G' = G(\beta)\) then \(S' \leq S(\beta)\):

    \(\begin{align*}
    S' &= \displaystyle \sum_{i} p'(A_i) \log_2 \Big(\dfrac{1}{p'(A_i)}\Big) \\
    &\leq \displaystyle \sum_{i} p'(A_i) \log_2 \Big(\dfrac{1}{p(A_i)}\Big) \\
    &= \displaystyle \sum_{i} p'(A_i)\left[\alpha + \beta g(A_i)\right] \\
    &= \alpha + \beta G' \\
    &= S(\beta) + \beta\left[G' - G(\beta)\right]
    \tag{9.20} \end{align*}\)

    where Equations 9.16, 9.13, 9.17, 9.18, and 9.15 were used for the successive steps. Thus, when \(G' = G(\beta)\), any alternative probability distribution that leads to the same value of the constraint variable has entropy \(S'\) that cannot exceed \(S(\beta)\), the entropy of the distribution defined by Equation 9.12.
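    The maximum-entropy property can also be checked numerically. In the sketch below, the values of \(g(A_i)\), the choice \(\beta = 0.5\), and all variable names are assumptions made for illustration. The distribution \(p(A_i)\) is formed as \(2^{-\alpha - \beta g(A_i)}\) with \(\alpha\) fixed by normalization (consistent with the step \(\log_2(1/p(A_i)) = \alpha + \beta g(A_i)\) used above), and alternative distributions \(p'(A_i)\) satisfying Equations 9.17 and 9.18 are generated by perturbing \(p(A_i)\) within the null space of the two constraints.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup (these g values and beta are not from the text).
g = np.array([1.0, 2.0, 4.0, 7.0])   # g(A_i) for four states
beta = 0.5

# p(A_i) = 2^(-alpha - beta g(A_i)), with alpha chosen so the p's sum to 1
alpha = np.log2(np.sum(2.0 ** (-beta * g)))
p = 2.0 ** (-alpha - beta * g)

def entropy_bits(q):
    """sum_i q_i log2(1/q_i), in bits."""
    return -np.sum(q * np.log2(q))

G = np.sum(p * g)      # G(beta), the constrained expected value
S = entropy_bits(p)    # S(beta)

# Directions that change neither the normalization (Eq. 9.17) nor the
# expected value (Eq. 9.18): the null space of the two constraint vectors.
constraints = np.vstack([np.ones_like(g), g])
_, _, vt = np.linalg.svd(constraints)
null_space = vt[2:]

for _ in range(1000):
    direction = null_space.T @ rng.normal(size=null_space.shape[0])
    step = 0.1 * rng.random()
    p_prime = p + step * direction
    if np.any(p_prime <= 0):
        continue  # skip perturbations that leave the probability simplex
    assert abs(np.sum(p_prime) - 1.0) < 1e-9        # Equation 9.17 holds
    assert abs(np.sum(p_prime * g) - G) < 1e-9      # G' = G(beta)
    # Maximum-entropy property: S' <= S(beta)
    assert entropy_bits(p_prime) <= S + 1e-12
```

    Every perturbed distribution that stays on the probability simplex has the same expected value \(G\) but entropy no larger than \(S(\beta)\), consistent with Equation 9.20.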


    This page titled 9.6.3: The Maximum Entropy is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Paul Penfield, Jr. (MIT OpenCourseWare) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.