6.2.1: Gibbs Inequality
Here we present the Gibbs Inequality, named after the American physicist J. Willard Gibbs (1839–1903)\(^1\), which will be useful to us in later proofs. This inequality states that the entropy is smaller than or equal to any other average formed using the same probabilities but a different distribution inside the logarithm. Specifically,
\(\displaystyle \sum_{i} p(A_i)\log_2\Big(\dfrac{1}{p(A_i)}\Big) \leq \displaystyle \sum_{i} p(A_i)\log_2\Big(\dfrac{1}{p'(A_i)}\Big) \tag{6.4}\)
where \(p(A_i)\) is any probability distribution (we will use it for source events and other distributions) and \(p'(A_i)\) is any other probability distribution, or more generally any set of numbers such that
\(0 \leq p'(A_i) \leq 1 \tag{6.5}\)
and
\(\displaystyle \sum_{i} p'(A_i) \leq 1. \tag{6.6}\)
As is true for all probability distributions,
\(\displaystyle \sum_{i} p(A_i) = 1. \tag{6.7}\)
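As a concrete illustration, the inequality is easy to check numerically. The short sketch below (in Python, with distributions chosen purely for illustration) computes both sides of Equation 6.4 for a small example and confirms that the entropy on the left is no larger than the average on the right.

```python
import math

# Illustrative distributions (example values, not from the text).
p = [0.5, 0.25, 0.25]       # p(A_i): a probability distribution, sums to 1
p_prime = [0.4, 0.4, 0.2]   # p'(A_i): satisfies 0 <= p'(A_i) <= 1 and sums to <= 1

# Left side of Equation 6.4: the entropy formed with p.
lhs = sum(pi * math.log2(1 / pi) for pi in p)

# Right side of Equation 6.4: same weights p, but p' inside the logarithm.
rhs = sum(pi * math.log2(1 / qi) for pi, qi in zip(p, p_prime))

print(f"left side  (entropy) = {lhs:.4f} bits")   # 1.5000
print(f"right side           = {rhs:.4f} bits")   # about 1.5719
assert lhs <= rhs + 1e-12   # Gibbs Inequality holds
```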
Equation 6.4 can be proven by noting that the natural logarithm has the property that it is less than or equal to a straight line that is tangent to it at any point (for example, the point \(x = 1\) is shown in Figure 6.2). This property is sometimes referred to as concavity or convexity. Thus
\(\ln x \leq x - 1 \tag{6.8}\)
and therefore, by converting the logarithm base \(e\) to the logarithm base 2, we have
\(\log_2 x \leq (\log_2 e)(x - 1). \tag{6.9}\)
Then
\[\begin{align*}
\displaystyle \sum_{i} p(A_i) \log_2 \Big(\dfrac{1}{p(A_i)}\Big) - \displaystyle \sum_{i} p(A_i) \log_2 \Big(\dfrac{1}{p'(A_i)}\Big) &= \displaystyle \sum_{i} p(A_i) \log_2 \Big(\dfrac{p'(A_i)}{p(A_i)}\Big) \\
&\leq \log_2 e \displaystyle \sum_{i} p(A_i) \Big[ \dfrac{p'(A_i)}{p(A_i)} - 1 \Big] \\
&= \log_2 e \Big(\displaystyle \sum_{i} p'(A_i) - \sum_{i} p(A_i)\Big) \\
&= \log_2 e \Big(\displaystyle \sum_{i} p'(A_i) - 1\Big) \\
&\leq 0 \tag{6.10}
\end{align*} \nonumber \]
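The only inequality used in this chain is the tangent-line bound of Equation 6.9, applied with \(x = p'(A_i)/p(A_i)\); the final step then uses Equations 6.6 and 6.7. A minimal numerical sketch of that bound, evaluated at a few arbitrarily chosen sample points, is shown below.

```python
import math

# The bound used in the proof: ln x <= x - 1, or equivalently
# log2(x) <= (log2 e) * (x - 1), with equality only at x = 1.
log2_e = math.log2(math.e)

for x in [0.1, 0.5, 1.0, 2.0, 10.0]:   # arbitrary sample points
    assert math.log2(x) <= log2_e * (x - 1) + 1e-12
    print(f"x = {x:5.1f}:  log2(x) = {math.log2(x):8.4f}  <=  "
          f"(log2 e)(x - 1) = {log2_e * (x - 1):8.4f}")
```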
\(^1\)See a biography of Gibbs at http://www-groups.dcs.st-andrews.ac....ies/Gibbs.html