6.20: Entropy


Learning Objectives

Shannon showed the power of probabilistic models for symbolic-valued signals. The key quantity that characterizes such a signal is the entropy of its alphabet.

Communication theory has been formulated best for symbolic-valued signals. In 1948, Claude Shannon published A Mathematical Theory of Communication, which became the cornerstone of digital communication. He showed the power of probabilistic models for symbolic-valued signals, which allowed him to quantify the information present in a signal. In the simplest signal model, the symbol a_k occurs at index n with probability Pr[a_k], k ∈ {1,...,K}. What this model says is that for each signal value a K-sided coin is flipped (note that the coin need not be fair). For this model to make sense, the probabilities must be numbers between zero and one and must sum to one.

\[0\leq Pr[a_{k}]\leq 1 \nonumber \]

\[\sum_{k=1}^{K}Pr[a_{k}]=1 \nonumber \]
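The K-sided coin model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `draw_symbols`, the three-symbol alphabet, its probabilities, and the fixed seed are all hypothetical choices, not part of the original text. The function checks the two conditions above before drawing each symbol independently.

```python
import random

def draw_symbols(alphabet, probs, n, seed=0):
    """Draw n independent symbols; each draw is one flip of a K-sided coin.

    alphabet, probs, n and seed are illustrative parameters (assumptions).
    """
    if len(alphabet) != len(probs):
        raise ValueError("need one probability per symbol")
    if any(p < 0 or p > 1 for p in probs):
        raise ValueError("each probability must lie between zero and one")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.choices(alphabet, weights=probs, k=n)

# Hypothetical three-symbol alphabet with an unfair "coin".
signal = draw_symbols(["a1", "a2", "a3"], [0.5, 0.3, 0.2], n=10)
print(signal)
```

Because every draw uses the same probabilities regardless of earlier draws, the sketch reproduces the independence assumption of the model.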

This coin-flipping model assumes that symbols occur without regard to what preceding or succeeding symbols were, a false assumption for typed text. Despite this probabilistic model's over-simplicity, the ideas we develop here also work when more accurate, but still probabilistic, models are used. The key quantity that characterizes a symbolic-valued signal is the entropy of its alphabet.

\[H(A)=-\sum_{k=1}^{K}Pr[a_{k}]\log_{2}Pr[a_{k}] \nonumber \]

Because we use the base-2 logarithm, entropy has units of bits. For this definition to make sense, we must take special note of symbols having probability zero of occurring. A zero-probability symbol never occurs; thus, we define

\[0\log_{2}0=0 \nonumber \]

so that such symbols do not affect the entropy. The maximum value attainable by an alphabet's entropy occurs when the symbols are equally likely

\[Pr[a_{k}]=Pr[a_{l}] \quad \text{for all } k,\, l \nonumber \]

In this case, the entropy equals log₂K. The minimum value occurs when only one symbol occurs; it has probability one of occurring and the rest have probability zero.
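The definition and its two extreme cases can be checked numerically. The following is a minimal sketch, assuming a hypothetical helper named `entropy_bits` and an arbitrary alphabet size of K = 8; zero-probability symbols are skipped, which implements the convention that 0 log₂ 0 = 0.

```python
import math

def entropy_bits(probs):
    """Entropy in bits: the sum of -Pr[a_k] * log2(Pr[a_k]).

    Terms with zero probability are skipped (0 * log2(0) is taken as 0).
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)

K = 8  # illustrative alphabet size (assumption)
uniform = [1.0 / K] * K                # equally likely symbols
degenerate = [1.0] + [0.0] * (K - 1)   # only one symbol ever occurs

print(entropy_bits(uniform), math.log2(K))  # the two values should agree
print(entropy_bits(degenerate))             # should be zero
```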

Exercise \(\PageIndex{1}\)

Derive the maximum-entropy results, both the numeric aspect (entropy equals log₂K) and the theoretical one (equally likely symbols maximize entropy). Derive the value of the minimum entropy alphabet.

Solution

Equally likely symbols each have a probability of 1/K. Thus,

\[H(A)=-\sum_{k=1}^{K}\frac{1}{K}\log_{2}\frac{1}{K}=\log_{2}K \nonumber \]

To prove that this is the maximum-entropy probability assignment, we must explicitly take into account that probabilities sum to one. Use this constraint to write the last probability as Pr[a_{K-1}] = 1 − (Pr[a₀] + ... + Pr[a_{K-2}]). Now focus on a particular symbol, say the first. Pr[a₀] appears twice in the entropy formula: in the terms

\[Pr[a_{0}]\log_{2}Pr[a_{0}]\; \text{and}\; \left(1-(Pr[a_{0}]+\dots+Pr[a_{K-2}])\right)\log_{2}\left(1-(Pr[a_{0}]+\dots+Pr[a_{K-2}])\right) \nonumber \]

The derivative with respect to this probability (and all the others) must be zero. The derivative of these two terms equals

\[\log_{2}Pr[a_{0}]- \log_{2}\left(1-(Pr[a_{0}]+\dots+Pr[a_{K-2}])\right) \nonumber \]

and all other derivatives have the same form (just substitute the index of the symbol in question). Setting each derivative to zero shows that every probability equals Pr[a_{K-1}]. Thus, each probability must equal the others, and we are done. For the minimum entropy answer, one term is

\[1\log_{2}1=0 \nonumber \]

and the others are

\[0\log_{2}0 \nonumber \]

which we define to be zero also. The minimum value of entropy is zero.
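The maximum-entropy result can also be sanity-checked numerically. The sketch below is not a proof; it only draws randomly generated probability assignments and confirms that none of them exceeds log₂K. The helper names, the alphabet size K = 5, the seed, and the number of trials are all illustrative assumptions.

```python
import math
import random

def entropy_bits(probs):
    """Entropy in bits, skipping zero-probability symbols."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def random_distribution(K, rng):
    """Normalize K random positive weights so that they sum to one."""
    weights = [rng.random() + 1e-12 for _ in range(K)]  # offset avoids an all-zero draw
    total = sum(weights)
    return [w / total for w in weights]

K = 5                    # illustrative alphabet size (assumption)
rng = random.Random(1)   # fixed seed so the check is repeatable
bound = math.log2(K)     # entropy of K equally likely symbols

largest = max(entropy_bits(random_distribution(K, rng)) for _ in range(1000))
print(largest <= bound + 1e-12)
```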

Example \(\PageIndex{1}\):

A four-symbol alphabet has the following probabilities.

\[Pr[a_{0}]=\frac{1}{2} \nonumber \]

\[Pr[a_{1}]=\frac{1}{4} \nonumber \]

\[Pr[a_{2}]=\frac{1}{8} \nonumber \]

\[Pr[a_{3}]=\frac{1}{8} \nonumber \]

Note that these probabilities sum to one as they should. As

\[\frac{1}{2}=2^{-1},\log_{2}\frac{1}{2}=-1 \nonumber \]

The entropy of this alphabet equals

\[H(A)=-\left ( \frac{1}{2} \log_{2}\frac{1}{2}+\frac{1}{4} \log_{2}\frac{1}{4}+\frac{1}{8} \log_{2}\frac{1}{8}+\frac{1}{8} \log_{2}\frac{1}{8}\right ) \nonumber \]

\[H(A)=-\left ( \frac{1}{2}(-1)+\frac{1}{4}(-2)+\frac{1}{8}(-3)+\frac{1}{8}(-3)\right ) \nonumber \]

\[H(A)=1.75\; \text{bits} \nonumber \]
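The arithmetic of this example is easy to reproduce in Python. In this brief sketch the variable names are illustrative; the comparison value log₂4 = 2 bits is the entropy the same alphabet would have if its four symbols were equally likely.

```python
import math

# Probabilities of the four-symbol alphabet from the example.
probs = [1/2, 1/4, 1/8, 1/8]
assert sum(probs) == 1  # powers of two are exact in floating point

H = sum(-p * math.log2(p) for p in probs)
H_max = math.log2(len(probs))  # entropy if all four symbols were equally likely

print(H)      # 1.75
print(H_max)  # 2.0
```

The result falls below the 2-bit maximum because the symbols are not equally likely.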