
7.6.1: Notation


    Different authors use different notation for the quantities we have here called \(I\), \(J\), \(L\), \(N\), and \(M\). In his original paper Shannon called the input probability distribution \(x\) and the output distribution \(y\). The input information \(I\) was denoted \(H(x)\) and the output information \(J\) was \(H(y)\). The loss \(L\) (which Shannon called “equivocation”) was denoted \(H_y(x)\) and the noise \(N\) was denoted \(H_x(y)\). The mutual information \(M\) was denoted \(R\). Shannon used the word “entropy” to refer to information, and most authors have followed his lead.
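In summary, Shannon's notation corresponds to the quantities of this chapter as

\[ I = H(x), \qquad J = H(y), \qquad L = H_y(x), \qquad N = H_x(y), \qquad M = R. \]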

    Information quantities are frequently denoted \(I\), \(H\), or \(S\), often written as functions of probability distributions, or “ensembles.” In physics, entropy is usually denoted \(S\).

    Another common notation is to use \(A\) to stand for the input probability distribution, or ensemble, and \(B\) to stand for the output probability distribution. Then \(I\) is denoted \(I(A)\), \(J\) is \(I(B)\), \(L\) is \(I(A\;|\;B)\), \(N\) is \(I(B\;|\;A)\), and \(M\) is \(I(A; B)\). If there is a need for the information associated with \(A\) and \(B\) jointly (as opposed to conditionally), it can be denoted \(I(A, B)\) or \(I(AB)\).
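    In this notation, the relations among these quantities developed earlier in this chapter (\(M = I - L = J - N\)), together with the standard chain rule for the joint information, take the form

\[
\begin{align*}
I(A; B) &= I(A) - I(A\;|\;B) = I(B) - I(B\;|\;A), \\
I(A, B) &= I(A) + I(B\;|\;A) = I(B) + I(A\;|\;B) = I(A) + I(B) - I(A; B).
\end{align*}
\]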

