9.6.1: Dual Variable

Paul Penfield, Jr.
Massachusetts Institute of Technology via MIT OpenCourseWare

Sometimes a problem is clarified by looking at a more general problem of which the original is a special case. Here, rather than focusing on a specific value of \(G\), let’s look at all possible values of \(G\), that is, every value in the range between the smallest and largest values of \(g(A_i)\). Thus \(G\) becomes a variable rather than a known value (the known value will continue to be denoted \(\widetilde{G}\) here). Then, rather than expressing things in terms of \(G\) as an independent variable, we will introduce a new dual variable, which we will call \(\beta\), and express all the quantities of interest, including \(G\), in terms of it. The original problem then reduces to finding the value of \(\beta\) that corresponds to the known, desired value \(\widetilde{G}\), i.e., the value of \(\beta\) for which \(G(\beta) = \widetilde{G}\).
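
For reference, the generalized problem can be written compactly. This is only a sketch in the notation the text appears to use, assuming that entropy is measured in bits and that the sums run over all states \(A_i\):

\[ \text{maximize } S = \sum_i p(A_i) \log_2\!\left(\frac{1}{p(A_i)}\right) \quad \text{subject to} \quad \sum_i p(A_i) = 1, \quad \sum_i p(A_i)\, g(A_i) = G, \]

where \(\min_i g(A_i) \le G \le \max_i g(A_i)\). The original problem is the special case \(G = \widetilde{G}\).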

The new variable \(\beta\) is known as a Lagrange Multiplier, named after the French mathematician Joseph-Louis Lagrange (1736–1813)\(^1\). Lagrange developed a general technique, using such variables, to perform constrained maximization, of which our current problem is a very simple case. We will not use the mathematical technique of Lagrange Multipliers, because it is more powerful and more complicated than we need.

Here is what we will do instead. We will start with the answer, which others have derived using Lagrange Multipliers, and prove that it is correct. That is, we will give a formula for the probability distribution \(p(A_i)\) in terms of \(\beta\) and the \(g(A_i)\) parameters, and then prove that the entropy calculated from this distribution, \(S(\beta)\), is at least as large as the entropy of any probability distribution that has the same expected value for \(G\), namely \(G(\beta)\). Therefore the use of \(\beta\) automatically maximizes the entropy. Then we will show how to find the value of \(\beta\), and therefore indirectly all the quantities of interest, for the particular value \(\widetilde{G}\) of interest. This will be possible because \(G(\beta)\) is a monotonic function of \(\beta\), so its inverse can be calculated with zero-finding techniques.
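
The zero-finding step can be illustrated numerically. The sketch below is not the author's procedure. It assumes the distribution takes the exponential form \(p(A_i) \propto 2^{-\beta g(A_i)}\), the standard maximum-entropy result for a single expected-value constraint, which this section has not yet stated. The function names, the bracket \([-50, 50]\), and the values \(g = (1, 2, 3)\) with \(\widetilde{G} = 1.75\) are all hypothetical choices made for illustration.

```python
import math

def distribution(beta, g):
    """Assumed form: p_i proportional to 2**(-beta * g_i), normalized to sum to 1."""
    exponents = [-beta * gi for gi in g]
    shift = max(exponents)  # subtract the largest exponent to avoid overflow
    weights = [2.0 ** (e - shift) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def expected_g(beta, g):
    """G(beta): expected value of g under the assumed distribution."""
    return sum(pi * gi for pi, gi in zip(distribution(beta, g), g))

def entropy_bits(p):
    """Entropy in bits; zero-probability states contribute nothing."""
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

def solve_beta(g, g_target, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection for G(beta) = g_target.

    Under the assumed form, G(beta) decreases as beta increases, so the
    root is bracketed whenever G(lo) > g_target > G(hi). The default
    bracket is an arbitrary illustrative choice.
    """
    if not (min(g) < g_target < max(g)):
        raise ValueError("target must lie strictly between min(g) and max(g)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_g(mid, g) > g_target:
            lo = mid  # G is still too large, so beta must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical example values.
g = [1.0, 2.0, 3.0]
g_target = 1.75
beta = solve_beta(g, g_target)
p = distribution(beta, g)

# Another distribution with the same expected value, for comparison.
other = [0.5, 0.25, 0.25]

print("beta:", beta)
print("p:", p)
print("G(beta):", expected_g(beta, g))
print("S(beta):", entropy_bits(p), "bits; other:", entropy_bits(other), "bits")
```

Bisection is used here only because it needs nothing beyond monotonicity; any bracketing root finder would serve. The comparison distribution `other` has the same expected value of 1.75, so its entropy should not exceed that of `p`.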

\(^1\)See a biography of Lagrange at http://www-groups.dcs.st-andrews.ac.uk/~history/Biographies/Lagrange.html