8.2: Principle of Maximum Entropy - Simple Form

Paul Penfield, Jr.
Massachusetts Institute of Technology via MIT OpenCourseWare

In the last section, we discussed one technique for estimating the input probabilities of a process given that the output event is known. This technique, which relies on Bayes’ Theorem, only works if the process is lossless (in which case the input can be identified with certainty) or if an initial input probability distribution is assumed (in which case it is refined to take account of the known output).
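
As a reminder of how that refinement works, here is a minimal Python sketch. The function name `posterior`, the event labels, and all the probabilities are made up for illustration; they are not taken from the previous section.

```python
def posterior(prior, likelihood, observed):
    """Refine an assumed input distribution using Bayes' Theorem.

    prior:      dict mapping each input event to its assumed probability
    likelihood: dict mapping each input event to a dict of P(output | input)
    observed:   the output event known to have occurred
    """
    # Joint probability of each input together with the observed output
    joint = {a: prior[a] * likelihood[a].get(observed, 0.0) for a in prior}
    total = sum(joint.values())  # probability of the observed output
    if total == 0:
        raise ValueError("observed output has zero probability under the prior")
    # Normalize so the refined probabilities sum to 1
    return {a: j / total for a, j in joint.items()}


# Hypothetical process with two inputs and two outputs (numbers are invented)
prior = {"A": 0.5, "B": 0.5}
likelihood = {
    "A": {"x": 0.9, "y": 0.1},
    "B": {"x": 0.2, "y": 0.8},
}
post = posterior(prior, likelihood, "x")
print(post)
```

Note that the result depends on the assumed `prior`. Without such an assumption (and without a lossless process) this approach has nothing to refine, which is the gap the rest of this section addresses.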

The Principle of Maximum Entropy is a technique that can be used to estimate input probabilities more generally. The result is a probability distribution that is consistent with known constraints expressed in terms of averages, or expected values, of one or more quantities, but is otherwise as unbiased as possible (the word “bias” is used here not in the technical sense of statistics, but the everyday sense of a preference that inhibits impartial judgment). This principle is described first for the simple case of one constraint and three input events, in which case the technique can be carried out analytically. Then it is described more generally in Chapter 9.
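
For orientation, the simple case can be written as a constrained maximization. The notation here is an assumption made for this sketch and may differ from that used elsewhere in these notes: \(A_1, A_2, A_3\) are the three input events, \(p(A_i)\) their unknown probabilities, \(g(A_i)\) a known numerical quantity associated with each event, and \(G\) its known average. The task is to choose the \(p(A_i)\) that maximize the entropy

\[ S = \sum_{i=1}^{3} p(A_i) \log_2 \frac{1}{p(A_i)} \]

subject to the two constraints

\[ \sum_{i=1}^{3} p(A_i) = 1, \qquad \sum_{i=1}^{3} p(A_i)\, g(A_i) = G. \]

With three unknowns and two equations, one degree of freedom remains, which is why this case can be handled analytically.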

This principle has applications in many domains, but was originally motivated by statistical physics, which attempts to relate macroscopic, measurable properties of physical systems to a description at the atomic or molecular level. It can be used to approach physical systems from the point of view of information theory, because the probability distributions can be derived by avoiding the assumption that the observer has more information than is actually available. Information theory, particularly the definition of information in terms of probability distributions, provides a quantitative measure of ignorance (or uncertainty, or entropy) that can be maximized mathematically to find the probability distribution that best avoids unnecessary assumptions.
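
To make the idea of a quantitative measure of uncertainty concrete, the following Python sketch computes the entropy, in bits, of a few three-event distributions. The function name `entropy_bits` and the example distributions are illustrative assumptions, not part of the original notes.

```python
import math


def entropy_bits(p):
    """Entropy in bits of a probability distribution given as a list.

    Terms with zero probability are skipped, since they contribute nothing.
    """
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)


# Three made-up distributions over three events
uniform = [1 / 3, 1 / 3, 1 / 3]   # no preference among the events
skewed = [0.7, 0.2, 0.1]          # some preference
certain = [1.0, 0.0, 0.0]         # outcome known in advance

for name, p in [("uniform", uniform), ("skewed", skewed), ("certain", certain)]:
    print(name, entropy_bits(p))
```

With no constraints other than normalization, the uniform distribution has the largest entropy (here \(\log_2 3\) bits), and a distribution that names one event with certainty has zero entropy. Constraints on averages restrict which distributions are allowed, and the principle selects the allowed one with the largest entropy.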

This approach to statistical physics was pioneered by Edwin T. Jaynes (1922–1998), a professor at Washington University in St. Louis, and previously at Stanford University. The seminal publication was

E. T. Jaynes, “Information Theory and Statistical Mechanics,” Physical Review, vol. 106, no. 4, pp. 620-630; May 15, 1957. (http://bayes.wustl.edu/etj/articles/theory.1.pdf)

Other references of interest by Jaynes include:

a continuation of this paper, E. T. Jaynes, “Information Theory and Statistical Mechanics. II,” Physical Review, vol. 108, no. 2, pp. 171-190; October 15, 1957. (http://bayes.wustl.edu/etj/articles/theory.2.pdf)
a review paper, including an example of estimating probabilities of an unfair die, E. T. Jaynes, “Information Theory and Statistical Mechanics,” pp. 181-218 in “Statistical Physics,” Brandeis Summer Institute 1962, W. A. Benjamin, Inc., New York, NY; 1963. (http://bayes.wustl.edu/etj/articles/brandeis.pdf)
a personal history of the approach, Edwin T. Jaynes, “Where Do We Stand on Maximum Entropy?,” pp. 15-118, in “The Maximum Entropy Formalism,” Raphael D. Levine and Myron Tribus, editors, The MIT Press, Cambridge, MA; 1979. (http://bayes.wustl.edu/etj/articles/...on.entropy.pdf)

The philosophy of assuming maximum uncertainty as an approach to thermodynamics is discussed in

Chapter 3 of M. Tribus, “Thermostatics and Thermodynamics,” D. Van Nostrand Co, Inc., Princeton, NJ; 1961.

Before the Principle of Maximum Entropy can be used, the problem domain needs to be set up. In cases involving physical systems, this means that the various states in which the system can exist need to be identified, and all the parameters involved in the constraints need to be known. For example, the energy, electric charge, and other quantities associated with each of the states are assumed known. Often quantum mechanics is needed for this task. It is not assumed in this step which particular state the system is in (or, as often expressed, which state is actually “occupied”); indeed it is assumed that we do not know and cannot know this with certainty, and so we deal instead with the probability of each of the states being occupied. Thus we use probability as a means of coping with our lack of complete knowledge. Naturally we want to avoid inadvertently assuming more knowledge than we actually have, and the Principle of Maximum Entropy is the technique for doing this.

In the application to nonphysical systems, the various events (possible outcomes) have to be identified along with various numerical properties associated with each of the events. In these notes we will derive a simple form of the Principle of Maximum Entropy and apply it to the restaurant example set up in Section 8.1.3.
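
The setup described above, followed by the maximization itself, can be sketched numerically in Python. Everything specific here is an assumption made for illustration: the function name `maxent_one_constraint`, the attribute values 1, 2, and 3, and the average 1.75 are placeholders rather than values quoted from Section 8.1.3. The sketch also relies on the standard result that, with a single average-value constraint, the maximum-entropy probabilities are proportional to \(2^{-\beta g(A_i)}\) for some number \(\beta\); it finds \(\beta\) by bisection rather than analytically.

```python
import math


def maxent_one_constraint(g, G, lo=-50.0, hi=50.0, iterations=100):
    """Maximum-entropy distribution over events with attribute values g,
    constrained so that the average of g equals G.

    Assumes the solution has the form p_i proportional to 2**(-beta * g_i)
    and searches for beta by bisection on the interval [lo, hi].
    """
    if not min(g) < G < max(g):
        raise ValueError("G must lie strictly between min(g) and max(g)")

    def dist(beta):
        weights = [2.0 ** (-beta * gi) for gi in g]
        total = sum(weights)
        return [w / total for w in weights]  # normalized probabilities

    def average(beta):
        return sum(pi * gi for pi, gi in zip(dist(beta), g))

    # The average decreases as beta increases, so bisection applies.
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if average(mid) > G:
            lo = mid   # average too high: need a larger beta
        else:
            hi = mid   # average too low (or exact): need a smaller beta
    return dist((lo + hi) / 2)


# Step 1: identify the events and the numerical property of each (made up).
g = [1.0, 2.0, 3.0]
# Step 2: state the known average of that property (also made up).
G = 1.75
# Step 3: find the least-biased distribution consistent with that average.
probs = maxent_one_constraint(g, G)
entropy = sum(p * math.log2(1.0 / p) for p in probs)
print(probs, entropy)
```

For these placeholder numbers the probabilities come out to roughly 0.466, 0.318, and 0.216. The distribution leans toward the event with the smallest attribute value because the required average lies below the midpoint of the range, but no event is ruled out, since nothing in the constraint justifies doing so.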