
3.7: Summary


    This chapter has developed the basic results about finite-state Markov chains. It was shown that the states of any finite-state chain can be partitioned into classes, where each class is either transient or recurrent, and each class is periodic or aperiodic. If a recurrent class is periodic of period \(d\), then the states in that class can be partitioned into \(d\) subsets where each subset has transitions only into the next subset.
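
    As a rough sketch of this classification (the 5-state chain and all of its transition probabilities below are hypothetical), the following code finds the communicating classes of a transition matrix, marks each class as recurrent or transient, and computes the period of each recurrent class as the gcd of its return-path lengths.

```python
import numpy as np
from math import gcd

# A hypothetical 5-state chain (states 0..4): {0, 1} is a transient class and
# {2, 3, 4} is a recurrent class of period 3, each subset feeding the next.
P = np.array([[0.5, 0.3, 0.2, 0.0, 0.0],
              [0.4, 0.4, 0.0, 0.2, 0.0],
              [0.0, 0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0, 0.0]])
M = P.shape[0]

# reach[i, j] = True if state j is accessible from state i in one or more steps.
reach = P > 0
for k in range(M):                          # Warshall's transitive closure
    reach = reach | (reach[:, [k]] & reach[[k], :])

# States communicate if each is accessible from the other (and every state
# trivially communicates with itself).
comm = reach & reach.T
np.fill_diagonal(comm, True)

seen = set()
for i in range(M):
    if i in seen:
        continue
    cls = [j for j in range(M) if comm[i, j]]
    seen.update(cls)
    # A class is recurrent iff no state in it can reach a state outside it.
    if all(set(np.flatnonzero(reach[j])) <= set(cls) for j in cls):
        # Period = gcd of the lengths of paths that return to a class state.
        Pk, d = np.eye(M), 0
        for n in range(1, 2 * M + 1):
            Pk = Pk @ P
            if Pk[i, i] > 0:
                d = gcd(d, n)
        print(cls, f"recurrent, period {d}")
    else:
        print(cls, "transient")
```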

    The transition probabilities in the Markov chain can be represented as a matrix \([P]\), and the \(n\)-step transition probabilities are given by the matrix product \(\left[P^{n}\right]\). If the chain is ergodic, i.e., it has a single aperiodic recurrent class, then the \(n\)-step transition probabilities converge to limits that are independent of the initial state, i.e., \(\lim _{n \rightarrow \infty} P_{i j}^{n}=\pi_{j}\), where \(\boldsymbol{\pi}=\left(\pi_{1}, \ldots, \pi_{\mathrm{M}}\right)\) is called the steady-state probability vector. Thus the limiting value of \(\left[P^{n}\right]\) is an \(\mathrm{M} \times \mathrm{M}\) matrix whose rows are all equal to \(\boldsymbol{\pi}\), i.e., the limiting matrix is the product \(\boldsymbol{e} \boldsymbol{\pi}\), where \(\boldsymbol{e}\) is the column vector of all 1's. The steady-state probabilities are uniquely specified by \(\sum_{i} \pi_{i} P_{i j}=\pi_{j}\) and \(\sum_{i} \pi_{i}=1\). That unique solution must satisfy \(\pi_{i}>0\) for all \(i\). The same result holds (see Theorem 3.3.2) for aperiodic unichains, with the exception that \(\pi_{i}=0\) for all transient states.
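
    A minimal numerical sketch of these facts, assuming a hypothetical 3-state ergodic chain: \(\left[P^{n}\right]\) is computed by matrix powers, and \(\boldsymbol{\pi}\) is found by solving \(\boldsymbol{\pi}[P]=\boldsymbol{\pi}\) together with the normalization constraint.

```python
import numpy as np

# Hypothetical ergodic 3-state chain.
P = np.array([[0.6, 0.2, 0.2],
              [0.1, 0.7, 0.2],
              [0.2, 0.2, 0.6]])
M = P.shape[0]

# n-step transition probabilities: [P^n] is the n-th matrix power of [P].
P20 = np.linalg.matrix_power(P, 20)
print(P20)            # every row is approximately the same vector pi

# Steady-state vector: solve pi P = pi together with sum_i pi_i = 1.
# Rewritten as (P^T - I) pi^T = 0 plus a normalization row.
A = np.vstack([P.T - np.eye(M), np.ones(M)])
b = np.concatenate([np.zeros(M), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)             # matches each row of P20 to numerical precision
```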

    The eigenvalues and eigenvectors of \([P]\) are useful in many ways, but in particular provide precise results about how \(P_{i j}^{n}\) approaches \(\pi_{j}\) with increasing \(n\). An eigenvalue equal to 1 always exists, and its multiplicity is equal to the number of recurrent classes. For each recurrent class, there is a left eigenvector \(\boldsymbol{\pi}\) of eigenvalue 1. It is the steady-state vector for the given recurrent class. If a recurrent class is periodic with period \(d\), then there are \(d\) corresponding eigenvalues of magnitude 1 uniformly spaced around the unit circle. The left eigenvector corresponding to each is nonzero only on that periodic recurrent class.
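
    A quick check of the periodic case, using a hypothetical deterministic 3-state cycle (a single recurrent class with \(d=3\)): the three eigenvalues of magnitude 1 are the cube roots of unity, uniformly spaced around the unit circle.

```python
import numpy as np

# A hypothetical 3-state recurrent chain of period 3 (a deterministic cycle).
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

evals = np.linalg.eigvals(P)
print(np.round(evals, 6))   # the cube roots of unity: 1, exp(+-2*pi*i/3)
print(np.abs(evals))        # all three magnitudes equal 1
```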

    All other eigenvalues of \([P]\) are less than 1 in magnitude. If the eigenvectors of the entire set of eigenvalues span \(\mathrm{M}\)-dimensional space, then \(\left[P^{n}\right]\) can be represented by (3.30), which shows explicitly how steady state is approached for aperiodic recurrent classes of states. If the eigenvectors do not span \(\mathrm{M}\)-space, then (3.30) can be replaced by a Jordan form.
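
    The sketch below verifies this representation numerically for a hypothetical ergodic chain whose eigenvectors span the space: \(\left[P^{n}\right]\) is reassembled from terms \(\lambda_{i}^{n}\) times the corresponding right and left eigenvectors, and the sorted eigenvalue magnitudes show the geometric rate at which every term other than the eigenvalue-1 term dies out.

```python
import numpy as np

# Hypothetical ergodic chain with distinct eigenvalues (1, 0.5, 0.4), so the
# eigenvectors span 3-dimensional space and the spectral form applies.
P = np.array([[0.6, 0.2, 0.2],
              [0.1, 0.7, 0.2],
              [0.2, 0.2, 0.6]])

lam, R = np.linalg.eig(P)          # columns of R are right eigenvectors
L = np.linalg.inv(R)               # rows of L are the matching left eigenvectors

n = 10
Pn_spectral = (R * lam**n) @ L     # sum_i lambda_i^n (right ev_i)(left ev_i)
Pn_direct = np.linalg.matrix_power(P, n)
print(np.allclose(Pn_spectral.real, Pn_direct))   # True

# Only the eigenvalue-1 term survives as n grows; the others decay
# geometrically at the rate of the second-largest eigenvalue magnitude.
print(np.sort(np.abs(lam))[::-1])
```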

    For an arbitrary finite-state Markov chain, if the initial state is transient, then the Markov chain will eventually enter a recurrent state, and the probability that this takes more than \(n\) steps approaches zero geometrically in \(n\); Exercise 3.18 shows how to find the probability that each recurrent class is entered. Given an entry into a particular recurrent class, the results about recurrent chains can be used to analyze the behavior within that class.
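
    As an illustration (the 4-state chain below is hypothetical, with two transient states and two absorbing states standing in for two recurrent classes), the probability of still being in a transient state after \(n\) steps is a row sum of \(\left[Q^{n}\right]\), where \([Q]\) is the transient-to-transient part of \([P]\), and the class-entry probabilities are obtained by solving a linear system in the spirit of Exercise 3.18.

```python
import numpy as np

# Hypothetical 4-state chain: states 0 and 1 are transient; states 2 and 3 are
# absorbing, i.e., each is a singleton recurrent class.
P = np.array([[0.4, 0.3, 0.3, 0.0],
              [0.2, 0.5, 0.0, 0.3],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
T = [0, 1]                      # transient states
Q = P[np.ix_(T, T)]             # transitions among transient states

# Pr{still in a transient state after n steps | start in state 0}:
# a row sum of Q^n, which goes to zero geometrically in n.
for n in (5, 10, 20):
    print(n, np.linalg.matrix_power(Q, n).sum(axis=1)[0])

# Probability of eventually entering each recurrent class: B = (I - Q)^{-1} R,
# where R holds the transitions from transient states into recurrent states.
R = P[np.ix_(T, [2, 3])]
B = np.linalg.solve(np.eye(len(T)) - Q, R)
print(B)    # row i gives the class-entry probabilities from transient state i
```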

    The results about Markov chains were extended to Markov chains with rewards. The use of reward functions (or cost functions) provides a systematic way to approach a large class of problems ranging from first-passage times to dynamic programming. For unichains, the key result here is Theorem 3.5.2, which provides both an exact expression and an asymptotic expression for the expected aggregate reward over \(n\) stages. Markov chains with rewards and multiple recurrent classes are best handled by considering the individual recurrent classes separately.
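
    A minimal sketch of this for a hypothetical unichain and reward vector (the notation here is informal): the expected aggregate reward is computed by a backward recursion, and for large \(n\) each component grows linearly at the gain \(g=\boldsymbol{\pi} \mathbf{r}\) while the relative values converge.

```python
import numpy as np

# A hypothetical ergodic unichain with a reward r_i attached to each state.
P = np.array([[0.6, 0.2, 0.2],
              [0.1, 0.7, 0.2],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, 2.0])
M = len(r)

# Expected aggregate reward over n stages by the backward recursion
# v_n = r + [P] v_{n-1}, starting from v_0 = 0.
v = np.zeros(M)
for _ in range(200):
    v = r + P @ v

# Gain per stage g = pi . r, using the steady-state vector pi.
A = np.vstack([P.T - np.eye(M), np.ones(M)])
pi, *_ = np.linalg.lstsq(A, np.concatenate([np.zeros(M), [1.0]]), rcond=None)
g = pi @ r

# Each component of v grows linearly at rate g, while the differences between
# components converge (the relative-value part of the asymptotic expression).
print(g)
print(v - v[0])
```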

    Finally, the results on Markov chains with rewards were used to understand Markov decision theory. The Bellman dynamic programming algorithm was developed, and the policy improvement algorithm was discussed and analyzed. Theorem 3.6.2 demonstrated the relationship between the optimal dynamic policy and the optimal stationary policy. This section provided only an introduction to dynamic programming and omitted all discussion of discounting (in which future gain is considered worth less than present gain because of interest rates). The development was also restricted to finite-state spaces.
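
    A toy sketch of the Bellman backward recursion, using a hypothetical 2-state decision problem with two choices per state: at each stage the recursion takes the maximizing decision, and the chosen decisions settle into a stationary policy, in line with the relationship described by Theorem 3.6.2.

```python
import numpy as np

# A hypothetical 2-state Markov decision problem.  P[k][a] is the transition
# row out of state k under decision a; r[k][a] is the corresponding reward.
P = {0: {0: np.array([0.9, 0.1]), 1: np.array([0.4, 0.6])},
     1: {0: np.array([0.2, 0.8]), 1: np.array([0.7, 0.3])}}
r = {0: {0: 0.0, 1: 1.0},
     1: {0: 2.0, 1: 0.5}}

# Bellman backward recursion: at each stage, take the decision maximizing
# immediate reward plus expected future aggregate reward.
v = np.zeros(2)
for _ in range(100):
    v_new = np.empty(2)
    policy = {}
    for k in (0, 1):
        vals = {a: r[k][a] + P[k][a] @ v for a in (0, 1)}
        policy[k] = max(vals, key=vals.get)
        v_new[k] = vals[policy[k]]
    v = v_new

print(policy)        # the stationary policy the dynamic algorithm settles into
print(v[0] - v[1])   # relative value; absolute values grow with the gain
```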

    For a review of vectors, matrices, and linear algebra, see any introductory text on linear algebra such as Strang [19]. For further reading on Markov decision theory and dynamic programming, see Bertsekas [3]. Bellman [1] is of historic interest and quite readable.

