3.4: The Eigenvalues and Eigenvectors of Stochastic Matrices

    For ergodic unichains, the previous section showed that the dependence of a state on the distant past disappears with increasing \(n\), i.e., \(P_{i j}^{n} \rightarrow \pi_{j}\). In this section we look more carefully at the eigenvalues and eigenvectors of \([P]\) to sharpen our understanding of how fast \(\left[P^{n}\right]\) converges for ergodic unichains and what happens for other finite-state Markov chains.

    Definition 3.4.1

    The row\(^6\) vector \(\boldsymbol{\pi}\) is a left eigenvector of \([P]\) of eigenvalue \(\lambda\) if \(\boldsymbol{\pi} \neq \boldsymbol{0}\) and \(\boldsymbol{\pi}[P]=\lambda \boldsymbol{\pi}\), i.e., \(\sum_{i} \pi_{i} P_{i j}=\lambda \pi_{j}\) for all \(j\). The column vector \(\boldsymbol{\nu}\) is a right eigenvector of eigenvalue \(\lambda\) if \(\boldsymbol{\nu} \neq \boldsymbol{0}\) and \([P] \boldsymbol{\nu}=\lambda \boldsymbol{\nu}\), i.e., \(\sum_{j} P_{i j} \nu_{j}=\lambda \nu_{i}\) for all \(i\).

    We showed that a stochastic matrix always has an eigenvalue \(\lambda=1\), and that for an ergodic unichain, there is a unique steady-state vector \(\boldsymbol{\pi}\) that is a left eigenvector with \(\lambda=1\) and (within a scale factor) a unique right eigenvector \(\boldsymbol{e}=(1, \ldots, 1)^{\top}\). In this section we look at the other eigenvalues and eigenvectors and also look at Markov chains other than ergodic unichains. We start by limiting the number of states to M = 2. This provides insight without requiring much linear algebra. After that, the general case with arbitrary \(\mathrm{M}<\infty\) is analyzed.

    Eigenvalues and eigenvectors for M = 2 states

    The eigenvalues and eigenvectors can be found by elementary (but slightly tedious) algebra. The left and right eigenvector equations can be written out as

    \[{\begin{aligned}
    &\pi_{1} P_{11}+\pi_{2} P_{21}=\lambda \pi_{1}\\
    &\pi_{1} P_{12}+\pi_{2} P_{22}=\lambda \pi_{2}
    \end{aligned}} \quad(\text{left}) \quad\quad\quad\quad{\begin{aligned}
    P_{11} \nu_{1}+P_{12} \nu_{2} &=\lambda \nu_{1} \\
    P_{21} \nu_{1}+P_{22} \nu_{2} &=\lambda \nu_{2}
    \end{aligned}} \quad\text{right} \label{3.27} \]

    Each set of equations has a non-zero solution if and only if the matrix \([P-\lambda I]\), where \([I]\) is the identity matrix, is singular (i.e., there must be a non-zero \(\boldsymbol{\nu}\) for which \([P-\lambda I] \boldsymbol{\nu}=\boldsymbol{0}\)). Thus \(\lambda\) must be such that the determinant of \([P-\lambda I]\), namely \(\left(P_{11}-\lambda\right)\left(P_{22}-\lambda\right)-P_{12} P_{21}\), is equal to 0. Solving this quadratic equation in \(\lambda\), we find that \(\lambda\) has two solutions,

    \(\lambda_{1}=1 \quad\quad \lambda_{2}=1-P_{12}-P_{21}\)
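    To see where these roots come from, recall that each row of \([P]\) sums to 1, so \(P_{12}=1-P_{11}\) and \(P_{21}=1-P_{22}\). The determinant then factors as

    \[\left(P_{11}-\lambda\right)\left(P_{22}-\lambda\right)-\left(1-P_{11}\right)\left(1-P_{22}\right)=(1-\lambda)\left(P_{11}+P_{22}-1-\lambda\right)\]

    and the second root is \(P_{11}+P_{22}-1=1-P_{12}-P_{21}\).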

    Assuming initially that \(P_{12}\) and \(P_{21}\) are not both 0, the solution for the left and right eigenvectors, \(\boldsymbol{\pi}^{(1)}\) and \(\boldsymbol{\nu}^{(1)}\), of \(\lambda_{1}\) and \(\boldsymbol{\pi}^{(2)}\) and \(\boldsymbol{\nu}^{(2)}\) of \(\lambda_{2}\), are given by

    \(\begin{aligned}
    \pi_{1}^{(1)}&=\frac{P_{21}}{P_{12}+P_{21}} & \pi_{2}^{(1)}&=\frac{P_{12}}{P_{12}+P_{21}} & \nu_{1}^{(1)}&=1 & \nu_{2}^{(1)}&=1 \\
    \pi_{1}^{(2)}&=1 & \pi_{2}^{(2)}&=-1 & \nu_{1}^{(2)}&=\frac{P_{12}}{P_{12}+P_{21}} & \nu_{2}^{(2)}&=\frac{-P_{21}}{P_{12}+P_{21}}
    \end{aligned}\)

    These solutions contain arbitrary normalization factors. That for \(\boldsymbol{\pi}^{(1)}=\left(\pi_{1}^{(1)}, \pi_{2}^{(1)}\right)\) has been chosen so that \(\boldsymbol{\pi}^{(1)}\) is a steady-state vector (i.e., the components sum to 1). The solutions have also been normalized so that \(\boldsymbol{\pi}^{(i)} \boldsymbol{\nu}^{(i)}=1\) for \(i=1,2\). Now define

    \([\Lambda]=\left[\begin{array}{cc}
    \lambda_{1} & 0 \\
    0 & \lambda_{2}
    \end{array}\right] \quad \text { and } \quad[U]=\left[\begin{array}{cc}
    \nu_{1}^{(1)} & \nu_{1}^{(2)} \\
    \nu_{2}^{(1)} & \nu_{2}^{(2)}
    \end{array}\right]\)

    i.e., \([U]\) is a matrix whose columns are the eigenvectors \(\boldsymbol{\nu}^{(1)}\) and \(\boldsymbol{\nu}^{(2)}\). Then the two right eigenvector equations in \ref{3.27} can be combined compactly as \([P][U]=[U][\Lambda]\). It turns out (for the given normalization of the eigenvectors) that the inverse of \([U]\) is just the matrix whose rows are the left eigenvectors of \([P]\) (this can be verified by noting that \(\boldsymbol{\pi}^{(1)} \boldsymbol{\nu}^{(2)}=\boldsymbol{\pi}^{(2)} \boldsymbol{\nu}^{(1)}=0\)). We then see that \([P]=[U][\Lambda][U]^{-1}\) and consequently \(\left[P^{n}\right]=[U][\Lambda]^{n}[U]^{-1}\). Multiplying this out, we get

    \[\left[P^{n}\right]=\left[\begin{array}{ll}
    \pi_{1}+\pi_{2} \lambda_{2}^{n} & \pi_{2}-\pi_{2} \lambda_{2}^{n} \\
    \pi_{1}-\pi_{1} \lambda_{2}^{n} & \pi_{2}+\pi_{1} \lambda_{2}^{n}
    \end{array}\right]\label{3.28} \]

    where \(\boldsymbol{\pi}=\left(\pi_{1}, \pi_{2}\right)\) is the steady-state vector \(\boldsymbol{\pi}^{(1)}\). Recalling that \(\lambda_{2}=1-P_{12}-P_{21}\), we see that \(\left|\lambda_{2}\right| \leq 1\). There are two trivial cases where \(\left|\lambda_{2}\right|=1\). In the first, \(P_{12}=P_{21}=0\), so that \([P]\) is just the identity matrix. The Markov chain then has two recurrent states and stays forever where it starts. In the other trivial case, \(P_{12}=P_{21}=1\). Then \(\lambda_{2}=-1\), so that \(\left[P^{n}\right]\) alternates between the identity matrix for \(n\) even and \([P]\) for \(n\) odd. In all other cases, \(\left|\lambda_{2}\right|<1\) and \(\left[P^{n}\right]\) approaches the steady-state matrix \(\lim _{n \rightarrow \infty}\left[P^{n}\right]=\boldsymbol{e} \boldsymbol{\pi}\).

    What we have learned from this is the exact way in which \(\left[P^{n}\right]\) approaches \(\boldsymbol{e \pi}\). Each term in \(\left[P^{n}\right]\) approaches the steady state value exponentially in \(n\) as \(\lambda_{2}^{n}\). Thus, in place of the upper bound in (3.21), we have an exact expression, which in this case is simpler than the bound. As we see shortly, this result is representative of the general case, but the simplicity is lost.
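    As a concrete sanity check, the closed form \ref{3.28} can be verified numerically. The short sketch below assumes numpy is available; the transition probabilities are an arbitrary illustrative choice, not taken from the text.

```python
# A minimal numerical check of (3.28) for a two-state chain.
import numpy as np

P12, P21 = 0.3, 0.6                       # illustrative transition probabilities
P = np.array([[1 - P12, P12],
              [P21, 1 - P21]])

lam2 = 1 - P12 - P21                      # second eigenvalue, lambda_2
pi = np.array([P21, P12]) / (P12 + P21)   # steady-state (left) eigenvector

n = 5
Pn_direct = np.linalg.matrix_power(P, n)  # [P^n] by direct multiplication
Pn_formula = np.array(
    [[pi[0] + pi[1] * lam2**n, pi[1] - pi[1] * lam2**n],
     [pi[0] - pi[0] * lam2**n, pi[1] + pi[0] * lam2**n]])

assert np.allclose(Pn_direct, Pn_formula)  # (3.28) holds exactly
```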

    Eigenvalues and eigenvectors for M > 2 states

    For the general case of a stochastic matrix, we start with the fact that the set of eigenvalues is given by the set of (possibly complex) values of \(\lambda\) that satisfy the determinant equation \(\operatorname{det}[P-\lambda I]=0\). Since \(\operatorname{det}[P-\lambda I]\) is a polynomial of degree M in \(\lambda\), this equation has M roots (i.e., M eigenvalues), not all of which need be distinct.\(^7\)

    Case with M distinct eigenvalues: We start with the simplest case, in which the M eigenvalues, say \(\lambda_{1}, \ldots, \lambda_{\mathrm{M}}\), are all distinct. The matrix \(\left[P-\lambda_{i} I\right]\) is singular for each \(i\), so there must be a right eigenvector \(\boldsymbol{\nu}^{(i)}\) and a left eigenvector \(\boldsymbol{\pi}^{(i)}\) for each eigenvalue \(\lambda_{i}\). The right eigenvectors span M-dimensional space, and thus the matrix \([U]\) with columns \(\left(\boldsymbol{\nu}^{(1)}, \ldots, \boldsymbol{\nu}^{(\mathrm{M})}\right)\) is nonsingular. The left eigenvectors, if normalized to satisfy \(\boldsymbol{\pi}^{(i)} \boldsymbol{\nu}^{(i)}=1\) for each \(i\), then turn out to be the rows of \(\left[U^{-1}\right]\) (see Exercise 3.11). As in the two-state case, we can then express \(\left[P^{n}\right]\) as

    \[\left[P^{n}\right]=[U]\left[\Lambda^{n}\right]\left[U^{-1}\right]\label{3.29} \]

    where \([\Lambda]\) is the diagonal matrix with diagonal elements \(\lambda_{1}, \ldots, \lambda_{\mathrm{M}}\).

    If \([\Lambda]\) is broken into the sum of M diagonal matrices,\(^8\) each with only a single nonzero element, then (see Exercise 3.11) \(\left[P^{n}\right]\) can be expressed as

    \[\left[P^{n}\right]=\sum_{i=1}^{M} \lambda_{i}^{n} \boldsymbol{\nu}^{(i)} \boldsymbol{\pi}^{(i)}\label{3.30} \]

    Note that this is the same form as (3.28), where in (3.28) the eigenvalue \(\lambda_{1}=1\) simply appears as the value 1. We have seen that there is always one eigenvalue that is 1, with an accompanying steady-state vector \(\boldsymbol{\pi}\) as a left eigenvector and the unit vector \(\boldsymbol{e}=(1, \ldots, 1)^{\top}\) as a right eigenvector. The other eigenvalues and eigenvectors can be complex, but it is almost self-evident from the fact that \(\left[P^{n}\right]\) is a stochastic matrix that \(\left|\lambda_{i}\right| \leq 1\). A simple guided proof of this is given in Exercise 3.12.

    We have seen that \(\lim _{n \rightarrow \infty}\left[P^{n}\right]=\boldsymbol{e} \boldsymbol{\pi}\) for ergodic unichains. This implies that all terms except \(i=1\) in \ref{3.30} die out with \(n\), which further implies that \(\left|\lambda_{i}\right|<1\) for all eigenvalues except \(\lambda=1\). In this case, we see that the rate at which \(\left[P^{n}\right]\) approaches steady state is given by the second largest eigenvalue in magnitude, i.e., \(\max _{i:\left|\lambda_{i}\right|<1}\left|\lambda_{i}\right|\).
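    The expansion \ref{3.30} and this convergence rate are easy to check numerically. The sketch below assumes numpy; the 3-state matrix is an illustrative choice with distinct eigenvalues, not an example from the text.

```python
# Spectral expansion (3.30) and the second largest eigenvalue magnitude.
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

lam, U = np.linalg.eig(P)      # columns of U are right eigenvectors nu^(i)
Uinv = np.linalg.inv(U)        # rows of U^{-1} are the left eigenvectors pi^(i)

n = 20
Pn = sum(lam[i]**n * np.outer(U[:, i], Uinv[i, :]) for i in range(3))
assert np.allclose(Pn.real, np.linalg.matrix_power(P, n))   # (3.30) checks out

slem = sorted(abs(lam))[-2]    # second largest |eigenvalue|: convergence rate
print("rate of approach to steady state ~", slem, "** n")
```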

    If a recurrent chain is periodic with period \(d\), it turns out that there are \(d\) eigenvalues of magnitude 1, and these are uniformly spaced around the unit circle in the complex plane. Exercise 3.19 contains a guided proof of this.
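    For instance, a deterministic cycle on three states is recurrent with period \(d=3\), and its three eigenvalues are the cube roots of unity. A quick check, again assuming numpy:

```python
# The eigenvalues of a period-3 chain are spaced uniformly on the unit circle.
import numpy as np

P = np.array([[0., 1., 0.],     # deterministic cycle 0 -> 1 -> 2 -> 0
              [0., 0., 1.],
              [1., 0., 0.]])
lam = np.linalg.eigvals(P)
print(lam)                      # 1, exp(2*pi*i/3), exp(-2*pi*i/3)
assert np.allclose(np.abs(lam), 1.0)
```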

    Case with repeated eigenvalues and M linearly independent eigenvectors: If some of the M eigenvalues of \([P]\) are not distinct, the question arises as to how many linearly independent left (or right) eigenvectors exist for an eigenvalue \(\lambda_{i}\) of multiplicity \(m\), i.e., a \(\lambda_{i}\) that is an \(m\)th-order root of \(\operatorname{det}[P-\lambda I]\). Perhaps the ugliest part of linear algebra is the fact that an eigenvalue of multiplicity \(m\) need not have \(m\) linearly independent eigenvectors.

    An example of a very simple Markov chain with M = 3 but only two linearly independent eigenvectors is given in Exercise 3.14. These eigenvectors do not span M-space, and thus the expansion in \ref{3.30} cannot be used.

    Before looking at this ugly case, we look at the case where the right eigenvectors, say, span the space, i.e., where each distinct eigenvalue has a number of linearly independent eigenvectors equal to its multiplicity. We can again form a matrix \([U]\) whose columns are the M linearly independent right eigenvectors, and again \(\left[U^{-1}\right]\) is a matrix whose rows are the corresponding left eigenvectors of \([P]\). We then get \ref{3.30} again. Thus, so long as the eigenvectors span the space, the asymptotic expression for the limiting transition probabilities can be found in the same way.

    The most important situation where these repeated eigenvalues make a major difference is for Markov chains with \(\kappa>1\) recurrent classes. In this case, \(\kappa\) is the multiplicity of the eigenvalue 1. It is easy to see that there are \(\kappa\) different steady-state vectors. The steady-state vector for recurrent class \(\ell\), \(1 \leq \ell \leq \kappa\), is strictly positive for each state of the \(\ell\)th recurrent class and is zero for all other states.

    The eigenvalues for \([P]\) in this case can be found by finding the eigenvalues separately for each recurrent class. If class \(j\) contains \(r_{j}\) states, then \(r_{j}\) of the eigenvalues (counting repetitions) of \([P]\) are the eigenvalues of the \(r_{j}\) by \(r_{j}\) matrix for the states in that recurrent class. Thus the rate of convergence of \(\left[P^{n}\right]\) within that submatrix is determined by the second largest eigenvalue (in magnitude) in that class.

    What this means is that this general theory using eigenvalues says exactly what common sense says: if there are \(\kappa\) recurrent classes, look at each one separately, since they have nothing to do with each other. This also lets us see that for any recurrent class that is aperiodic, all the other eigenvalues for that class are strictly less than 1 in magnitude.

    The situation is less obvious if there are \(\kappa\) recurrent classes plus a set of \(t\) transient states. All but \(t\) of the eigenvalues (counting repetitions) are associated with the recurrent classes, and the remaining \(t\) eigenvalues are the eigenvalues of the \(t\) by \(t\) matrix, say \(\left[P_{t}\right]\), between the transient states. Each of these \(t\) eigenvalues is strictly less than 1 in magnitude (as seen in Section 3.3.4), and neither these eigenvalues nor their eigenvectors depend on the transition probabilities from the transient states to the recurrent states. The left eigenvectors for the recurrent classes also do not depend on these transient-to-recurrent transition probabilities. The right eigenvector for \(\lambda=1\) for each recurrent class \(\mathcal{R}_{\ell}\) is very interesting, however. Its value is 1 for each state in \(\mathcal{R}_{\ell}\), is 0 for each state in the other recurrent classes, and is equal to \(\lim _{n \rightarrow \infty} \operatorname{Pr}\left\{X_{n} \in \mathcal{R}_{\ell} \mid X_{0}=i\right\}\) for each transient state \(i\) (see Exercise 3.13).
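    A small sketch of this structure, assuming numpy; the chain below, with two absorbing states (two singleton recurrent classes) and one transient state, is made up for illustration.

```python
# Right eigenvector for lambda = 1 of one recurrent class: 1 on that class,
# 0 on the other class, and the reach probability on the transient state.
import numpy as np

P = np.array([[1.0, 0.0, 0.0],    # state 0: absorbing (recurrent class 1)
              [0.0, 1.0, 0.0],    # state 1: absorbing (recurrent class 2)
              [0.3, 0.2, 0.5]])   # state 2: transient

lam = np.linalg.eigvals(P)
print(np.sort(lam))               # 0.5, 1, 1 -- eigenvalue 1 has multiplicity 2

# Pr{ever reach state 0 | X0 = 2} = 0.3 / (0.3 + 0.2) = 0.6, so the right
# eigenvector for lambda = 1 belonging to class {0} is (1, 0, 0.6).
v = np.array([1.0, 0.0, 0.6])
assert np.allclose(P @ v, v)
```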

    The Jordan form case: As mentioned before, there are cases in which one or more eigenvalues of \([P]\) are repeated (as roots of \(\operatorname{det}[P-\lambda I]\)) but where the number of linearly independent right eigenvectors for a given eigenvalue is less than the multiplicity of that eigenvalue. In this case, there are not enough eigenvectors to span the space, so there is no M by M matrix whose columns are linearly independent eigenvectors. Thus \([P]\) cannot be expressed as \([U][\Lambda][U]^{-1}\) where \([\Lambda]\) is the diagonal matrix of the eigenvalues, repeated according to their multiplicity.

    The Jordan form is the cure for this unfortunate situation. The Jordan form for a given \([P]\) is the following modification of the diagonal matrix of eigenvalues: we start with the diagonal matrix of eigenvalues, with the repeated eigenvalues as neighboring elements. Then for each missing eigenvector for a given eigenvalue, a 1 is placed immediately to the right of and above a neighboring pair of appearances of that eigenvalue, as seen by example\(^9\) below:

    \([J]=\left[\begin{array}{ccccc}
    \lambda_{1} & 1 & 0 & 0 & 0 \\
    0 & \lambda_{1} & 0 & 0 & 0 \\
    0 & 0 & \lambda_{2} & 1 & 0 \\
    0 & 0 & 0 & \lambda_{2} & 1 \\
    0 & 0 & 0 & 0 & \lambda_{2}
    \end{array}\right]\)

    There is a theorem in linear algebra that says that an invertible matrix \([U]\) exists and a Jordan form \([J]\) exists such that \([P]=[U][J][U]^{-1}\). The major value to us of this result is that it makes it relatively easy to calculate \([J]^{n}\) for large \(n\) (see Exercise 3.15). This exercise also shows that for all stochastic matrices, each eigenvalue of magnitude 1 has as many linearly independent eigenvectors as its multiplicity. This is usually expressed by the statement that all the eigenvalues of magnitude 1 are simple, meaning that their multiplicity equals their number of linearly independent eigenvectors. Finally, the exercise shows that \(\left[P^{n}\right]\) for an aperiodic recurrent chain converges as a polynomial\(^{10}\) in \(n\) times \(\lambda_{s}^{n}\), where \(\lambda_{s}\) is the eigenvalue of largest magnitude less than 1.
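    For readers who want to experiment, sympy can compute the Jordan form symbolically. This is a sketch under stated assumptions: sympy is available, and the 3-state matrix below (eigenvalue 1/2 of multiplicity 2 but only one eigenvector, in the spirit of Exercise 3.14) is made up for illustration.

```python
# Jordan decomposition [P] = [U][J][U^{-1}] of a defective stochastic matrix.
from sympy import Matrix, Rational

half = Rational(1, 2)
P = Matrix([[half, half, 0],
            [0, half, half],
            [0, 0, 1]])

U, J = P.jordan_form()      # sympy returns (U, J) with P == U * J * U**-1
print(J)                    # eigenvalue 1/2 sits in a 2x2 Jordan block
assert P == U * J * U.inv()
```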

    The most important results of this section on eigenvalues and eigenvectors can be summarized in the following theorem.

    Theorem 3.4.1

    The transition matrix of a finite-state unichain has a single eigenvalue \(\lambda=1\) with an accompanying left eigenvector \(\boldsymbol{\pi}\) satisfying \ref{3.9} and a right eigenvector \(\boldsymbol{e}=(1,1, \ldots, 1)^{\top}\). The other eigenvalues \(\lambda_{i}\) all satisfy \(\left|\lambda_{i}\right| \leq 1\). The inequality is strict unless the unichain is periodic, say with period \(d\), in which case there are \(d\) eigenvalues of magnitude 1 spaced equally around the unit circle. If the unichain is ergodic, then \(\left[P^{n}\right]\) converges to the steady-state matrix \(\boldsymbol{e} \boldsymbol{\pi}\) with an error in each term bounded by a fixed polynomial in \(n\) times \(\left|\lambda_{s}\right|^{n}\), where \(\lambda_{s}\) is the eigenvalue of largest magnitude less than 1.

    Arbitrary Markov chains can be split into their recurrent classes, and this theorem can be applied separately to each class.


    Footnotes

    \(^6\)Students of linear algebra usually work primarily with right eigenvectors (and in abstract linear algebra often ignore matrices and concrete M-tuples altogether). Here a more concrete view is desirable because of the direct connection of \(\left[P^{n}\right]\) with transition probabilities. Also, although left eigenvectors could be converted to right eigenvectors by taking the transpose of \([P]\), this would be awkward when Markov chains with rewards are considered and both row and column vectors play important roles.

    \(^7\)Readers with little exposure to linear algebra can either accept the linear algebra results in this section (without a great deal of lost insight) or can find them in Strang [19] or many other linear algebra texts.

    \(^8\)If 0 is one of the M eigenvalues, then only M − 1 such matrices are required.

