
7.6: Martingales


    A martingale is defined as an integer-time stochastic process {\(Z_n; n ≥ 1\)} with the properties that \(\mathsf{E} [|Z_n|] < \infty \) for all \(n ≥ 1\) and

    \[ \mathsf{E}[Z_n|Z_{n-1},Z_{n-2},...,Z_1]=Z_{n-1}; \quad \text{for all } n\geq 2 \nonumber \]

    The name martingale comes from gambling terminology where martingales refer to gambling strategies in which the amount to be bet is determined by the past history of winning or losing. If one visualizes \(Z_n\) as representing the gambler’s fortune at the end of the \(n^{th}\) play, the definition above means, first, that the game is fair (in the sense that the expected increase in fortune from play \(n − 1\) to \(n\) is zero), and, second, that the expected fortune on the \(n^{th}\) play depends on the past only through the fortune on play \(n − 1\).

    The important part of the definition of a martingale, and what distinguishes martingales from other kinds of processes, is the form of dependence in (7.6.1). However, the restriction that \( \mathsf{E} [|Z_n|] < \infty \) is also important, particularly since martingales are so abstract and general that one often loses the insight to understand intuitively when this restriction is important. Students are advised to ignore this restriction when first looking at something that might be a martingale, and to check later after acquiring some understanding.

    There are two interpretations of (7.6.1); the first and most straightforward is to view it as shorthand for \( \mathsf{E} [Z_n |Z_{n−1} =z_{n−1},\, Z_{n−2} =z_{n−2}, . . . , Z_1 =z_1] = z_{n−1} \) for all possible sample values \(z_1, z_2, . . . , z_{n−1}\). The second is that \(\mathsf{E} [Z_n| Z_{n−1} =z_{n−1}, . . . , Z_1 =z_1]\) is a function of the sample values \(z_1, . . . , z_{n−1}\) and thus \(\mathsf{E} [Z_n| Z_{n−1}, . . . , Z_1]\) is a random variable which is a function of the random variables \(Z_1, . . . , Z_{n−1}\) (and, for a martingale, a function only of \(Z_{n−1}\)). Students are encouraged to take the first viewpoint initially and to write out the expanded type of expression in cases of confusion. The second viewpoint, however, is very powerful, and, with experience, is the more useful viewpoint.

    It is important to understand the difference between martingales and Markov chains. For the Markov chain {\(X_n; n ≥ 1\)}, each rv \(X_n\) is conditioned on the past only through \(X_{n−1}\), whereas for the martingale {\(Z_n; n ≥ 1\)}, it is only the expected value of \(Z_n\) that is conditioned on the past only through \(Z_{n−1}\). The rv \(Z_n\) itself, conditioned on \(Z_{n−1}\), can also be dependent on all the earlier \(Z_i\)’s. It is very surprising that so many results can be developed using such a weak form of conditioning.

    In what follows, we give a number of important examples of martingales, then develop some results about martingales, and then discuss those results in the context of the examples.

    Simple examples of martingales

    Example 7.6.1 (Random walks) One example of a martingale is a zero-mean random walk, since if \(Z_n = X_1 + X_2 +... + X_n\), where the \(X_i\) are IID and zero mean, then

    \[ \begin{align} \mathsf{E} [ Z_n|Z_{n-1}, ..., Z_1] \quad &= \quad \mathsf{E} [X_n +Z_{n-1}|Z_{n-1},...,Z_1 ] \\ &= \quad \mathsf{E} [X_n] +Z_{n-1} \quad = \quad Z_{n-1} \end{align} \nonumber \]

    Extending this example, suppose that {\(X_i; i ≥ 1\)} is an arbitrary sequence of IID random variables with mean \(\overline{X} \) and let \(\tilde{X}_i = X_i − \overline{X}\). Then {\(S_n; n ≥ 1\)} is a random walk with \(S_n = X_1 +... + X_n\) and {\(Z_n; n ≥ 1\)} is a martingale with \(Z_n = \tilde{X}_1 +... + \tilde{X}_n\). The random walk and the martingale are simply related by \(Z_n = S_n − n\overline{X}\), and thus general results about martingales can easily be applied to arbitrary random walks.
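    As an illustrative sketch (not from the text), the martingale property \(\mathsf{E}[Z_n] = \mathsf{E}[Z_1]\) for a centered random walk can be checked by Monte Carlo; the step distribution (uniform on \([-1,1]\)) is an arbitrary zero-mean choice.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Simulate many sample paths of Z_n = X_1 + ... + X_n with zero-mean IID steps.
    n_paths, n_steps = 100_000, 20
    X = rng.uniform(-1.0, 1.0, size=(n_paths, n_steps))  # IID, mean 0
    Z = X.cumsum(axis=1)

    # E[Z_n] should stay at 0 for every n, up to Monte Carlo error.
    mean_by_n = Z.mean(axis=0)
    print(np.abs(mean_by_n).max())  # small for all n
    ```

    For a walk with nonzero mean \(\overline{X}\), the same check applies to \(Z_n = S_n - n\overline{X}\), as in the paragraph above.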

    Example 7.6.2 (Sums of dependent zero-mean variables) Let {\(X_i; i ≥ 1\)} be a set of dependent random variables satisfying \(\mathsf{E} [X_i | X_{i−1}, . . . , X_1] = 0\). Then {\(Z_n; n ≥ 1\)}, where \(Z_n = X_1 +... + X_n\), is a zero mean martingale. To see this, note that

    \[ \begin{aligned} \mathsf{E}[Z_n|Z_{n-1},..., Z_1] \quad &= \quad \mathsf{E}[X_n+Z_{n-1}|Z_{n-1},...,Z_1 ] \\ &= \quad \mathsf{E}[X_n|X_{n-1},...,X_1] +\mathsf{E} [ Z_{n-1}|Z_{n-1},..., Z_1] = Z_{n-1} \end{aligned} \nonumber \]

    This is a more general example than it appears, since given any martingale {\(Z_n; n ≥ 1\)}, we can define \(X_n = Z_n − Z_{n−1}\) for \(n ≥ 2\) and define \(X_1 = Z_1\). Then \(\mathsf{E} [X_n | X_{n−1}, . . . , X_1] = 0\) for \(n ≥ 2\). If the martingale is zero mean (\(i.e.\), if \(\mathsf{E} [Z_1] = 0\)), then \(\mathsf{E} [X_1] = 0\) also.
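    A quick numerical sketch of this example (the specific construction below is hypothetical, chosen only to satisfy the condition): take \(X_n = \epsilon_n (1 + 0.1|Z_{n-1}|)\), where \(\epsilon_n\) is a fair \(\pm 1\) coin independent of the past. Then \(X_n\) depends on the entire past, yet \(\mathsf{E}[X_n | X_{n-1}, \ldots, X_1] = 0\), so the partial sums form a zero-mean martingale.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # X_n = eps_n * (1 + 0.1*|Z_{n-1}|): the step size depends on the past,
    # but its conditional mean is 0 because eps_n is a fair coin.
    n_paths, n_steps = 200_000, 10
    Z = np.zeros(n_paths)
    means = []
    for _ in range(n_steps):
        eps = rng.choice([-1.0, 1.0], size=n_paths)
        Z = Z + eps * (1.0 + 0.1 * np.abs(Z))
        means.append(float(Z.mean()))

    print(max(abs(m) for m in means))  # E[Z_n] stays near 0 for every n
    ```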

    Example 7.6.3 (Product-form martingales) Another example is a product of unit mean IID random variables. Thus if \(Z_n = X_1X_2 . . . X_n\), we have

    \[ \begin{align} \mathsf{E}[Z_n|Z_{n-1},...,Z_1] \quad &= \quad \mathsf{E}[X_n Z_{n-1} | Z_{n-1},...,Z_1 ] \nonumber \\ &= \quad \mathsf{E} [ X_n] \mathsf{E} [Z_{n-1}| Z_{n-1},...,Z_1 ] \\ &= \quad \mathsf{E} [X_n] \mathsf{E} [ Z_{n-1} | Z_{n-1} ] = Z_{n-1} \end{align} \nonumber \]

    A particularly simple case of this product example is where \( X_n = 2\) with probability 1/2 and \(X_n = 0\) with probability 1/2. Then

    \[ \text{Pr} \{Z_n=2^n \} = 2^{-n} ; \qquad \text{Pr}\{ Z_n = 0\} = 1-2^{-n}; \qquad \mathsf{E}[Z_n] = 1 \nonumber \]

    Thus \( \lim_{n→ \infty } Z_n = 0\) with probability 1, but \(\mathsf{E} [Z_n] = 1\) for all \(n\) and \(\lim_{n→\infty} \mathsf{E} [Z_n] = 1\). This is an important example to keep in mind when trying to understand why proofs about martingales are necessary and non-trivial. This type of phenomenon will be clarified somewhat by Lemma 7.8.1 when we discuss stopped martingales in Section 7.8.
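    This striking gap between the sample-path limit and the expectation is easy to see numerically; the following sketch simulates the \(X_n \in \{0, 2\}\) example above.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # X_n = 0 or 2 with probability 1/2 each, so E[X_n] = 1.
    n_paths, n_steps = 500_000, 10
    X = rng.choice([0.0, 2.0], size=(n_paths, n_steps))
    Z = X.cumprod(axis=1)  # Z_n = X_1 * X_2 * ... * X_n

    # E[Z_n] = 1 for every n, yet almost every path hits 0 and stays there:
    # Pr{Z_n = 0} = 1 - 2^{-n}.
    print(Z.mean(axis=0))          # each entry near 1
    print((Z[:, -1] == 0).mean())  # fraction of dead paths near 1 - 2^{-10}
    ```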

    An important example of a product-form martingale is as follows: let {\(X_i; i ≥ 1\)} be an IID sequence, and let {\(S_n = X_1 +... + X_n; n ≥ 1\)} be a random walk. Assume that the semi-invariant generating function \(\gamma (r) = \ln \{\mathsf{E} [\exp (rX)]\}\) exists in some region of \(r\) around 0. For each \(n ≥ 1\), let \(Z_n\) be defined as

    \begin{align} Z_n \quad &= \quad \exp \{ rS_n-n\gamma(r) \} \\ &= \quad \exp \{ rX_n-\gamma (r) \} \exp \{ rS_{n-1} - (n-1) \gamma (r) \} \nonumber \\ &= \quad \exp \{ rX_n-\gamma (r)\} Z_{n-1} \end{align}

    Taking the conditional expectation of this,

    \[ \begin{align} \mathsf{E} [Z_n |Z_{n-1},..., Z_1] \quad &= \quad \mathsf{E} [\exp (rX_n-\gamma (r))] \mathsf{E} [Z_{n-1} | Z_{n-1},...,Z_1 ] \nonumber \\ &= \quad Z_{n-1} \end{align} \nonumber \]

    where we have used the fact that \(\mathsf{E} [\exp (rX_n)] = \exp (\gamma (r))\). Thus we see that {\(Z_n; n ≥ 1\)} is a product-form martingale.
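    As a sketch of this exponential martingale (not part of the text), take \(X \sim \mathcal{N}(0,1)\), for which \(\gamma(r) = \ln \mathsf{E}[e^{rX}] = r^2/2\) in closed form, and verify \(\mathsf{E}[Z_n] = 1\) numerically.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    # For X ~ N(0,1), gamma(r) = r^2/2, so Z_n = exp(r*S_n - n*r^2/2).
    r, n_paths, n_steps = 0.5, 500_000, 8
    X = rng.standard_normal((n_paths, n_steps))
    S = X.cumsum(axis=1)
    n = np.arange(1, n_steps + 1)
    Z = np.exp(r * S - n * r**2 / 2)

    print(Z.mean(axis=0))  # each column near 1, for every n
    ```

    Note that larger \(r\) makes \(Z_n\) heavy-tailed (its variance grows as \(e^{nr^2}\)), so the Monte Carlo estimate degrades; \(r = 0.5\) keeps the check stable.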

    Scaled branching processes

    A final simple example of a martingale is a “scaled down” version of a branching process {\(X_n; n ≥ 0\)}. Recall from Section 5.5 that, for each \(n\), \(X_n\) is defined as the aggregate number of elements in generation \(n\). Each element \(i\) of generation \(n\), \(1 ≤ i ≤ X_n\), has a number \(Y_{i,n}\) of offspring which collectively constitute generation \(n + 1,\, i.e.,\, X_{n+1} = \sum^{X_n}_{i=1} Y_{i,n}\). The rv’s \(Y_{i,n}\) are IID over both \(i\) and \(n\).

    Let \(\overline{Y} = \mathsf{E} [Y_{i,n}] \) be the mean number of offspring of each element of the population. Then \(\mathsf{E} [X_n | X_{n−1}] = \overline{Y} X_{n−1}\), which resembles a martingale except for the factor of \(\overline{Y}\). We can convert this branching process into a martingale by scaling it, however. That is, define \(Z_n = X_n/\overline{Y}^n\). It follows that

    \[ \mathsf{E} [Z_n |Z_{n-1} , ...,Z_1 ] = \mathsf{E} \left[ \dfrac{X_n}{\overline{Y}^n} | X_{n-1},...,X_1 \right] = \dfrac{\overline{Y}X_{n-1}}{\overline{Y}^n} = Z_{n-1} \nonumber \]

    Thus {\(Z_n; n ≥ 1\)} is a martingale. We will see the surprising result later that this implies that \(Z_n\) converges with probability 1 to a limiting rv as \(n →\infty\).
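    A minimal simulation sketch of the scaled branching process (Poisson offspring is an assumption made here so each generation's total can be drawn in one call, using the additivity of independent Poisson variables):

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    # Branching process with Poisson(ybar) offspring per element, started at x0.
    ybar, n_paths, n_gens, x0 = 1.5, 20_000, 8, 50
    X = np.full(n_paths, x0)
    means = []
    for n in range(1, n_gens + 1):
        # Sum of X independent Poisson(ybar) rv's is Poisson(ybar * X).
        X = rng.poisson(ybar * X)
        means.append(float((X / ybar**n).mean()))  # sample mean of Z_n = X_n / ybar^n

    print(means)  # each near x0 = 50, even though X_n itself grows like ybar^n
    ```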

    Partial isolation of past and future in martingales

    Recall that for a Markov chain, the states at all times greater than a given \(n\) are independent of the states at all times less than \(n\), conditional on the state at time \(n\). The following lemma shows that at least a small part of this independence of past and future applies to martingales.

    Lemma 7.6.1. Let {\(Z_n; n ≥ 1\)} be a martingale. Then for any \(n > i ≥ 1\),

    \[ \mathsf{E} [Z_n|Z_i,Z_{i-1},...,Z_1 ] = Z_i \nonumber \]

    Proof: For \( n = i + 1\), \( \mathsf{E} [Z_{i+1} | Z_i, . . . , Z_1] = Z_i\) by the definition of a martingale. Now consider \(n = i + 2\). Then \(\mathsf{E} [Z_{i+2} | Z_{i+1}, . . . , Z_1]\) is a rv equal to \(Z_{i+1}\). We can view \(\mathsf{E} [Z_{i+2} | Z_i, . . . , Z_1]\) as the conditional expectation of the rv \(\mathsf{E} [Z_{i+2} | Z_{i+1}, . . . , Z_1]\) over \(Z_{i+1}\), conditional on \(Z_i, . . . , Z_1\).

    \[ \begin{align} \mathsf{E} [ Z_{i+2}| Z_i,...,Z_1] \quad &= \quad \mathsf{E} [\mathsf{E}[Z_{i+2}|Z_{i+1},Z_i,...,Z_1]|Z_i,...,Z_1] \nonumber \\ &= \quad \mathsf{E}[Z_{i+1}|Z_i,...,Z_1] = Z_i \end{align} \nonumber \]

    For \(n = i+3\), (7.6.12), with \(i\) incremented, shows us that the rv \(\mathsf{E} [Z_{i+3} | Z_{i+1}, . . . , Z_1] = Z_{i+1}\). Taking the conditional expectation of this rv over \(Z_{i+1}\) conditional on \(Z_i, . . . , Z_1\), we get

    \[ \mathsf{E} [ Z_{i+3}|Z_i, ..., Z_1 ] = Z_i \nonumber \]

    This argument can be applied successively to any \(n > i\).

    \(\square\)

    This lemma is particularly important for \( i = 1\), where it says that \(\mathsf{E} [Z_n | Z_1] = Z_1\). The left side of this is a rv which is a function (in fact the identity function) of \(Z_1\). Thus, by taking the expected value of each side, we see that

    \[ \mathsf{E} [Z_n] = \mathsf{E}[Z_1] \qquad \text{for all } n>1 \nonumber \]

    An important application of this is to the product form martingale in (7.6.9). This says that

    \[\begin{align} \mathsf{E}[\exp (rS_n-n\gamma(r))] \quad &= \quad \mathsf{E}[\exp(rX-\gamma(r))] \nonumber \\ &= \quad \mathsf{E} [ \exp (rX)]/\exp(\gamma(r)) =1 \end{align} \nonumber \]

    We will come back later to relate this to Wald’s identity.


    This page titled 7.6: Martingales is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Robert Gallager (MIT OpenCourseWare) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.