
7.9: The Kolmogorov Inequalities


    We now use the previous theorems to establish Kolmogorov’s submartingale inequality, which is a major strengthening of the Markov inequality. Just as the Markov inequality in Section 1.7 was used to derive the Chebychev inequality and then the weak law of large numbers, the Kolmogorov submartingale inequality will be used to strengthen the Chebychev inequality, from which the strong law of large numbers will follow.

Theorem 7.9.1 (Kolmogorov’s submartingale inequality) Let {\(Z_n; n ≥ 1\)} be a nonnegative submartingale. Then for any positive integer \(m\) and any \(a > 0\),

    \[ \text{Pr} \left\{ \max_{1\leq i\leq m} Z_i \geq a \right\} \leq \dfrac{\mathsf{E}[Z_m]}{a} \nonumber \]

    Proof: Given a nonnegative submartingale {\(Z_n; n ≥ 1\)}, given \(a > 0\), and given a positive integer \(m\), let \(J\) be the stopping trial defined as the smallest \(n ≤ m\) such that \(Z_n ≥ a\). If \(Z_n < a\) for all \(n ≤ m\), then \(J = m\). Thus the process must stop by time \(m\), and \(Z_J ≥ a\) if and only if \(Z_n ≥ a\) for some \(n ≤ m\). Thus

    \[ \text{Pr} \left\{ \max_{1\leq n\leq m} Z_n \geq a \right\} = \text{Pr}\{Z_J\geq a\} \leq \dfrac{\mathsf{E}[Z_J]}{a} \nonumber \]

    where we have used the Markov inequality. Finally, since the process must be stopped by time \(m\), we have \(Z_J = Z^∗_m\). From (7.8.3), \(\mathsf{E} [Z^∗_m] ≤ \mathsf{E} [Z_m]\), so the right hand side of (7.9.2) is less than or equal to \(\mathsf{E} [Z_m] /a\), completing the proof.

    \(\square \)
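As an illustration (not part of the original text), the submartingale inequality can be checked by a short Monte Carlo experiment. The sketch below uses the nonnegative submartingale \(Z_n = |S_n|\), where \(S_n\) is a simple \(\pm 1\) random walk (\(|\cdot|\) is convex and \(S_n\) is a martingale, so Theorem 7.7.1 makes \(|S_n|\) a submartingale); all function names and parameter choices are ours.

```python
import random

def check_submartingale_inequality(m=50, a=10.0, trials=20000, seed=1):
    """Monte Carlo sketch: compare the empirical probability of
    max_{1<=i<=m} Z_i >= a with an estimate of E[Z_m]/a, where
    Z_n = |S_n| for a simple +-1 random walk S_n."""
    rng = random.Random(seed)
    exceed = 0
    z_m_total = 0.0
    for _ in range(trials):
        s = 0
        max_abs = 0
        for _ in range(m):
            s += 1 if rng.random() < 0.5 else -1
            max_abs = max(max_abs, abs(s))
        if max_abs >= a:
            exceed += 1
        z_m_total += abs(s)
    empirical = exceed / trials       # estimate of Pr{max_{1<=i<=m} Z_i >= a}
    bound = (z_m_total / trials) / a  # estimate of E[Z_m]/a
    return empirical, bound
```

With these parameters the empirical probability comes out well under the bound; the bound is loose here because the Markov step in the proof is loose.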

The following simple corollary shows that (7.9.1) takes a simpler form for nonnegative martingales.

Corollary 7.9.1 (Nonnegative martingale inequality) Let {\(Z_n; n ≥ 1\)} be a nonnegative martingale. Then

\[ \text{Pr}\left\{ \sup_{n\geq 1} Z_n \geq a \right\} \leq \dfrac{\mathsf{E}[Z_1]}{a} \qquad \text{for all } a>0 \nonumber \]

Proof: For a martingale, \(\mathsf{E} [Z_m] = \mathsf{E} [Z_1]\). Thus, from (7.9.1), \(\text{Pr}\{ \max_{1≤i≤m} Z_i ≥ a\}≤ \frac{\mathsf{E}[Z_1]}{a}\) for all \(m > 1\). Passing to the limit \(m →\infty\) essentially yields (7.9.3). Exercise 7.22 illustrates why the limiting operation here is a little tricky, and shows that it is valid.

    \(\square \)

    The following corollary bears the same relationship to the submartingale inequality as the Chebychev inequality does to the Markov inequality.

Corollary 7.9.2 (Kolmogorov’s martingale inequality) Let {\(Z_n; n ≥ 1\)} be a martingale with \(\mathsf{E}[Z^2_n] < \infty\) for all \(n ≥ 1\). Then

\[ \text{Pr} \left\{ \max_{1\leq n \leq m} |Z_n|\geq b \right\} \leq \dfrac{\mathsf{E}[Z^2_m]}{b^2} \qquad \text{for all integer } m \geq 2 \text{ and all } b>0 \nonumber \]

    Proof: Since {\(Z_n; n ≥ 1\)} is a martingale and \(Z^2_n\) is a convex function of \(Z_n\), it follows from Theorem 7.7.1 that {\(Z^2_n; n ≥ 1\)} is a submartingale. Since \(Z^2_n\) is nonnegative, we can use the Kolmogorov submartingale inequality to see that

\[ \text{Pr}\left\{ \max_{1\leq n\leq m} Z_n^2 \geq a \right\} \leq \mathsf{E} [Z_m^2]/a \quad \text{for any } a>0 \nonumber \]

    Substituting \(b^2\) for \(a\), we get (7.9.4).

    \(\square \)

Corollary 7.9.3 (Kolmogorov’s random walk inequality) Let {\(S_n; n ≥ 1\)} be a random walk with \(S_n = X_1 + ...+ X_n\) where {\(X_i; i ≥ 1\)} is a set of IID random variables with mean \(\overline{X}\) and variance \(σ^2\). Then for any positive integer \(m\) and any \(\epsilon> 0\),

\[ \text{Pr} \left\{ \max_{1\leq n\leq m} |S_n-n\overline{X}| \geq m\epsilon \right\} \leq \dfrac{\sigma^2}{m\epsilon^2} \nonumber \]

    Proof: {\(Z_n = S_n − n\overline{X}; n ≥ 1\)} is a zero mean random walk, and thus a martingale. Since \(\mathsf{E}[Z^2_m] = mσ^2\), (7.9.5) follows by substituting \(m\epsilon\) for \(b\) in (7.9.4).

    \(\square \)
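Corollary 7.9.3 can likewise be probed numerically. In the sketch below (our own construction, not from the text), the \(X_i\) are Uniform(0, 1), so \(\overline{X} = 1/2\) and \(\sigma^2 = 1/12\); the function estimates the left side of (7.9.5) and returns it together with the bound \(\sigma^2/(m\epsilon^2)\).

```python
import random

def check_random_walk_inequality(m=200, eps=0.1, trials=10000, seed=2):
    """Monte Carlo sketch of Pr{max_{1<=n<=m} |S_n - n*Xbar| >= m*eps}
    <= sigma^2/(m*eps^2) for IID X_i ~ Uniform(0,1)."""
    rng = random.Random(seed)
    xbar, sigma2 = 0.5, 1.0 / 12.0
    threshold = m * eps
    exceed = 0
    for _ in range(trials):
        s = 0.0
        for n in range(1, m + 1):
            s += rng.random()
            if abs(s - n * xbar) >= threshold:
                exceed += 1
                break
    empirical = exceed / trials
    bound = sigma2 / (m * eps * eps)
    return empirical, bound
```

Note how the threshold \(m\epsilon\) grows linearly in \(m\) while the standard deviation of \(S_n - n\overline{X}\) grows only as \(\sqrt{n}\), so for these parameters the empirical probability is far below the bound.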

Recall that the simplest form of the weak law of large numbers was given in (1.75) as \(\text{Pr}\{|S_m/m − \overline{X}|≥ \epsilon\} ≤ σ^2/(m\epsilon^2)\). This is strengthened in (7.9.5) to upper bound the probability that any of the first \(m\) terms deviates from the mean by more than \(m\epsilon\). It is this strengthening that will allow us to prove the strong law of large numbers assuming only a finite variance.

    The following corollary yields essentially the same result as (7.56), but is included here as another example of the use of the Kolmogorov submartingale inequality.

Corollary 7.9.4. Let {\(S_n; n ≥ 1\)} be a random walk, \(S_n = X_1 +... + X_n\) where each \(X_i\) has mean \(\overline{X} < 0\) and semi-invariant moment generating function \(\gamma (r)\). For any \(r > 0\) such that \(0 < \gamma (r) < \infty\) (i.e., for \(r > r^∗\)), and for any \(\alpha > 0\),

\[ \text{Pr} \left\{ \max_{1\leq i \leq n} S_i \geq \alpha \right\} \leq \exp \{-r\alpha + n\gamma (r) \} \nonumber \]

    Proof: For \(r > r^∗\), {\(\exp (rS_n); n ≥ 1\)} is a submartingale. Taking \(a = \exp(rα)\) in (7.9.1), we get (7.9.6).

    \(\square \)
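For a concrete (and purely illustrative) instance of Corollary 7.9.4, take \(X = +1\) with probability 0.4 and \(X = -1\) with probability 0.6, so \(\overline{X} = -0.2 < 0\) and \(\gamma(r) = \ln(0.4e^r + 0.6e^{-r})\); here \(r^* = \ln(3/2)\), and any \(r > r^*\) gives \(\gamma(r) > 0\). The sketch below compares the empirical probability that the walk ever reaches \(\alpha\) within \(n\) steps against the exponential bound.

```python
import math
import random

def check_chernoff_walk_bound(n=100, alpha=10, r=0.5, trials=20000, seed=3):
    """Monte Carlo sketch of Pr{max_{1<=i<=n} S_i >= alpha}
    <= exp(-r*alpha + n*gamma(r)) for a +-1 walk with up-probability 0.4."""
    rng = random.Random(seed)
    # gamma(r) = ln E[e^{rX}] for this two-point distribution
    gamma_r = math.log(0.4 * math.exp(r) + 0.6 * math.exp(-r))
    exceed = 0
    for _ in range(trials):
        s = 0
        for _ in range(n):
            s += 1 if rng.random() < 0.4 else -1
            if s >= alpha:
                exceed += 1
                break
    empirical = exceed / trials
    bound = math.exp(-r * alpha + n * gamma_r)
    return empirical, bound
```

Optimizing the bound over \(r > r^*\) would tighten it further; the fixed \(r = 0.5\) here is just a convenient choice above \(r^* \approx 0.405\).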

The following theorem about supermartingales is, in a sense, the dual of the Kolmogorov submartingale inequality. Note, however, that it applies to the terms \(n ≥ m\) in the supermartingale rather than \(n ≤ m\).

    Theorem 7.9.2. Let {\(Z_n; n ≥ 1\)} be a nonnegative supermartingale. Then for any positive integer \(m\) and any \(a > 0\),

    \[ \text{Pr} \left\{ \bigcup_{i\geq m} \{Z_i\geq a\} \right\} \leq \dfrac{\mathsf{E}[Z_m]}{a} \nonumber \]

    Proof: For given \(m ≥ 1\) and \(a > 0\), let \(J\) be a possibly-defective stopping trial defined as the smallest \(i ≥ m\) for which \(Z_i ≥ a\). Let {\(Z_n^∗; n ≥ 1\)} be the corresponding stopped process, which is also nonnegative and is a supermartingale from Theorem 7.8.1. For any \(k > m\), note that \(Z_k^∗ ≥ a\) iff \(\max_{m≤i≤k} Z_i ≥ a\). Thus

    \[ \text{Pr} \left\{ \max_{m\leq i\leq k} Z_i\geq a \right\} = \text{Pr} \{Z_k^* \geq a\} \leq \dfrac{\mathsf{E}[Z^*_k]}{a} \nonumber \]

Since {\(Z_n^∗; n ≥ 1\)} is a supermartingale, (7.76) shows that \(\mathsf{E} [Z^∗_k] ≤ \mathsf{E} [Z^∗_m]\). On the other hand, \(Z^∗_m= Z_m\) since the process cannot stop before epoch \(m\). Thus \(\text{Pr} \{ \max_{m≤i≤k} Z_i ≥ a\}\) is at most \(\mathsf{E} [Z_m] /a\). Since \(k\) is arbitrary, we can pass to the limit, getting (7.9.7) and completing the proof.
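Theorem 7.9.2 can also be illustrated by simulation, approximating the infinite union by a long finite horizon. The sketch below (all choices ours) uses the nonnegative supermartingale \(Z_n = 1.4^{S_n}\), where \(S_n\) is a \(\pm 1\) walk with up-probability 0.4; one checks that \(\mathsf{E}[1.4^X] = 0.4(1.4) + 0.6/1.4 \approx 0.989 < 1\), so \(\mathsf{E}[Z_m] = 0.989^m\) exactly.

```python
import random

def check_supermartingale_tail(m=30, level=9, horizon=300, trials=10000, seed=7):
    """Monte Carlo sketch of Pr{union_{i>=m} {Z_i >= a}} <= E[Z_m]/a for the
    nonnegative supermartingale Z_n = 1.4**S_n (up-probability 0.4), with
    a = 1.4**level; the union over i >= m is truncated at a finite horizon."""
    rng = random.Random(seed)
    a = 1.4 ** level              # Z_i >= a  iff  S_i >= level
    rho = 0.4 * 1.4 + 0.6 / 1.4   # one-step mean of 1.4**X, about 0.989
    exceed = 0
    for _ in range(trials):
        s = 0
        for i in range(1, horizon + 1):
            s += 1 if rng.random() < 0.4 else -1
            if i >= m and s >= level:
                exceed += 1
                break
    empirical = exceed / trials   # approximates the tail-union probability
    bound = rho ** m / a          # exact E[Z_m]/a for this walk
    return empirical, bound
```

The truncation at `horizon` only loses probability, so the empirical value is, if anything, an underestimate of the left side of (7.9.7).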

    strong law of large numbers (SLLN)

    We now proceed to prove the strong law of large numbers assuming only a second moment. Recall that we proved the SLLN under the assumption of a finite fourth moment in Section 4.2.1. Here we use the Kolmogorov martingale inequality to show that only a second moment is required. The theorem is also true assuming only a first absolute moment, but the truncation argument we used for the weak law in Theorem 1.5.3 does not carry over simply here.

    Theorem 7.9.3 (SLLN) Let {\(X_i; i ≥ 1\)} be a sequence of IID random variables with mean \(\overline{X}\) and standard deviation \(σ< \infty\). Let \(S_n = X_1 + ··· + X_n\). Then for any \(\epsilon > 0\),

\[ \begin{align} \text{Pr} \left\{ \lim_{n\rightarrow \infty} \dfrac{S_n}{n}= \overline{X} \right\} \quad &= \quad 1 \quad \text{and} \\ \lim_{n\rightarrow \infty} \text{Pr} \left\{ \bigcup_{m>n} \left\{ \left| \dfrac{S_m}{m}-\overline{X} \right| > \epsilon \right\} \right\} \quad &= \quad 0 \end{align} \nonumber \]

Proof: From Section 4.2.2, (7.9.8) and (7.9.9) are equivalent, so we establish (7.9.9). As \(n\) increases, successive terms are dropped out of the union above, so the probability is non-increasing with \(n\). Thus we can restrict attention to \(n\) of the form \(2^k\) for integer \(k\). For any given \(k\), the union above can be separated into blocks as follows:

    \[ \begin{align} \text{Pr} \left\{ \bigcup_{m>2^k} \left\{ \left| \dfrac{S_m}{m}-\overline{X} \right| >\epsilon \right\} \right\} \quad &= \quad \text{Pr} \left\{ \bigcup^{\infty}_{j=k} \bigcup^{2^{j+1}}_{m=2^j+1} \left\{ \left| \dfrac{S_m}{m} - \overline{X} \right| > \epsilon \right\} \right\} \nonumber \\ &\leq \quad \sum^{\infty}_{j=k}\text{Pr} \left\{ \bigcup^{2^{j+1}}_{m=2^j+1} \left\{ \left| \dfrac{S_m}{m} - \overline{X} \right| > \epsilon \right\} \right\}\\ &= \quad \sum^{\infty}_{j=k}\text{Pr} \left\{ \bigcup^{2^{j+1}}_{m=2^j+1} \{ | S_m-m \overline{X} | > \epsilon m \} \right\} \nonumber \\ &\leq \quad \sum^{\infty}_{j=k} \text{Pr} \left\{ \bigcup^{2^{j+1}}_{m=2^j+1} \{ |S_m-m\overline{X} | > \epsilon 2^j \} \right\} \\ &= \quad \sum^{\infty}_{j=k} \text{Pr} \left\{ \max_{2^j+1\leq m \leq 2^{j+1}}|S_m - m \overline{X}| > \epsilon 2^j \right\} \nonumber \\ &\leq \quad \sum^{\infty}_{j=k} \text{Pr} \left\{ \max_{1\leq m \leq 2^{j+1}}|S_m-m\overline{X}| > \epsilon 2^j \right\} \\ &\leq \quad \sum^{\infty}_{j=k} \dfrac{2^{j+1}\sigma^2}{\epsilon^22^{2j}} \quad = \quad \dfrac{2^{-k+2}\sigma^2}{\epsilon^2} \end{align} \nonumber \]

    In (7.9.10), we used the union bound on the union over \(j\). In (7.9.11), we used the fact that \(m ≥ 2^j\) to increase the size of the sets in the union. In (7.9.12), we upper bounded by adding additional terms into the maximization, and in (7.9.13) we used the Kolmogorov martingale inequality. The proof is completed by noting that the upper bound in (7.9.13) goes to 0 with increasing \(k\).

    \(\square \)
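The final bound in the proof can be evaluated and compared with simulation. The sketch below (an illustration of ours, not from the text) computes the right side of (7.9.13) for Uniform(0, 1) variables, where \(\overline{X} = 1/2\) and \(\sigma^2 = 1/12\), and estimates the probability on the left over a finite range of \(m\).

```python
import random

def slln_tail_bound(k, sigma2, eps):
    """Right side of (7.9.13): 2^{-k+2} * sigma^2 / eps^2."""
    return 2.0 ** (-k + 2) * sigma2 / eps ** 2

def empirical_sup_deviation(k=6, n_max=2048, trials=1000, eps=0.2, seed=4):
    """Fraction of Uniform(0,1) sample paths whose average S_m/m deviates
    from Xbar = 1/2 by more than eps for some m with 2^k < m <= n_max."""
    rng = random.Random(seed)
    cutoff = 2 ** k
    exceed = 0
    for _ in range(trials):
        s = 0.0
        for m in range(1, n_max + 1):
            s += rng.random()
            if m > cutoff and abs(s / m - 0.5) > eps:
                exceed += 1
                break
    return exceed / trials
```

The bound decays geometrically in \(k\), which is exactly what lets the proof conclude that the tail probability goes to 0.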

    It should be noted that, although the proof consists of a large number of steps, the steps are all quite small and familiar. Basically the proof is a slick way of upper bounding the probability of the high-order terms in a sequence by using the Kolmogorov martingale inequality, which bounds the low-order terms.

    martingale convergence theorem

    Another famous result that follows from the Kolmogorov submartingale inequality is the martingale convergence theorem. This states that if a martingale {\(Z_n; n ≥ 1\)} has the property that there is some finite \(M\) such that \(\mathsf{E} [|Z_n|] ≤ M\) for all \( n\), then \(\lim_{n→\infty} Z_n\) exists (and is finite) with probability 1. This is a powerful theorem in more advanced work, but it is not quite as useful as it appears, since the restriction \(\mathsf{E} [|Z_n|] ≤ M\) is more than a technical restriction; for example it is not satisfied by a zero-mean random walk. We prove the theorem with the additional restriction that there is some finite \(M\) such that \(\mathsf{E} [Z^2_n] ≤ M\) for all \( n\).

    Theorem 7.9.4 (Martingale convergence theorem) Let {\(Z_n ; n ≥ 1\)} be a martingale and assume that there is some finite \(M\) such that \(\mathsf{E} [Z^2_n]\leq M\) for all \(n\). Then there is a random variable \(Z\) such that, for all sample sequences except a set of probability 0, \(\lim_{n→\infty} Z_n = Z\).

    Proof: From Theorem 7.7.1 and the assumption that \(\mathsf{E} [Z_n^2] ≤ M\), {\(Z_n^2; n ≥ 1\)} is a submartingale. Thus, from (7.75), \(\mathsf{E} [Z^2_n]\) is nondecreasing in \(n\), and since \(\mathsf{E} [Z^2_n]\) is bounded, \(\lim_{n→\infty} \mathsf{E} [Z_n^2] = M'\) for some \(M' ≤ M\). For any integer \(k\), the process {\(Y_n = Z_{k+n} − Z_k; n ≥ 1\)} is a zero mean martingale (see Exercise 7.29). Thus from Kolmogorov’s martingale inequality,

    \[ \text{Pr} \left\{ \max_{1\leq n\leq m} |Z_{k+n}-Z_k |\geq b \right\} \leq \mathsf{E}[(Z_{k+m}-Z_k)^2]/b^2 \nonumber \]

    Next, observe that \(\mathsf{E} [Z_{k+m}Z_k | Z_k = z_k, Z_{k−1} = z_{k−1}, . . . , Z_1 = z_1] = z_k^2\), and therefore, \(\mathsf{E} [Z_{k+m}Z_k] = \mathsf{E} [Z_k^2]\). Thus \(\mathsf{E} [(Z_{k+m} − Z_k)^2] = \mathsf{E} [Z^2_{k+m}] − \mathsf{E} [Z_k^2] ≤ M' − \mathsf{E} [Z_k^2]\). Since this is independent of \(m\), we can pass to the limit, obtaining

    \[ \text{Pr} \left\{ \sup_{n\geq 1} |Z_{k+n}-Z_k|\geq b \right\} \leq \dfrac{M'-\mathsf{E}[Z^2_k]}{b^2} \nonumber \]

    Since \(\lim_{k→\infty} \mathsf{E} [Z^2_k] = M'\), we then have, for all \(b > 0\),

    \[ \lim_{k\rightarrow \infty} \text{Pr} \left\{ \sup_{n\geq 1} |Z_{k+n}-Z_k| \geq b \right\} =0 \nonumber \]

    This means that with probability 1, a sample sequence of {\(Z_n; n ≥ 1\)} is a Cauchy sequence, and thus approaches a limit, concluding the proof.

    \(\square \)
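A concrete example of a martingale satisfying the hypotheses of Theorem 7.9.4 (our own choice, for illustration) is \(Z_n = \sum_{i\le n} X_i/i\) with fair \(X_i = \pm 1\), since \(\mathsf{E}[Z_n^2] = \sum_{i\le n} 1/i^2 \le \pi^2/6\) for all \(n\). The sketch below records the path at a few checkpoints; the later increments are visibly tiny, as the Cauchy-sequence argument in the proof predicts.

```python
import random

def simulate_bounded_l2_martingale(n=20000, seed=5):
    """Simulate Z_n = sum_{i<=n} X_i/i (fair X_i = +-1), a martingale with
    E[Z_n^2] <= pi^2/6, and return its value at a few checkpoints."""
    rng = random.Random(seed)
    z = 0.0
    checkpoints = (1000, 5000, 10000, 20000)
    out = {}
    for i in range(1, n + 1):
        z += (1.0 if rng.random() < 0.5 else -1.0) / i
        if i in checkpoints:
            out[i] = z
    return out
```

The standard deviation of \(Z_{20000} - Z_{10000}\) is about \(\sqrt{1/10000 - 1/20000} \approx 0.007\), so almost every path moves very little over that stretch.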

This result can be relatively easily interpreted for branching processes. For a branching process {\(X_n; n ≥ 1\)} where \(\overline{Y}\) is the expected number of offspring of an individual, {\(X_n/\overline{Y}^n; n ≥ 1\)} is a martingale that satisfies the above conditions. If \(\overline{Y} ≤ 1\), the branching process dies out with probability 1, so \(X_n/\overline{Y}^n\) approaches 0 with probability 1. For \(\overline{Y} > 1\), however, the branching process dies out with some probability less than 1 and approaches \(\infty \) otherwise. Thus, the limiting random variable \(Z\) is 0 with the probability that the process ultimately dies out, and is positive otherwise. In the latter case, the interpretation is that once the population is very large, a law of large numbers effect controls its growth in each successive generation, so \(X_n/\overline{Y}^n\) changes in a random way for small \(n\) and then changes less and less as \(n\) increases.
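This behavior is easy to see in simulation. The sketch below (offspring distribution and names are our own illustrative choices) uses offspring counts 0, 1, 2 with probabilities 0.2, 0.4, 0.4, so \(\overline{Y} = 1.2 > 1\), and returns the martingale path \(W_n = X_n/\overline{Y}^n\); on any given run the path either gets absorbed at 0 (extinction) or settles near a positive limit.

```python
import random

def branching_ratio_path(generations=30, probs=(0.2, 0.4, 0.4), seed=6):
    """Simulate a branching process with offspring counts 0,1,2 and the
    given probabilities (mean Ybar = 1.2), starting from X_0 = 1, and
    return the martingale path W_n = X_n / Ybar**n."""
    rng = random.Random(seed)
    ybar = sum(k * p for k, p in enumerate(probs))  # 1.2 here
    x = 1
    path = []
    for n in range(1, generations + 1):
        # Each of the x individuals draws an offspring count independently.
        x = sum(rng.choices((0, 1, 2), weights=probs, k=x))
        path.append(x / ybar ** n)
    return path
```

Once \(X_n = 0\) the process stays at 0 forever, matching the interpretation of the limit \(Z\) above.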


    This page titled 7.9: The Kolmogorov Inequalities is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Robert Gallager (MIT OpenCourseWare) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.