8.2: Monte Carlo Simulation

Last updated
Save as PDF

Page ID: 47266

Franz S. Hover & Michael S. Triantafyllou
Massachusetts Institute of Technology via MIT OpenCourseWare

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

Suppose that we make \(N\) simulations, each time drawing the needed random parameters \(x_i\) from a random number ”black box” (about which we will give more details in the next section). We define the high-level output of our system \(S\) to be \(g(\vec{x})\). For simplicity, we say that \(g(\vec{x})\) is a scalar. \(g(\vec{x})\) can be virtually any output of interest, for example: the value of one state at a given time after an impulsive input, or the integral over time of the trajectory of one of the outputs, with a given input. In what follows, we will drop the vector notation on \(x\) for clarity.

Let the estimator \(G\) of \(g(x)\) be defined as

\[ G = \dfrac{1}{N} \sum_{j=1}^{N} g(x_j). \]

You may recognize this as a straight average. Indeed, taking the expectation on both sides,

\[ E(G) = \dfrac{1}{N} \sum_{j=1}^{N} E(g(x_i)), \]

it is clear that \(E(G) = E(g)\). At the same time, however, we do not know \(E(g)\); we calculate \(G\) understanding that with a very large number of trials, \(G\) should approach \(E(g)\). Now let’s look at the variance of the estimator. This conceptually results from an infinite number of estimator trials, each one of which involves \(N\) evaluations of \(g\) according to the above definition. It is important to keep in mind that such a variance involves samples of the estimator (each involving \(N\) evaluations) - not the underlying function \(g(x)\). We have

\begin{align} \sigma^2 (G) \, &= \, \sigma^2 \left[ \dfrac{1}{N} \sum_{j=1}^N g(x_j) \right] \\[4pt] &= \, \dfrac{1}{N^2} \sigma^2 \left[ \sum_{j=1}^N g(x_j) \right] \\[4pt] &= \, \dfrac{1}{N^2} \sum_{j=1}^N \sigma^2 (g) \\[4pt] &= \, \dfrac{1}{N} \sigma^2 (g). \end{align}

This relation is key. The first equality follows from the fact that \(\sigma^2 (nx) = n^2 \sigma^2 (x)\), if \(n\) is a constant. The second equality is true because \(\sigma^2 (x+y) = \sigma^2 (x) + \sigma^2 (y)\), where \(x\) and \(y\) are random variables. The major result is that \(\sigma^2 (G) = \sigma^2 (g)\) if only one-sample trials are considered, but that \(\sigma^2 (G) \rightarrow 0\) as \(N \rightarrow \infty\). Hence, with a large enough \(N\) we can indeed expect that our \(G\) will be very close to \(E(g)\).

Let us take this a bit further, to get an explicit estimate for the error in \(G\) as we go to large \(N\). Define a nondimensional estimator error

\begin{align} q \, &= \, \dfrac{G - E(g)}{\sigma (G)} \\[4pt] &= \, \dfrac{(G - E(g)) \sqrt{N}}{\sigma (g)}, \end{align}

where the second equality comes from the result above. We call the factor \(\sigma(g) / \sqrt{N}\) the standard error. Invoking the central limit theorem, which guarantees that the distribution of G becomes Gaussian for large enough N, we have

\begin{align} \lim_{N \to \infty} \textrm{prob} (a<q<b) \, &= \, \int\limits_{a}^{b} \dfrac{1}{\sqrt {2 \pi}} e^{-t^2 / 2} \, dt \\[4pt] &= \, F(a) - F(b), \end{align}

where \(F(x)\)is the cumulative probability function of the standard Gaussian variable:

\[ F(a) = \int\limits_{- \infty}^{a} \dfrac{1}{\sqrt{2 \pi}} e^{-t^2 / 2} \, dt \]

Looking up some values for \(F(x)\), we see that the nondimensional error is less than one in 68.3% of trials; it is less than two in 95.4% of trials, and less than three in 99.7% of trials. The 99.7% confidence interval corresponds with

\begin{align} -3 \, &\leq \, (G - E(g)) \sqrt{N} / \sigma (g) \, \leq \, 3 \, \rightarrow \\[4pt] -3 \sigma (g) / \sqrt{N} \, &\leq \, \quad \quad \,\, G - E(g) \,\, \ \quad \quad \, \leq \, 3 \sigma (g) / \sqrt{N}. \end{align}

In general, quadrupling the number of trials improves the error by a factor of two.

So far we have been describing a single estimator \(G\), which recovers the mean. The mean, however, is in fact an integral over the random domain:

\[ E(g) \, = \int\limits_{x \epsilon X} p(x) g(x) \, dx, \]

where \(p(x)\) is the pdf of random variable \(x\). So the Monte Carlo estimator \(G\) is in fact an integrator:

\[ G \, \simeq \int\limits_{x \epsilon X} p(x) g(x) \, dx. \]

We can just as easily define estimators of statistical moments:

\[ G_n \, = \, \dfrac{1}{N} \sum_{j=1}^{N} x_j^n g(x_j) \, \simeq \int\limits_{x \exists X} x^n p(x) g(x) \, dx, \]which will follow the same basic convergence trends of the mean estimator \(G\). These moments can be calculated all using the same \(N\) evaluations of \(g(x)\).

The above equation gives another point of view to understand how the Monte Carlo approach works: the effect of the probability density function in the integral is replaced by the fact that random variables in MC are drawn from the same distribution. In other words, a high \(p(x)\) in a given area of the domain \(X\) amplifies \(g(x)\) there. MC does the same thing, because there are in fact more \(x\) drawn from this area, in making the \(N\) evaluations.