
11: Monte Carlo - Areas and Volumes


    We first review the "statistical process." We typically begin with some population we wish to characterize; we then draw a sample from this population; we then inspect the data - for example as a histogram - and postulate an underlying probability density (here taking advantage of the "frequency as probability" perspective); we then estimate the parameters of the density from the sample; and finally we are prepared to make inferences about the population. It is critical to note that in general we can "draw" from a population without knowing the underlying density; this in turn permits us to calibrate the postulated density.

    We already observed one instance of this process with our coin flipping experiment. In this case, the population is all possible "behaviors" or flips of our coin; our sample is a finite number, \(n\), of coin flips; our underlying probability density is Bernoulli. We then estimate the Bernoulli parameter - the probability of heads, \(\theta\) - through our sample mean and associated (normal-approximation) confidence intervals. We are then prepared to make inferences: is the coin suitable to decide the opening moments of a football game? Note that in our experiments we effectively sample from a Bernoulli probability mass function with parameter \(\theta\) but without knowing the value of \(\theta\).
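
    As a concrete illustration of this Bernoulli estimation step, the following Python sketch (not from the original text; NumPy, the coin bias, the sample size, and the seed are illustrative assumptions) simulates \(n\) flips, forms the sample-mean estimate of \(\theta\), and constructs the associated normal-approximation confidence interval.

```python
# A minimal sketch, assuming a hypothetical coin bias and sample size:
# estimate the Bernoulli parameter theta from n simulated flips and form
# the normal-approximation confidence interval around the sample mean.
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.55          # hypothetical coin bias (assumption, not from the text)
n = 1000                   # sample size
gamma = 0.95               # confidence level
z_gamma = 1.96             # standard-normal quantile for gamma = 0.95

flips = rng.random(n) < theta_true      # realizations: True counts as "heads"
theta_hat = flips.mean()                # sample mean estimates theta
half = z_gamma * np.sqrt(theta_hat * (1.0 - theta_hat) / n)
print(f"theta_hat = {theta_hat:.3f}, "
      f"{gamma:.0%} CI = [{theta_hat - half:.3f}, {theta_hat + half:.3f}]")
```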

    Bernoulli estimation is very important, and occurs in everything from coin flips to area and integral estimation (by Monte Carlo techniques as introduced in Chapter 12) to political and product preference polls. However, there are many other important probability mass functions and densities that arise often in the prediction or modeling of various natural and engineering phenomena. Perhaps premier among the densities is the normal, or Gaussian, density.

    We have introduced the univariate normal density in Section 9.4. In this chapter, to avoid confusion with typical variables in our next unit, regression, we shall denote our normal random variable as \(W=W_{\mu, \sigma} \sim \mathcal{N}\left(\mu, \sigma^{2}\right)\) corresponding to probability density function \(f_{W}(w)=f^{\text {normal }}\left(w ; \mu, \sigma^{2}\right)\). We recall that the normal density is completely determined by the two parameters \(\mu\) and \(\sigma\) which are in fact the mean and the standard deviation, respectively, of the normal density.

    The normal density is ubiquitous for several reasons. First, more pragmatically, it has some rather intuitive characteristics: it is symmetric about the mean, it takes its maximum (the mode) at the mean (which is also the median, by symmetry), and it is characterized by just two parameters - a center (mean) and a spread (standard deviation). Second, and more profoundly, the normal density often arises "due" to the central limit theorem, described in Section 9.4.3. In short (in fact, way too short), one form of the central limit theorem states that the average of many random perturbations - perhaps described by different underlying probability densities - approaches the normal density. Since the behavior of many natural and engineered systems can be viewed as the consequence of many random influences, the normal density is often encountered in practice.
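
    To make the central limit theorem statement above more tangible, the short sketch below (not part of the original text; the uniform perturbations, sample sizes, and seed are illustrative assumptions) averages many decidedly non-normal random perturbations and checks that the resulting averages have the mean and spread the theorem predicts; a histogram of the averages looks close to a normal density.

```python
# A minimal sketch, assuming uniform(-1, 1) perturbations: averages of many
# non-normal random influences behave approximately like a normal random variable.
import numpy as np

rng = np.random.default_rng(1)
n_perturbations = 50        # number of random influences averaged per realization
n_realizations = 100_000    # number of averaged samples to inspect

# Each realization is the average of 50 independent uniform(-1, 1) perturbations.
averages = rng.uniform(-1.0, 1.0, size=(n_realizations, n_perturbations)).mean(axis=1)

# Empirical mean and standard deviation versus the theoretical values
# 0 and sqrt(1/3) / sqrt(n_perturbations); a histogram of `averages`
# (e.g., via matplotlib) appears close to a normal density.
print(averages.mean(), averages.std(ddof=1), np.sqrt(1.0 / 3.0) / np.sqrt(n_perturbations))
```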

    As an intuitive example from biostatistics, we consider the height of US females (see L. Winner's notes on Applied Statistics, University of Florida, http://www.stat.ufl.edu/winner/statnotescomp/appstat.pdf, Chapter 2, p. 26). In this case our population is US females of ages 25-34. Our sample might be the US Census data of 1992. The histogram appears quite normal-like, and we can thus postulate a normal density. We would next apply the estimation procedures described below to determine the mean and standard deviation (the two parameters associated with our "chosen" density). Finally, we can make inferences - go beyond the sample to the population as a whole, for example related to US females in 2012.

    The choice of population is important both in the sampling/estimation stage and of course also in the inference stage. The generation of appropriate samples can also be a very thorny issue. There is an immense literature on these topics which goes well beyond our scope and also, to a certain extent - given our focus on engineered rather than social and biological systems - beyond our immediate needs. As but one example, we would be remiss to apply the results from a population of US females to different demographics such as "females around the globe" or "US female jockeys" or indeed "all genders."

    We should emphasize that the normal density is in almost all cases an approximation. For example, very rarely can a quantity take on all values however small or large, and in particular quantities must often be positive. Nevertheless, the normal density can remain a good approximation; for example if \(\mu-3 \sigma\) is positive, then negative values are effectively "never seen." We should also emphasize that there are many cases in which the normal density is not appropriate - not even a good approximation. As always, the data must enter into the decision as to how to model the phenomenon - what probability density with what parameters will be most effective?

    As an engineering example closer to home, we now turn to the Infra-Red Range Finder distance-voltage data of Chapter 1 of Unit I. It can be motivated that in fact distance \(D\) and voltage \(V\) are inversely related, and hence it is plausible to assume that \(D V=C\), where \(C\) is a constant associated with our particular device. Of course, in actual practice, there will be measurement error, and we might thus plausibly assume that \[(D V)^{\text {meas }}=C+W\] where \(W\) is a normal random variable with density \(\mathcal{N}\left(0, \sigma^{2}\right)\). Note we assume that the noise is centered about zero but of unknown variance. From the transformation property of Chapter 4, Example 9.4.5, we can further express our measurements as \[(D V)^{\text {meas }} \sim \mathcal{N}\left(C, \sigma^{2}\right)\] since if we add a constant to a zero-mean normal random variable we simply shift the mean. Note we now have a classical statistical estimation problem: determine the mean \(C\) and standard deviation \(\sigma\) of a normal density. (Note we had largely ignored noise in Unit I, though in fact in interpolation and differentiation noise is often present and even dominant; in such cases we prefer to "fit," as described in more detail in Unit III.)
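
    To fix ideas, the following sketch (not from the original text; the device constant \(C\), the noise level \(\sigma\), and the sample size are purely illustrative and are not the actual IR Range Finder values of Unit I) generates synthetic measurements according to the assumed model \((DV)^{\text{meas}} = C + W\), \(W \sim \mathcal{N}(0, \sigma^{2})\); the estimation procedure below would then be applied to such a sample.

```python
# A minimal sketch, assuming hypothetical values of C and sigma: synthetic
# distance-voltage products generated from the model (DV)^meas = C + W,
# with W ~ N(0, sigma^2).
import numpy as np

rng = np.random.default_rng(2)
C_true = 1200.0       # hypothetical device constant (distance times voltage)
sigma_true = 25.0     # hypothetical measurement-noise standard deviation
n = 50                # number of distance-voltage measurements

dv_meas = C_true + sigma_true * rng.standard_normal(n)   # (DV)_i^meas, 1 <= i <= n
print(dv_meas[:5])    # first few synthetic measurements
```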

    In terms of the statistical process, our population is all possible outputs of our IR Range Finder device, our sample will be a finite number of distance-voltage measurements, \((D V)_{i}^{\text {meas }}, 1 \leq i \leq n\), our estimation procedure is presented below, and finally our inference will be future predictions of distance from voltage readings - through our simple relation \(D=C / V\). Of course, it will also be important to somehow justify or at least inspect our assumption that the noise is Gaussian.

    We now present the standard and very simple estimation procedure for the normal density. We present the method in terms of a particular realization: the connection to probability (and random variables) is through the frequentist interpretation. We presume that \(W\) is a normal random variable with mean \(\mu\) and standard deviation \(\sigma\).

    We first draw a sample of size \(n, w_{j}, 1 \leq j \leq n\), from \(f_{W}(w)=f^{\text {normal }}\left(w ; \mu, \sigma^{2}\right)\). We then calculate the sample mean as \[\bar{w}_{n}=\frac{1}{n} \sum_{j=1}^{n} w_{j}\] and the sample standard deviation as \[s_{n}=\sqrt{\frac{1}{n-1} \sum_{j=1}^{n}\left(w_{j}-\bar{w}_{n}\right)^{2}} .\] (Of course, the \(w_{j}, 1 \leq j \leq n\), are realizations of random variables \(W_{j}, 1 \leq j \leq n, \bar{w}_{n}\) is a realization of a random variable \(\bar{W}_{n}\), and \(s_{n}\) is a realization of a random variable \(S_{n}\).) Not surprisingly, \(\bar{w}_{n}\), which is simply the average of the data, is an estimate for the mean, \(\mu\), and \(s_{n}\), which is simply the standard deviation of the data, is an estimate for the standard deviation, \(\sigma\). (The \(n-1\) rather than \(n\) in the denominator of \(s_{n}\) is related to a particular choice of estimator and estimator properties; in any event, for \(n\) large, the difference is quite small.)
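
    The two estimates above translate directly into code. The sketch below (not from the original text; the values of \(\mu\), \(\sigma\), and \(n\) are illustrative assumptions) draws a synthetic sample and computes the sample mean and the sample standard deviation with the \(n-1\) denominator.

```python
# A minimal sketch, assuming mu = 10, sigma = 2, n = 40: compute the sample
# mean (estimate of mu) and sample standard deviation (estimate of sigma),
# using the n - 1 denominator for the latter.
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n = 10.0, 2.0, 40
w = rng.normal(mu, sigma, size=n)                         # sample w_j, 1 <= j <= n

w_bar = w.sum() / n                                       # sample mean
s_n = np.sqrt(((w - w_bar) ** 2).sum() / (n - 1))         # sample standard deviation

# NumPy's built-ins agree: ddof=1 gives the n - 1 denominator.
assert np.isclose(w_bar, w.mean()) and np.isclose(s_n, w.std(ddof=1))
print(w_bar, s_n)
```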

    Finally, we calculate the confidence interval for the mean \[[\mathrm{ci}]_{\mu ; n}=\left[\bar{w}_{n}-t_{\gamma, n-1} \frac{s_{n}}{\sqrt{n}},\ \bar{w}_{n}+t_{\gamma, n-1} \frac{s_{n}}{\sqrt{n}}\right],\] where \(\gamma\) is the confidence level and \(t_{\gamma, n-1}\) is related to the Student-\(t\) distribution. \({ }^{1}\) For the particular case of \(\gamma=0.95\) you can find values for \(t_{\gamma=0.95, n}\) for various \(n\) (sample sizes) in a table in Unit III. Note that for large \(n\), \(t_{\gamma, n-1}\) approaches \(z_{\gamma}\) discussed earlier in the context of (normal-approximation) binomial confidence intervals.
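
    A corresponding sketch for the confidence interval (again not from the original text; SciPy and the numerical values are assumptions) obtains the multiplier \(t_{\gamma, n-1}\) as the \((\gamma+1)/2\) quantile of the Student-\(t\) distribution, as in the footnote, and assembles the interval.

```python
# A minimal sketch, assuming a synthetic normal sample: form the Student-t
# confidence interval [w_bar - t*s_n/sqrt(n), w_bar + t*s_n/sqrt(n)] for the mean.
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(4)
mu, sigma, n, gamma = 10.0, 2.0, 40, 0.95
w = rng.normal(mu, sigma, size=n)

w_bar = w.mean()                                  # sample mean
s_n = w.std(ddof=1)                               # sample standard deviation
t_mult = t.ppf((gamma + 1.0) / 2.0, df=n - 1)     # t_{gamma, n-1}
half = t_mult * s_n / np.sqrt(n)                  # half length of the interval

print(f"[ci]_mu;n = [{w_bar - half:.3f}, {w_bar + half:.3f}]   (true mu = {mu})")
```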

    We recall the meaning of this confidence interval. If we perform \(n_{\exp}\) realizations (with \(n_{\exp} \rightarrow \infty\)) - in which each realization corresponds to a (different) sample \(w_{1}, \ldots, w_{n}\), and hence a different sample mean \(\bar{w}_{n}\), a different sample standard deviation \(s_{n}\), and a different confidence interval \([\mathrm{ci}]_{\mu ; n}\) - then in a fraction \(\gamma\) of these realizations the true mean \(\mu\) will reside within the confidence interval. (Of course this statement is only completely rigorous if the underlying density is precisely the normal density.)
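
    The frequentist statement above can be checked numerically. The sketch below (not from the original text; all numerical values are illustrative assumptions) repeats the sampling-and-interval construction many times and reports the fraction of realizations in which the true mean falls inside the interval; this fraction should be close to \(\gamma\).

```python
# A minimal sketch, assuming a known true mean: empirical coverage of the
# Student-t confidence interval over many independent realizations.
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(5)
mu, sigma, n, gamma = 10.0, 2.0, 40, 0.95
n_exp = 20_000                                    # number of realizations

t_mult = t.ppf((gamma + 1.0) / 2.0, df=n - 1)
covered = 0
for _ in range(n_exp):
    w = rng.normal(mu, sigma, size=n)
    half = t_mult * w.std(ddof=1) / np.sqrt(n)
    covered += abs(mu - w.mean()) <= half         # does the interval contain mu?

print(f"empirical coverage = {covered / n_exp:.3f}   (target gamma = {gamma})")
```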

    We can also translate our confidence interval into an "error bound" (with confidence level \(\gamma\)). In particular, unfolding our confidence interval yields \[\left|\mu-\bar{w}_{n}\right| \leq t_{\gamma, n-1} \frac{s_{n}}{\sqrt{n}} \equiv \text{Half Length}_{\mu ; n} .\] We observe the "same" square root of \(n\), sample size, that we observed in our Bernoulli estimation procedure, and in fact for the same reasons. Intuitively, say in our female height example, as we increase our sample size there are many more ways to obtain a sample mean close to \(\mu\) (with much cancellation about the mean) than to obtain a sample mean say \(\sigma\) above \(\mu\) (e.g., with all heights well above the mean). As you might expect, as \(\gamma\) increases, \(t_{\gamma, n-1}\) also increases: if we insist upon greater certainty in our claims, then we will lose some accuracy as reflected in the Half Length of the confidence interval.
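
    The \(1/\sqrt{n}\) behavior of the error bound can also be observed directly. In the sketch below (not from the original text; the values of \(\mu\), \(\sigma\), and the sample sizes are assumptions), quadrupling the sample size roughly halves the Half Length.

```python
# A minimal sketch, assuming mu = 10 and sigma = 2: the half length of the
# confidence interval shrinks roughly like 1/sqrt(n) as the sample size grows.
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(6)
mu, sigma, gamma = 10.0, 2.0, 0.95

for n in (25, 100, 400, 1600):
    w = rng.normal(mu, sigma, size=n)
    t_mult = t.ppf((gamma + 1.0) / 2.0, df=n - 1)
    half = t_mult * w.std(ddof=1) / np.sqrt(n)
    print(f"n = {n:5d}   Half Length = {half:.4f}")
```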

    \(1\) The multiplier \(t_{\gamma, n-1}\) satisfies \(F^{\text {student-t }}\left(t_{\gamma, n-1} ; n-1\right)=(\gamma+1) / 2\), where \(F^{\text {student-t }}(\cdot ; n-1)\) is the cdf of the Student-\(t\) distribution with \(n-1\) degrees of freedom; i.e., \(t_{\gamma, n-1}\) is the \((\gamma+1) / 2\) quantile of the Student-\(t\) distribution.

