
14.1: Calculating a Failure Probability


    Objective

    Let’s say there is a set of "environmental" or "load" variables \(\left(x_{1}, x_{2}, \ldots\right)\) that affect the performance of an engineering system. For simplicity, let us restrict ourselves to two such parameters, so that we only have \(\left(x_{1}, x_{2}\right)\). We also assume that there are two "performance" metrics, \(g_{1}\left(x_{1}, x_{2}\right)\) and \(g_{2}\left(x_{1}, x_{2}\right)\). Without loss of generality, let’s assume that smaller \(g_{1}\) and \(g_{2}\) means better performance (we can always consider the negative of a performance variable if larger values imply better performance). In fact, we assume that we wish to confirm that the performance metrics are below certain thresholds, i.e. \[g_{1}\left(x_{1}, x_{2}\right) \leq \tau_{1} \quad \text { and } \quad g_{2}\left(x_{1}, x_{2}\right) \leq \tau_{2} .\] Equivalently, we wish to avoid failure, which is defined as \[g_{1}\left(x_{1}, x_{2}\right)>\tau_{1} \quad \text { or } \quad g_{2}\left(x_{1}, x_{2}\right)>\tau_{2} .\] Note that in this chapter failure is interpreted liberally as the condition (14.1), even if in a given situation this condition is not equivalent to actual failure.
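    As a minimal sketch of this setup, the failure condition (14.1) might be encoded as below. The performance functions \(g_1\), \(g_2\) and the thresholds \(\tau_1\), \(\tau_2\) used here are purely illustrative assumptions (they are not prescribed by the text); the same hypothetical choices are reused in the later sketches.

```python
# Hypothetical performance functions and thresholds (illustrative only):
# g1 = x1 + x2 with tau1 = 1.5, g2 = x1 * x2 with tau2 = 1.0.
def g1(x1, x2):
    return x1 + x2          # hypothetical performance metric 1

def g2(x1, x2):
    return x1 * x2          # hypothetical performance metric 2

tau1, tau2 = 1.5, 1.0       # hypothetical thresholds

def fails(x1, x2):
    """Failure condition (14.1): either metric exceeds its threshold."""
    return g1(x1, x2) > tau1 or g2(x1, x2) > tau2

print(fails(0.9, 0.8))      # True:  g1 = 1.7 > 1.5
print(fails(0.3, 0.4))      # False: g1 = 0.7, g2 = 0.12
```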

    Suppose that \(\left(x_{1}, x_{2}\right)\) reside in some rectangle \(R\). We now choose to interpret \(\left(x_{1}, x_{2}\right)\) as realizations of a random vector \(X=\left(X_{1}, X_{2}\right)\) with prescribed probability density function \(f_{X}\left(x_{1}, x_{2}\right)=\) \(f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right)\). We then wish to quantify the failure probability \(\theta_{F}\), defined by \[\theta_{F}=P\left(g_{1}\left(X_{1}, X_{2}\right)>\tau_{1} \text { or } g_{2}\left(X_{1}, X_{2}\right)>\tau_{2}\right) .\] We note that \(g_{1}\) and \(g_{2}\) are deterministic functions; however, because the arguments to the functions are random variables, the outputs \(g_{1}\left(X_{1}, X_{2}\right)\) and \(g_{2}\left(X_{1}, X_{2}\right)\) are random variables. Thus, failure is described probabilistically. If the bounds on the environmental variables \(\left(x_{1}, x_{2}\right)\) are known a priori, one could design a system to handle the worst possible case; however, a system designed to handle very rare events may be overdesigned. Thus, a probabilistic approach may be appropriate in many engineering scenarios.

    In any probabilistic simulation, we must make sure that the probability density of the random variable, \(f_{X}\), is meaningful and that the interpretation of the probabilistic statement is relevant. For example, in constructing the distribution, a good estimate may be obtained from statistical data (i.e. by sampling a population). The failure probability \(\theta_{F}\) can be interpreted as either

    (\(i\)) probability of failure for the next "random" set of environmental or operating conditions, or

    (\(ii\)) frequency of failure over a population (based on the frequentist perspective).

    Integral

    We now show that the computation of failure probability is similar to the computation of an area. Let us define \(R\) to be the region from which \(X=\left(X_{1}, X_{2}\right)\) is sampled (not necessarily uniformly). In other words, \(R\) encompasses all possible values that the environmental variable \(X\) can take. Let us also define \(D\) to be the region whose elements \(\left(x_{1}, x_{2}\right) \in D\) lead to failure, i.e. \[D \equiv\left\{\left(x_{1}, x_{2}\right): g_{1}\left(x_{1}, x_{2}\right)>\tau_{1} \quad \text { or } \quad g_{2}\left(x_{1}, x_{2}\right)>\tau_{2}\right\} .\] Then, the failure probability can be expressed as an integral \[\theta_{F}=\iint_{D} f_{X}\left(x_{1}, x_{2}\right) d x_{1} d x_{2} .\] This requires an integration over the region \(D\), which can be complicated depending on the failure criteria.
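    For a concrete, purely illustrative case (the same hypothetical setup as in the earlier sketch), suppose \(\left(X_{1}, X_{2}\right)\) is uniform on \(R=[0,1]^{2}\), \(g_{1}\left(x_{1}, x_{2}\right)=x_{1}+x_{2}\) with \(\tau_{1}=1.5\), and the second criterion is never active. Then \(D\) is the triangle \(\left\{x_{1}+x_{2}>1.5\right\} \cap[0,1]^{2}\) and \[\theta_{F}=\iint_{D} f_{X}\left(x_{1}, x_{2}\right) d x_{1} d x_{2}=\iint_{D} 1 \, d x_{1} d x_{2}=\frac{1}{2}\left(\frac{1}{2}\right)^{2}=0.125 ,\] i.e. for a uniform density on a unit-area \(R\) the failure probability is simply the area of the failure region. For more complicated \(g_{1}\) and \(g_{2}\), however, \(D\) has no such simple description, which motivates the reformulation below.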

    However, we can simplify the integral using the technique previously used to compute the area. Namely, we introduce a failure indicator or characteristic function, \[\mathbf{1}_{F}\left(x_{1}, x_{2}\right)=\left\{\begin{array}{llll} 1, & g_{1}\left(x_{1}, x_{2}\right)>\tau_{1} & \text { or } & g_{2}\left(x_{1}, x_{2}\right)>\tau_{2} \\ 0, & \text { otherwise } \end{array} .\right.\] Using the failure indicator, we can write the integral over \(D\) as an integral over the simpler domain \(R\), i.e. \[\theta_{F}=\iint_{R} \mathbf{1}_{F}\left(x_{1}, x_{2}\right) f_{X}\left(x_{1}, x_{2}\right) d x_{1} d x_{2} .\] Note that Monte Carlo methods can be used to evaluate integrals in any number of dimensions. The two main approaches are "hit or miss" and "sample mean," with the latter generally more efficient. Our case here is a natural example of the sample mean approach, though it also has the flavor of "hit or miss." In practice, variance reduction techniques are often applied to improve the convergence.
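    As a sanity check of the reformulated integral over \(R\) (not yet the Monte Carlo approach of the next subsection), one can approximate it with a simple deterministic midpoint rule. The sketch below again uses the hypothetical \(g_1\), \(g_2\), thresholds, and uniform density from the earlier sketches; the grid resolution is likewise an arbitrary choice.

```python
import numpy as np

# Illustrative setup (assumed, not from the text): (X1, X2) uniform on
# R = [0, 1]^2, g1 = x1 + x2 with tau1 = 1.5, g2 = x1 * x2 with tau2 = 1.0
# (the second criterion never triggers here, so theta_F = 0.125 analytically).
tau1, tau2 = 1.5, 1.0

def indicator_F(x1, x2):
    """Failure indicator 1_F: 1.0 where g1 > tau1 or g2 > tau2, else 0.0."""
    return ((x1 + x2 > tau1) | (x1 * x2 > tau2)).astype(float)

def f_X(x1, x2):
    """Joint density of (X1, X2): uniform on the unit square."""
    return np.ones_like(x1)

# Midpoint-rule approximation of the integral of 1_F * f_X over R: each of
# the m^2 grid cells has area (1/m)^2, so the sum of cell contributions
# equals the mean of the integrand values.
m = 1000
x = (np.arange(m) + 0.5) / m              # cell midpoints in [0, 1]
X1, X2 = np.meshgrid(x, x)
theta_F_quad = np.mean(indicator_F(X1, X2) * f_X(X1, X2))
print(theta_F_quad)                       # ~0.125 for this illustrative case
```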

    Monte Carlo Approach

    We can easily develop a Monte Carlo approach if we can reduce our problem to a Bernoulli random variable with parameter \(\theta_{F}\) such that \[B=\left\{\begin{array}{ll} 1, & \text { with probability } \theta_{F} \\ 0, & \text { with probability } 1-\theta_{F} \end{array} .\right.\] Then, the computation of the failure probability \(\theta_{F}\) becomes the estimation of parameter \(\theta_{F}\) through sampling (as in the coin flip example).

    Determination of \(B\) is easy assuming we can evaluate \(g_{1}\left(x_{1}, x_{2}\right)\) and \(g_{2}\left(x_{1}, x_{2}\right)\): we simply set \(B=\mathbf{1}_{F}\left(X_{1}, X_{2}\right)\), which takes the value 1 exactly when failure occurs. But, by definition, \[\begin{aligned} \theta_{F} &=P\left(g_{1}(X)>\tau_{1} \quad \text { or } \quad g_{2}(X)>\tau_{2}\right) \\ &=\iint_{R} \mathbf{1}_{F}\left(x_{1}, x_{2}\right) f_{X}\left(x_{1}, x_{2}\right) d x_{1} d x_{2} , \end{aligned}\] so \(P(B=1)=\theta_{F}\).

    Hence we have identified a Bernoulli random variable with the requisite parameter \(\theta_{F}\).

    The Monte Carlo procedure is simple. First, we draw \(n_{\max }\) samples of the random vector, \[\left(X_{1}, X_{2}\right)_{1},\left(X_{1}, X_{2}\right)_{2}, \ldots,\left(X_{1}, X_{2}\right)_{n}, \ldots,\left(X_{1}, X_{2}\right)_{n_{\max }},\] and map them to Bernoulli random variables \[\left(X_{1}, X_{2}\right)_{n} \rightarrow B_{n} \quad n=1, \ldots, n_{\max },\] according to (14.2). Using this mapping, we assign sample means, \(\left(\widehat{\Theta}_{F}\right)_{n}\), and confidence intervals, \(\left[\mathrm{CI}_{F}\right]_{n}\), according to \[\begin{aligned} &\left(\widehat{\Theta}_{F}\right)_{n}=\frac{1}{n} \sum_{i=1}^{n} B_{i}, \\ &{\left[\mathrm{CI}_{F}\right]_{n}=\left[\left(\widehat{\Theta}_{F}\right)_{n}-z_{\gamma} \sqrt{\frac{\left(\widehat{\Theta}_{F}\right)_{n}\left(1-\left(\widehat{\Theta}_{F}\right)_{n}\right)}{n}},\left(\widehat{\Theta}_{F}\right)_{n}+z_{\gamma} \sqrt{\frac{\left(\widehat{\Theta}_{F}\right)_{n}\left(1-\left(\widehat{\Theta}_{F}\right)_{n}\right)}{n}}\right] .} \end{aligned}\]
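    The following is a minimal sketch of this procedure, again reusing the hypothetical \(g_1\), \(g_2\), thresholds, and uniform density from the earlier sketches; the sample size, seed, and confidence level are likewise arbitrary choices for illustration.

```python
import numpy as np

# Illustrative setup (assumed, not from the text): (X1, X2) uniform on
# [0, 1]^2, failure when g1 = x1 + x2 > 1.5 or g2 = x1 * x2 > 1.0
# (theta_F ~ 0.125 analytically for this case).
tau1, tau2 = 1.5, 1.0
z_gamma = 1.96                             # normal quantile for gamma = 0.95

rng = np.random.default_rng(0)
n = 100_000                                # plays the role of n_max

# Draw (X1, X2)_1, ..., (X1, X2)_n from f_X and map each to B_n = 1_F(X1, X2).
x1 = rng.random(n)
x2 = rng.random(n)
B = ((x1 + x2 > tau1) | (x1 * x2 > tau2)).astype(float)

# Sample-mean estimate and normal-approximation confidence interval at this n.
theta_hat = B.mean()
half = z_gamma * np.sqrt(theta_hat * (1.0 - theta_hat) / n)
print(f"theta_hat = {theta_hat:.4f}, "
      f"95% CI = [{theta_hat - half:.4f}, {theta_hat + half:.4f}]")
```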

    Note that in failure analysis \(\theta_{F}\) is typically very small. We recall from Section 10.3.3 that it is precisely this case for which the relative error RelErr is quite large (and, furthermore, for which the normal-density confidence interval is valid only for quite large \(n\)). Hence, in practice, we must consider very large sample sizes in order to obtain relatively accurate results with reasonable confidence. More sophisticated approaches partially address these issues, but even these advanced approaches often rely on basic Monte Carlo ingredients.
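    As a rough, illustrative calculation: dividing the confidence-interval half-width above by the estimate itself gives a relative half-width \[z_{\gamma} \frac{1}{\left(\widehat{\Theta}_{F}\right)_{n}} \sqrt{\frac{\left(\widehat{\Theta}_{F}\right)_{n}\left(1-\left(\widehat{\Theta}_{F}\right)_{n}\right)}{n}}=z_{\gamma} \sqrt{\frac{1-\left(\widehat{\Theta}_{F}\right)_{n}}{n\left(\widehat{\Theta}_{F}\right)_{n}}} ,\] so for \(\theta_{F} \approx 10^{-3}\), \(\gamma=0.95\) (\(z_{\gamma} \approx 1.96\)), and a target relative error of \(0.1\), we need roughly \(n \approx z_{\gamma}^{2}\left(1-\theta_{F}\right) /\left(0.1^{2} \theta_{F}\right) \approx 3.8 \times 10^{5}\) samples: hundreds of thousands of evaluations of \(g_{1}\) and \(g_{2}\) just to resolve a one-in-a-thousand failure probability to within about ten percent.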

    Finally, we note that although the above description is for the cumulative approach, we can also directly apply equations (14.3) and (14.4) for any fixed \(n\). In this case we obtain \(\operatorname{Pr}\left(\theta_{F} \in\left[\mathrm{CI}_{F}\right]_{n}\right)=\gamma\).


    This page titled 14.1: Calculating a Failure Probability is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Masayuki Yano, James Douglass Penn, George Konidaris, & Anthony T Patera (MIT OpenCourseWare) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.