
4.7: Comparing Scenarios


    This section presents a strategy for determining whether simulation results provide evidence that one scenario is better than another. Often one scenario represents current system operations for an existing system or a baseline design for a proposed system. Improvements to the current operations or to the baseline design are proposed, and simulation results are used to see whether these improvements are significant. In addition, it may be necessary to compare one proposed improvement to another. This is an important part of step 3, Identify Root Causes and Assess Initial Scenarios, as well as step 4, Review and Extend Previous Work, of the simulation project process.

    Often, pairwise comparisons are made, and these are the scope of our discussion. Law (2007) provides a summary of methods for ranking and selecting the best from among all of the scenarios considered.

    Comparing scenario A to scenario B is an effort to find evidence that scenario A is better than scenario B. This evidence is found first by examining observations of performance measures to see whether any operationally significant or unexpected differences can be seen. If such differences are seen, an appropriate statistical analysis is done to confirm them, that is, to determine that the differences are not due to random variation in the simulation experiment.

    Many times a scenario is better with respect to one performance measure and the same or worse with respect to others. Evaluating such tradeoffs between scenarios is a part of the art of simulation.

    Each of the ways of comparing scenarios will be discussed in the context of the simulation experiment concerning the two stations in a series model. This experiment is presented in Table 4-2. The primary performance measure of interest will be entity lead time.

    4.7.1 Comparison by Examination

    Some ways of comparing two scenarios by examination of performance measure observations follow.

    1. For example, the graph of the number in the buffer of workstation A for the scenario with the current machine at workstation A is shown in Figure 4-3. This could be compared to the graph of the same quantity for the scenario where the new machine is used at workstation A. If the latter graph consistently showed fewer entities in the buffer, then there would be evidence that using the new machine at workstation A is an improvement: less WIP.

      Graphing lead time observations is not usually done since lead time is not a state variable and does not have a value at every moment in simulation time.

    2. For example, histograms of lead time can be compared. If the histogram for the new machine at workstation A scenario clearly shows a greater percentage of entities requiring less time on the line than the histogram for the current machine scenario does, then there would be evidence that using the new machine at workstation A lowers cycle time.
    3. For example, the average lead time over all replicates of the experiment is 62.7 seconds for the current machine scenario and 58.5 seconds for the new machine scenario. Thus, the new machine reduces cycle time by 4.2 seconds, or about 7%, which is operationally significant.
    4. For example, the range of cycle time averages over the replicates of the experiment is (52.5, 71.7) for the current machine scenario and (48.8, 68.9) for the new machine scenario. The ranges overlap and thus provide no evidence that the new machine reduces cycle time versus the existing machine at workstation A. (A small code sketch of this check follows the list.)
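
    The range check in item 4 can be expressed in a few lines. The following is a minimal sketch in Python; the helper ranges_overlap is hypothetical, and the two ranges are those quoted in item 4:

        def ranges_overlap(range_a, range_b):
            """True when two (low, high) ranges overlap, i.e., no evidence of a difference."""
            return range_a[0] <= range_b[1] and range_b[0] <= range_a[1]

        # Ranges of cycle time averages over the replicates, from item 4 above.
        print(ranges_overlap((52.5, 71.7), (48.8, 68.9)))  # True: no evidence of improvement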

    4.7.2 Comparison by Statistical Analysis

    This section discusses the use of confidence intervals to confirm that perceived differences in the simulation results for two scenarios are not just due to random variation in the experiment.

    Note that the experiment design assures that the scenarios share common random number streams. Thus, the scenarios are not statistically independent. Furthermore, the same number of replicates is made for each scenario. Thus, an approach that compares the simulation results on a replicate-by-replicate basis is required. This approach is called the paired-t method. 1

    Table 4-7 provides the organization to support the paired-t method. Each row corresponds to a replicate. The difference between the performance measure values for each replicate is shown in the fourth column. These differences are independent observations. A \(\ 1-\alpha\) confidence interval for the population mean of the difference in the fourth column is computed. If this confidence interval does not contain zero, it will be concluded that there is a statistically significant difference between the scenarios with confidence \(\ 1-\alpha\). This confidence interval is constructed and interpreted using the same reasoning as was given in section 4.6.2.
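
    To make the computation concrete, here is a minimal sketch of the paired-t confidence interval, assuming Python with SciPy available; the function paired_t_ci is a hypothetical helper, and each argument is a list holding one performance measure average per replicate:

        from statistics import mean, stdev
        from scipy.stats import t

        def paired_t_ci(scenario_a, scenario_b, alpha=0.01):
            """1 - alpha confidence interval for the mean difference (A - B)."""
            diffs = [a - b for a, b in zip(scenario_a, scenario_b)]
            n = len(diffs)
            d_bar = mean(diffs)    # average of the replicate differences
            s = stdev(diffs)       # sample standard deviation of the differences
            half_width = t.ppf(1 - alpha / 2, n - 1) * s / n ** 0.5
            return d_bar - half_width, d_bar + half_width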

    To illustrate, Table 4-8 compares, based on entity lead time, the use of the new machine at workstation A versus the current machine using the paired-t method. The 99% confidence interval for the mean difference is (3.7, 4.7). Thus, with 99% confidence, the new machine at workstation A reduces mean cycle time by between 3.7 and 4.7 seconds.

    It is also helpful to examine the data in Table 4-8 on a replicate-by-replicate basis. Notice that in every replicate, cycle time was less using the new machine at workstation A. Note, however, that it is still quite possible that in any particular 40-hour period, the two stations in a series line would perform better with respect to cycle time using the current machine at workstation A instead of the new machine. The simulation results show that on average over many 40-hour periods, the line will perform better with respect to cycle time using the new machine at workstation A.

    Table 4-7: Format of the Paired-t Method
    Replicate Scenario A Scenario B Difference (Scenario A – Scenario B)
    1
    2
    3
    4
    .
    .
    .
    n
    Average
    Std. Dev.
    \(\ 1-\alpha\) C.I. Lower Bound
    \(\ 1-\alpha\) C.I. Upper Bound

    1 Law (2007) provides a more in-depth discussion of the comparison of alternatives using confidence intervals, including the generation of confidence intervals when common random numbers are not used.


    Table 4-8: Comparison of Scenarios Using the Paired-t Method, Entity Lead Time in Seconds \(\ (1-\alpha=99 \%)\)
    Replicate Current Machine New Machine Difference (Current – New)
    1 61.1 57.3 3.8
    2 66.0 62.2 3.9
    3 60.6 57.6 3.0
    4 52.5 48.8 3.7
    5 58.3 55.0 3.3
    6 63.4 59.3 4.0
    7 59.7 55.0 4.8
    8 63.9 59.2 4.7
    9 62.7 58.5 4.2
    10 61.1 56.7 4.4
    11 60.7 56.6 4.1
    12 65.2 59.8 5.4
    13 64.7 58.3 6.4
    14 63.6 59.5 4.1
    15 67.3 63.5 3.8
    16 61.7 57.2 4.5
    17 71.7 68.9 2.8
    18 63.3 59.0 4.3
    19 62.3 58.1 4.2
    20 64.6 59.9 4.7
    Average 62.7 58.5 4.2
    Std. Dev. 3.82 3.8 0.8
    99% C. I. Lower Bound 60.9 56.7 3.7
    99% C.I. Upper Bound 64.5 60.3 4.7
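
    Feeding the replicate averages of Table 4-8 into the paired_t_ci sketch above reproduces the reported interval as well as the replicate-by-replicate observation (again a sketch, assuming Python with SciPy):

        # Replicate averages of entity lead time from Table 4-8 (seconds).
        current = [61.1, 66.0, 60.6, 52.5, 58.3, 63.4, 59.7, 63.9, 62.7, 61.1,
                   60.7, 65.2, 64.7, 63.6, 67.3, 61.7, 71.7, 63.3, 62.3, 64.6]
        new = [57.3, 62.2, 57.6, 48.8, 55.0, 59.3, 55.0, 59.2, 58.5, 56.7,
               56.6, 59.8, 58.3, 59.5, 63.5, 57.2, 68.9, 59.0, 58.1, 59.9]

        low, high = paired_t_ci(current, new, alpha=0.01)
        print(round(low, 1), round(high, 1))             # 3.7 4.7, matching Table 4-8
        print(sum(c > n for c, n in zip(current, new)))  # 20: new machine is better in every replicate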

    4.7.2.1 A Word of Caution about Comparing Scenarios

    In comparing scenarios, many confidence intervals may be constructed. For each pair of scenarios, several performance measures may be compared. Many scenarios may be tested as well.

    The question arises as to the resulting \(\ \alpha\) level for all confidence intervals together, \(\ \alpha_{overall}\). This \(\ \alpha_{overall}\) is the probability that at least one confidence interval fails to cover the actual difference in value between the scenarios of the system parameter or characteristic it estimates; equivalently, \(\ 1-\alpha_{overall}\) is the probability that all confidence intervals simultaneously cover the actual differences.

    The Bonferroni inequality provides a lower bound on this joint coverage probability, and thus an upper bound on \(\ \alpha_{overall}\), when a total of k confidence intervals are constructed:

    \begin{align}P(\text {all confidence intervals cover the actual value}) \geq 1-\sum_{j=1}^{k} \alpha_{j}\tag{4-5}\end{align}

    and thus:

    \begin{align}\alpha_{\text {overall }} \leq \sum_{j=1}^{k} \alpha_{j}\tag{4-6}\end{align}

    Suppose we compare two scenarios using two performance measures with \(\ \alpha\) = 0.05, computing a confidence interval of the difference between the scenarios for each performance measure. By equation 4-5, the probability that both confidence intervals cover the actual differences in the performance measures is at least 1 − (0.05 + 0.05) = 0.90; equivalently, by equation 4-6, \(\ \alpha_{overall} \leq 0.05+0.05=0.10\).

    Consider comparing two scenarios with respect to 10 performance measures, with each confidence interval computed using \(\ \alpha\) = 0.05. Then the probability that all confidence intervals cover the actual differences in the performance measures might be as low as 1 − 10(0.05) = 0.50; that is, the \(\ \alpha\) error associated with all our work could be as high as 0.5. Thus, when making many comparisons, a small value of \(\ \alpha_{j}\) for each confidence interval is necessary. For example, with all \(\ \alpha_{j}\) = 0.01, the overall \(\ \alpha\) error associated with ten comparisons is at most 0.1, which is acceptably low.
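
    A small numeric sketch of these bounds, under the same Python assumption as the earlier sketches (both helper names are hypothetical):

        def bonferroni_alpha_overall(alphas):
            """Upper bound on alpha_overall per equation 4-6."""
            return sum(alphas)

        def per_interval_alpha(alpha_overall_target, k):
            """Individual alpha so that k intervals jointly stay within the target overall level."""
            return alpha_overall_target / k

        print(bonferroni_alpha_overall([0.05] * 10))  # ~0.5: too large to be useful
        print(per_interval_alpha(0.10, 10))           # ~0.01 per interval, as in the example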

    Unfortunately, if a large number of performance measures are used or many scenarios are compared, \(\ \alpha_{\text {overall }}\) will be large even when each \(\ \alpha_{j}\) is small. Thus, it is likely that at least one confidence interval will fail to cover the true difference between the performance measure values it estimates, so that an actual difference between two scenarios may go undetected or a spurious one may be reported.

