One of the most challenging things for a simulation analyst is explaining the uncertainty around the results to non-experts. It is not unusual for simulation results to be presented in language such as, “the expected production/shift is going to be 210,000 t +/- 20,000 t at 95% confidence.” Although this statement makes perfect sense to a statistician, some engineers and mine managers cannot fully understand its implications. Without statements like it, however, it is not possible to fully communicate the uncertainty surrounding the output (in this case, production/shift).
In this post, I will attempt to explain one of the most popular ways of reporting stochastic output: the 95% confidence interval. First, I will examine the assumption (based on the central limit theorem) that analysts make in order to describe the distribution of the output. Second, I will present the simple formula that is used to estimate the interval. Finally, I will try to explain, in simple terms, what such results mean.
The central limit theorem states that the sum or average of a large number of independent random variables is approximately normally distributed, whatever the distributions of the individual variables (provided they have finite variance). It is commonly used to justify the assumption of normality in experiments with sufficiently large samples. Given that the simulation output is the result of many large-sample (albeit pseudo-random) draws of the stochastic input variables, which are themselves sampled from theoretical distributions, the simulation output is often considered to be normally distributed.
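To make the central limit theorem a little more concrete, here is a minimal Python sketch. It averages samples drawn from a strongly skewed (exponential) distribution and checks how closely the averages follow the familiar normal rule of thumb (about 68% of values within one standard deviation of the mean, about 95% within two). The distribution, sample sizes and seed are assumptions chosen purely for illustration and are not taken from any mine model.

```python
import random
import statistics


def sample_means(n_samples, sample_size, seed=1):
    """Return the averages of n_samples samples from a skewed distribution."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


means = sample_means(n_samples=2000, sample_size=50)
centre = statistics.fmean(means)
spread = statistics.stdev(means)

# Share of averages within one and two standard deviations of their mean.
within_1sd = sum(abs(m - centre) <= spread for m in means) / len(means)
within_2sd = sum(abs(m - centre) <= 2 * spread for m in means) / len(means)
print(f"within 1 sd: {within_1sd:.2f}, within 2 sd: {within_2sd:.2f}")
```

Even though each individual draw is far from normal, the averages should land close to the 68% and 95% marks, which is the behaviour the normality assumption relies on.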
Usually a simulation experiment will include many replications (runs) of the same model under the same conditions. Each run will produce an output. The number of runs should be sufficient to draw valid conclusions about the output. If 100 replications are run, then 100 estimates of the output will be obtained. The expected value of the output is estimated by the mean of the 100 estimates. That is easy enough to communicate. However, the benefit of simulation is that you obtain 100 estimates and not a single estimate. Hence, the 100 estimates should be used to obtain statistics (mean, variance, standard deviation, etc.) of the output. By the central limit theorem, the mean of these estimates can be treated as approximately normally distributed.
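The replication idea can be sketched in a few lines of Python. The function `run_replication` below is a hypothetical stand-in for a real simulation model: it simply sums 100 random load tonnages so that there is something to summarise. The names and numbers are assumptions for illustration only.

```python
import random
import statistics


def run_replication(seed):
    """Hypothetical stand-in for one simulation run.

    A real model would simulate a whole shift; this placeholder sums
    100 random load tonnages drawn from an assumed uniform range.
    """
    rng = random.Random(seed)
    return sum(rng.uniform(150, 250) for _ in range(100))


# Same model, same conditions, a different random-number seed per run.
outputs = [run_replication(seed) for seed in range(100)]

mean = statistics.fmean(outputs)
variance = statistics.variance(outputs)  # sample variance (n - 1 divisor)
std_dev = statistics.stdev(outputs)      # square root of the sample variance
print(f"n={len(outputs)}  mean={mean:.0f} t  sd={std_dev:.0f} t")
```

The point of the sketch is that the experiment yields a whole list of outputs, from which the mean and the spread are both estimated.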
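Having assumed normality, the half-width $h$ of the confidence interval for the mean can be defined as $h = t_{\nu,\,1-\alpha/2}\, s/\sqrt{n}$, where $s$ is the sample standard deviation of the output, $n$ is the number of replications, $\nu = n - 1$ is the degrees of freedom, $\alpha$ is the significance level, and $t_{\nu,\,1-\alpha/2}$ is the corresponding critical value of Student's t-distribution. At significance level $\alpha$, the interval for the expected output is $[\text{mean} - h,\ \text{mean} + h]$. Recall the earlier statement, repeated below. In that case, $h$ = 20,000 t at 95% confidence ($\alpha = 0.05$), and the expected production/shift is estimated to lie between 190,000 and 230,000 t.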
“the expected production/shift is going to be 210,000 t +/- 20,000 t at 95% confidence”
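As a rough illustration of the half-width calculation, the Python sketch below uses the standard textbook form for a confidence interval on the mean, h = t × s/√n, with n − 1 degrees of freedom. The five production figures are hypothetical, and the critical value is passed in as a number read from a t-table, so that the example needs nothing beyond the standard library.

```python
import math
import statistics


def confidence_interval(outputs, t_crit):
    """Return (mean, half_width) for the mean of the replication outputs.

    t_crit is the Student-t critical value for n - 1 degrees of freedom
    at 1 - alpha/2, taken from a t-table or a statistics library.
    """
    n = len(outputs)
    mean = statistics.fmean(outputs)
    s = statistics.stdev(outputs)  # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / math.sqrt(n)
    return mean, half_width


# Hypothetical production/shift figures (tonnes) from five replications.
outputs = [205_000, 212_000, 198_000, 221_000, 214_000]
T_CRIT_4DF_95 = 2.776  # t-table value for 4 degrees of freedom, alpha = 0.05
mean, h = confidence_interval(outputs, T_CRIT_4DF_95)
print(f"{mean:,.0f} t +/- {h:,.0f} t at 95% confidence")
```

If SciPy is available, the critical value could instead be computed with `scipy.stats.t.ppf(1 - alpha / 2, n - 1)` rather than looked up in a table.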
What does all this mean to the mine manager or the decision maker? Still using our example above, it means that the long-run average production/shift is estimated to lie between 190,000 and 230,000 t, and that the method used to build this range captures the true average in about 19 out of 20 experiments. Put loosely, there is roughly a 1 in 40 chance that the true average is below 190,000 t and a similar 1 in 40 chance that it is above 230,000 t. Note that the interval describes the average, not individual shifts: production in any single shift varies more widely than this. It gets a bit more complicated when you are comparing the 100 replications of scenario 1 to the 100 replications of scenario 2. But generally speaking, this basic interpretation of simulation results helps the non-expert understand what uncertainty means in the system under study.
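One way to see what the 95% refers to is to repeat a whole experiment many times against a known answer and count how often the interval captures it. The Python sketch below does this with an assumed "true" mean and standard deviation and normally distributed outputs. All of the constants are hypothetical, and the interval is the standard t-based interval for a mean.

```python
import math
import random
import statistics

TRUE_MEAN = 210_000  # hypothetical true mean production/shift (t)
TRUE_SD = 15_000     # hypothetical shift-to-shift standard deviation (t)
N_REPS = 10          # replications per experiment
T_CRIT = 2.262       # t-table value for 9 degrees of freedom, alpha = 0.05


def interval_covers_truth(rng):
    """Run one experiment; report whether its 95% interval contains TRUE_MEAN."""
    outputs = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N_REPS)]
    mean = statistics.fmean(outputs)
    h = T_CRIT * statistics.stdev(outputs) / math.sqrt(N_REPS)
    return mean - h <= TRUE_MEAN <= mean + h


rng = random.Random(42)
n_experiments = 2000
hits = sum(interval_covers_truth(rng) for _ in range(n_experiments))
coverage = hits / n_experiments
print(f"intervals containing the true mean: {coverage:.1%}")
```

The share printed should sit close to 95%, which is the sense in which roughly 19 out of 20 such intervals can be expected to contain the true mean.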