Statistical Engineering

Monte Carlo Oversights

Summary:

Most Engineering Monte Carlo simulations ignore the distinction between parameter values, and estimates of parameter values.
This can result in a gross underestimation of the probability of “low-probability” events, i.e., the true probability is much higher than it is estimated to be.
Unfortunately such simulations are then used to “validate” other statistical missteps.

Background:

Underlying nearly all Monte Carlo simulations is a simple (and correct) idea: Samples from a known probability density can be arithmetically produced in large numbers to observe the behavior of infrequent events. Of course most engineering simulations are significantly more involved than multiple samples from a single known distribution. These simulations sample from many different known densities and compute some characteristic of interest (e.g. failure). Then the number of these is tallied and their frequency estimated as their number divided by the total number of sample created, \(p = N_{fail}/N_{samples}\). There is nothing wrong with this thinking.

But in all real situations the parameters of the probability densities are NOT known; however statistical parameter estimates are usually available. For example, if the density is assumed to be normal, its mean, \(\mu\), can be estimated by \(\bar X=X/n\) But \(\mu\) does NOT equal \(\bar X\), although the Central Limit Theorem insists that it is probably close.

Here is where most of my engineering colleagues would say “close enough for engineering work” and conclude, erroneously, that \(\mu\) equals \(\bar X\). These same colleagues would never be tempted to say that some small probability, say, 1/1000, is equal to zero because it is “close enough for engineering work.” Why is it close enough in the one situation but not in the other?

The answer is that in the second case it is obvious that such an assumption would lead to a gross underestimation of potential risk, while in the first case the underestimation is less obvious, although just as real.

Two Simulations:

To illustrate this numerically we suggest two simulations. These are not presented here, nor are their results. For the distinctions to be meaningful, dear reader, requires that you create and execute both simulations (they’re E-Z!) and draw conclusions for yourself.

First, we assume that you have a means of creating a Monte Carlo simulator; if you didn’t it is unlikely you would have been sufficiently interested to read this far. So the steps to be taken in the two simulations will be outlined. The actual implementation will depend on individual choice.

Simulation 1:

The is the simplest form of MC simulation, and is an exemplar for most engineering MC simulations.

Define the universe. In this simplest of simulations we will sample from a standard normal density, with known parameters, \(\mu=0\) and \(\sigma=1\).
Draw \(n = 10,000\) samples from this universe.
Estimate the empirical probability that \(x \lt -3\) by counting the values of \(x\) that are less than minus 3, and dividing by 10,000.

Of course we already know the theoretical answer because we know we are sampling from a standard normal density, \(p(x \lt 3) = 0.001349898\). Your simulation will likely have observed about 13 values of x less than -3. It won’t be exactly 13 because this is a random sample. But if you were to create a much larger sample your empirical probability would approach the true value. You already know all this, since this is the way you already perform MC simulations. The reason for going to all the trouble is to provide a comparison for Simulation 2.

Simulation 2:

This simulation considers the more realistic situation where the true values for \(\mu\) and \(\sigma\) are NOT known. WAIT! Of course you know them; you are building the simulator! But you can override your simulated omniscience and create a more realistic situation by using one simulation nested within another.

The outer simulation loop mimics your having conducted a laboratory experiment to determine the mean and variance of the probability density to be used in the inner simulation. The inner simulation is just like Simulator 1, except that it uses parameter estimates provided by the outer simulator. In this way you are simulating having to rely on parameter estimates since the true values are unknown and must be estimated by laboratory or other experiments.

Outer Loop:

Again, define the universe, this time providing estimates of the unknown parameters, \(\mu\) and \(\sigma\). We will use \(\bar X\) and s as estimators of \(\mu\) and \(\sigma\), respectively.

To do this draw a sample of size, say \(n=10\), from the known standard normal, and compute \(\bar X\) and \(s\) using the standard estimates (see Notes). This represents your having to use estimates for \(\mu\) and \(\sigma\) just as you must do in your real simulations.

Inner Loop:

Now conduct Simulation 1, but using the parameter estimates, rather than their true values, as in Simulation 1. That is, draw \(n=10,000\) samples from this universe, i.e. \(N \bar X, s)\) rather than \(N(0,1)\)
Keep a running total of the number of instances that \(x \lt 3\).

Outer Loop (continued):

1. Repeat the Outer Loop (and of course its nested Inner Loop) some large number of times (say 1,000), using new estimates, \(\bar X\) and \(s\), for \(\mu\) and \(\sigma\) for each generation of 10,000 Simulation1 samples.
2. Now estimate the empirical probability that x < -3, by counting the values of x that are less than -3, and dividing by the total number of samples, in our example,

\(\color{red} {10,000 \times 100,000 = 10^7 \textit{ i.e.} \space p=N_{(x \lt -3)}/10^7}\)

Compare this result with p = 0.001349898. Startling, isn’t it? You observed more than 5 times as many “failures” as you expected.

Try running the inner loop with 100,000 samples, rather than the current 10,000. What happens to the empirical probability that \(x \lt 3\)? Right! Nothing! Running the inner loop longer and longer has little influence because the real cause of the variability is the uncertainty in the sampling population’s parameters, the very entities you usually assume that you know with certainty, when you really had to have estimated them from outside data.

Do the simulations again, but this time record the number of times that \(x \lt -4\). This time you observe nearly 50 times as many “failures” as you expected. If your criterion were \(p = 1 \times 10^6\), Simulation 1 will show about 1 per million \(x \lt -4.753424\), while Simulation 2 will be more than 500 times that. That ISN’T close enough for engineering work.

A Feud of Historic Dimensions:

Confusing a parameter value with its estimate has an interesting history, and was the cause of a monumental quarrel between two statistical giants, Karl Pearson and R. A. Fisher.

Karl Pearson (1857-1936) was the leading statistician of his day. He founded the prestigious statistical journal Biometrika (still a major publication), the Pearson correlation coefficient is named after him, he developed a system of probability density curves also named after him, and he invented the Chi-square test of statistical significance.

R. A. Fisher (1890-1962) wrote the first book on Design of Experiments, revolutionized statistics with the concept of likelihood and estimating parameters by maximizing their likelihood, and told Pearson that he misunderstood his own Chi-square test, and was therefore calculating probability of failure incorrectly by using the sample means as though they were the population means.

The resulting acrimonious and vitriolic row lasted years and was finally resolved in 1926 in Fisher’s favor by, ironically, Egon Pearson, Karl Pearson’s son, who published the results of 12,000 2×2 tables observed under random sampling. If Karl Pearson had been correct the observed average value for Chi-square would have been 3. Fisher said it would be 1, as it was. (ref.: Joan Fisher Box, R.A. Fisher – the Life of a Scientist, Wiley, 1978) As a result, the elder Pearson’s erroneous calculations would erroneously accept as having a 5% probability of occurrence, something with a true probability of 0.5% – an error of 10x, effectively increasing his Type II error rate (failing to reject a false null hypothesis) by 10x.

Summary:

There are several lessons here:

You are in good company – even great minds can miss the distinction between a parameter value and its estimate.
Problems caused by confusing sample statistics with population parameters are not restricted to the Normal density.
Consequences of this confusion will result in anticonservative underestimations of “low probability” events, i.e the real probability will by higher than you think it is. (It will also result in anticonservative overestimations of “high probability” events because the unaccounted for uncertainty forces the true probabilities toward \(p=0.5\).

Let’s Review:

\[s \ne \sigma \text{ and } \bar X \ne \mu\]

In English: s is not sigma and Xbar is not mu. The sample statistic does not equal the population parameter (but it will be close to it).

… and if your simulator is not accounting for these facts, then your probability estimates are WRONG.

All is not hopeless, of course, and statisticians have been studying the phenomenon uncovered by the Pearson – Fisher feud for nearly a century. Not surprisingly there is no “silver bullet” quick-fix, one-size-fits-all solution, but we do have many decades of statistical theory and practice in successfully dealing with it.

Notes

\[\bar X = \frac1n \sum_{i=1}^{n} X_i\]

\[s=\sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X – \bar X)^2}\]