Convergence in Distribution

Engineers are familiar with mathematical convergence: the terms of a sequence, or the partial sums of a series, approach some limit as the number of terms increases.


We are less familiar with the analogous statistical concept of “convergence in distribution,” in which the limit is not a single value. Instead, the distribution of the terms of a random sequence approaches some specific distribution. The best-known example is the central limit theorem: the standardized mean of \(n\) independent, identically distributed draws with finite variance converges in distribution to the standard normal \(N(0,1)\) as \(n \rightarrow \infty\). Further examples are illustrated here, with the dotted arrows indicating asymptotic relationships.
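To see the central limit theorem at work, here is a minimal sketch that assumes Uniform(0, 1) draws (mean 1/2, variance 1/12) as the underlying distribution; that choice, the grid from -3 to 3, and the function names are illustrative assumptions. It measures the largest gap between the empirical CDF of the standardized mean and the \(N(0,1)\) CDF as \(n\) grows.

```python
import bisect
import math
import random
from statistics import NormalDist

def standardized_mean(n, rng):
    """Mean of n Uniform(0, 1) draws, centered by 1/2 and scaled by sqrt(1/(12 n))."""
    xbar = sum(rng.random() for _ in range(n)) / n
    return (xbar - 0.5) / math.sqrt(1.0 / (12.0 * n))

def max_cdf_gap(n, trials=20_000, seed=1):
    """Largest gap, on a grid from -3 to 3, between the empirical CDF of the
    standardized mean and the N(0, 1) CDF."""
    rng = random.Random(seed)
    z = sorted(standardized_mean(n, rng) for _ in range(trials))
    phi = NormalDist().cdf
    grid = [k / 10 for k in range(-30, 31)]
    # bisect_right counts the sorted draws <= x, which gives the empirical CDF at x.
    return max(abs(bisect.bisect_right(z, x) / trials - phi(x)) for x in grid)

if __name__ == "__main__":
    for n in (1, 2, 5, 30):
        print(n, round(max_cdf_gap(n), 4))
```

With a single draw the "mean" is still uniform and visibly non-normal. By \(n = 30\) the remaining gap is mostly Monte Carlo noise, even though no individual draw has converged to anything.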

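“Convergence in probability” is not quite the same as convergence in distribution; it is the stronger of the two. Convergence in probability says that the random variables \(X_n\) in a sequence get arbitrarily close to a value I know (a constant \(c\)), or to another random variable \(X\), with probability approaching one: for every \(\epsilon > 0\), \(P(|X_n - X| > \epsilon) \rightarrow 0\) as \(n \rightarrow \infty\). So the difference \((X_n - X)\) itself shrinks toward zero, not merely its distribution.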
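The weak law of large numbers is the standard example: the sample mean converges in probability to the population mean. The sketch below assumes Uniform(0, 1) draws (population mean 1/2) and a tolerance \(\epsilon = 0.05\), both chosen only for illustration, and estimates \(P(|\bar{X}_n - 1/2| > \epsilon)\) by simulation.

```python
import random

def prob_far_from_mean(n, eps=0.05, trials=2_000, seed=2):
    """Monte Carlo estimate of P(|mean of n Uniform(0, 1) draws - 1/2| > eps)."""
    rng = random.Random(seed)
    far = 0
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - 0.5) > eps:
            far += 1
    return far / trials

if __name__ == "__main__":
    for n in (10, 50, 200, 500):
        print(n, prob_far_from_mean(n))
```

The estimated probability falls toward zero as \(n\) grows, which is exactly the statement \(P(|X_n - c| > \epsilon) \rightarrow 0\).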

Convergence in distribution, by contrast, says only that the variables behave the same way statistically, not that they take the same values. If \(X \sim N(0,1)\) and, independently, \(Y \sim N(0,1)\), then \(X\) and \(Y\) have identical distributions, yet the difference between a random draw from \(X\) and a random draw from \(Y\) is not zero: \(X - Y \sim N(0,2)\), so \(X - Y \ne 0\) with probability one.
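A quick simulation makes the distinction concrete. This sketch (sample size and seed are arbitrary assumptions) draws independent \(X\) and \(Y\) from \(N(0,1)\), shows that their empirical quartiles agree, and then shows that the draws themselves differ, with \(X - Y\) having variance near 2.

```python
import random
import statistics

def compare_independent_normals(n=100_000, seed=3):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Identical distributions: the empirical quartiles nearly agree.
    qx = statistics.quantiles(x, n=4)
    qy = statistics.quantiles(y, n=4)
    # ...but the draws do not: X - Y ~ N(0, 2).
    diff = [a - b for a, b in zip(x, y)]
    mean_diff = sum(diff) / n
    var_diff = sum((d - mean_diff) ** 2 for d in diff) / n
    frac_near_zero = sum(abs(d) < 0.01 for d in diff) / n
    return qx, qy, var_diff, frac_near_zero

if __name__ == "__main__":
    qx, qy, var_diff, frac = compare_independent_normals()
    print("quartiles of X:", [round(q, 3) for q in qx])
    print("quartiles of Y:", [round(q, 3) for q in qy])
    print("variance of X - Y:", round(var_diff, 3))
    print("fraction with |X - Y| < 0.01:", frac)
```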

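Still other examples of convergence in distribution are the extreme value distributions: if the maximum of \(n\) independent, identically distributed random variables, suitably shifted and scaled, converges in distribution to a non-degenerate limit at all, that limit must be a Gumbel, Fréchet, or (reversed) Weibull distribution.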
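One case can be worked exactly. For \(n\) independent Exp(1) variables, the maximum \(M_n\) has CDF \((1 - e^{-t})^n\), so \(P(M_n - \ln n \le x) = (1 - e^{-x}/n)^n \rightarrow \exp(-e^{-x})\), the standard Gumbel CDF. The sketch below (the Exp(1) choice and the grid are illustrative assumptions) evaluates that exact finite-\(n\) CDF and its largest gap from the Gumbel limit.

```python
import math

def gumbel_cdf(x):
    """Standard Gumbel CDF, exp(-exp(-x))."""
    return math.exp(-math.exp(-x))

def shifted_max_cdf(x, n):
    """Exact P(max of n iid Exp(1) draws - ln n <= x) = (1 - e^{-x}/n)^n."""
    t = x + math.log(n)
    if t <= 0.0:
        return 0.0  # the maximum of Exp(1) draws is never negative
    return (1.0 - math.exp(-t)) ** n

def max_gap(n):
    """Largest CDF gap from the Gumbel limit on a grid from -3 to 6."""
    grid = [k / 10 for k in range(-30, 61)]
    return max(abs(shifted_max_cdf(x, n) - gumbel_cdf(x)) for x in grid)

if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        print(n, max_gap(n))
```

The gap shrinks roughly in proportion to \(1/n\), so the shifted maximum is well described by the Gumbel distribution long before \(n\) is large.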

So what? In practical applications, simple direct-sampling\(^1\) Monte Carlo simulation may not be up to the task of producing draws from the target joint density, even when the joint density is correctly specified. (Sadly, many engineering MC simulations rely on an inadequate correlation coefficient or, worse, ignore dependencies among variables altogether.)
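The cost of ignoring dependence is easy to demonstrate. This sketch assumes, purely for illustration, a target bivariate normal with standard normal marginals and correlation 0.8 (`RHO` is a hypothetical value). Sampling each marginal separately by inverting its CDF reproduces both marginals but loses the correlation entirely; building the dependence in, which is possible here only because the Gaussian structure is known in closed form, recovers it.

```python
import math
import random
from statistics import NormalDist, fmean

RHO = 0.8  # hypothetical target correlation, chosen for illustration
STD_NORMAL = NormalDist()

def open_unit(rng):
    """Uniform draw strictly inside (0, 1); inv_cdf rejects 0 and 1."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

def pearson(a, b):
    """Sample Pearson correlation coefficient."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def marginals_only(n, rng):
    """Inverse-transform each N(0, 1) marginal separately: dependence is lost."""
    xs = [STD_NORMAL.inv_cdf(open_unit(rng)) for _ in range(n)]
    ys = [STD_NORMAL.inv_cdf(open_unit(rng)) for _ in range(n)]
    return xs, ys

def with_dependence(n, rng, rho=RHO):
    """Same N(0, 1) marginals, but Y = rho*X + sqrt(1 - rho^2)*Z builds in the correlation."""
    xs, ys = [], []
    for _ in range(n):
        x = STD_NORMAL.inv_cdf(open_unit(rng))
        z = STD_NORMAL.inv_cdf(open_unit(rng))
        xs.append(x)
        ys.append(rho * x + math.sqrt(1.0 - rho * rho) * z)
    return xs, ys

if __name__ == "__main__":
    rng = random.Random(4)
    print("marginals only: ", round(pearson(*marginals_only(50_000, rng)), 3))
    print("with dependence:", round(pearson(*with_dependence(50_000, rng)), 3))
```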

Recent advances\(^2\) in computational statistics, such as Markov chain Monte Carlo (MCMC) methods, take advantage of convergence in distribution to simulate the often complicated joint density. Rather than drawing each sample independently, they generate a sequence of samples whose distribution approaches the joint probability density itself. These are iterative, rather than direct, sampling methods. It can be shown that, under suitable conditions, the sequence of samples is ergodic\(^3\), with its elements converging in distribution to the target, so that once the sequence has run long enough its elements represent samples from the desired joint probability density.
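Gibbs sampling is one such iterative method; the text does not name a specific algorithm, so treat this as one assumed example. The sketch targets a standard bivariate normal with correlation \(\rho\) (a hypothetical target chosen because its conditionals are known: \(X \mid Y=y \sim N(\rho y, 1-\rho^2)\) and symmetrically for \(Y\)). The chain starts deliberately far from the bulk of the target, discards an initial stretch, and still reproduces the target's means and correlation.

```python
import math
import random
from statistics import fmean

def gibbs_bivariate_normal(n_samples, rho=0.8, burn_in=1_000, start=(10.0, -10.0), seed=5):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each step draws X | Y = y ~ N(rho*y, 1 - rho^2), then Y | X = x ~ N(rho*x, 1 - rho^2).
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x, y = start  # deliberately far from the bulk of the target
    samples = []
    for i in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        if i >= burn_in:  # keep only draws made after the initial transient
            samples.append((x, y))
    return samples

def summarize(samples):
    """Sample means of X and Y and their sample correlation."""
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    mx, my = fmean(xs), fmean(ys)
    cov = fmean([(a - mx) * (b - my) for a, b in samples])
    sx = math.sqrt(fmean([(a - mx) ** 2 for a in xs]))
    sy = math.sqrt(fmean([(b - my) ** 2 for b in ys]))
    return mx, my, cov / (sx * sy)

if __name__ == "__main__":
    print(summarize(gibbs_bivariate_normal(50_000)))
    print(summarize(gibbs_bivariate_normal(50_000, start=(-10.0, 10.0), seed=6)))
```

Successive draws are correlated, so no single draw is "the" answer; it is the distribution of the sequence that converges, and two very different starting points end up describing the same target.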

Because they do not have to sample everywhere in the probability space, only where the variables most probably reside, these methods are much less fettered by the problem of large dimensions (the Curse of Dimensionality), although they are not entirely immune to it.
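A rough way to see why sampling everywhere is wasteful: treat the unit ball as a stand-in for "where the variables most probably reside" (a hypothetical simplification) and ask what fraction of points drawn uniformly from the cube \([-1,1]^d\) land inside it. The exact fraction is \(\pi^{d/2} / \left(\Gamma(d/2+1)\, 2^d\right)\); the sketch computes it and checks it by simulation.

```python
import math
import random

def ball_fraction_exact(d):
    """Volume of the unit d-ball divided by the volume of the cube [-1, 1]^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) / 2 ** d

def ball_fraction_mc(d, trials=50_000, seed=7):
    """Fraction of points drawn uniformly from [-1, 1]^d that land in the unit ball."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(trials)
        if sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(d)) <= 1.0
    )
    return hits / trials

if __name__ == "__main__":
    for d in (2, 5, 10, 20):
        print(d, ball_fraction_exact(d), ball_fraction_mc(d, trials=20_000))
```

In two dimensions about 79% of uniform draws are useful; in ten dimensions about 0.25% are; in twenty, essentially none. A grid fares no better: 10 points per axis in 10 dimensions is already \(10^{10}\) points.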


  1. Direct-sampling methods attempt to sample from the entire probability space, and thus from the joint probability density of interest, usually by inverting the marginal CDFs (inverse-transform sampling).
  2. Regrettably, many engineers view statistics as static, hidebound, if not moribund, and sort of a mathematical analog to Latin.  This lamentably ignorant perspective does little to dispel an equally common view of engineer-as-buffoon held by many statisticians.
  3. Time-dependent and other sequential processes are called ergodic if the eventual distribution of states in the system does not depend on the starting state, so that the distribution of the random segment \(S_m\) from time \(t_n\) to time \(t_{n+m}\) does not depend on \(n\) as \(n \rightarrow \infty\).
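The idea in footnote 3, that an ergodic process forgets where it started, can be seen in a small Markov chain. The two-state transition matrix below is a hypothetical example; its stationary distribution is \((5/6, 1/6)\), and the sketch propagates the state distribution from each possible starting state.

```python
# Hypothetical two-state transition matrix: P[i][j] = P(next state = j | current state = i).
P = [[0.9, 0.1],
     [0.5, 0.5]]

def step(dist, P):
    """One step of the chain: row vector dist times matrix P."""
    return [sum(dist[i] * P[i][j] for i in range(len(P))) for j in range(len(P[0]))]

def distribution_after(n, start, P=P):
    """Distribution over states after n steps, starting with certainty in state `start`."""
    dist = [1.0 if i == start else 0.0 for i in range(len(P))]
    for _ in range(n):
        dist = step(dist, P)
    return dist

if __name__ == "__main__":
    for n in (1, 5, 20, 50):
        print(n, distribution_after(n, 0), distribution_after(n, 1))
```

After one step the two starting states give very different distributions; after a few dozen steps both have settled on \((5/6, 1/6)\), independent of the starting state.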