Curse of Dimensionality

 Home Up The requirement in direct-sampling* Monte Carlo simulation that the number of samples per variable increase exponentially with the number of variables to maintain a given level of accuracy is called the "Curse of Dimensionality."

Let the complete probability space for one variable be represented by the unit interval (0, 1), and imagine drawing ten samples along that interval. Each sample would then have to represent 10% of the probability space (on average). Now consider a second variable defined on another, orthogonal, (0, 1) interval, also being represented by ten samples. We have produced 10 (x1 , x2 ) points on a plane defined by the orthogonal x1 , x2 lines, and representing a new probability space. But the new space has 10 10 = 100 area units, so each of the ten points now represents only 1% of the probability space. It would require 100 points for each point to represent the same 10% of the probability space that was represented by 10 points in only one dimension. This is illustrated below, where the 10 points are not drawn at random to illustrate their diminished coverage.

 The Number of Points Would Need to Increase Exponentially to Maintain a Given Accuracy.

Now, although there are 100 (x1 , x2 ) points, neither axis has more accuracy than was provided by the previous 10, because there are 10 values of x1 required for the 10 different values of x2 necessary to represent the new probability plane. Because in practice the samples are taken at random it appears as though there are more samples, but this isn't the case in terms of probability space coverage which is related to the density of points, not the number of observations per axis.

Next consider adding another dimension creating a probability space represented by a cube with ten units on a side, and 1000 probability units within. It now requires stacking 10 of the previous (x1 , x2 ) planes to construct the new (x1 , x2 , x3 ) cube, and of course 10 times the number of samples per axis to maintain the former level of accuracy, or probability coverage. It is obvious that10n samples would be required for a n-dimension problem.   This is referred to in the Bayesian literature as the "curse of dimensionality" and places a practical limit of about a half dozen dimensions on direct-sampling Monte Carlo. Fortunately there are other MC methods that aren't encumbered by this impediment.

_____________

* Direct-sampling methods attempt to sample from the entire probability space and thus from the joint probability density of interest indirectly, usually inversely through the marginal cdfs.

 [ HOME ] [ Feedback ] Mail to Charles.Annis@StatisticalEngineering.com Copyright � 1998-2008 Charles Annis, P.E. Last modified: June 08, 2014