## Bootstrap and Jackknife

Bootstrap and Jackknife algorithms don’t really give you something for nothing. They give you something you previously ignored.

The jackknife is an algorithm for re-sampling from an existing sample to get estimates of the behavior of the single sample’s statistics. An example of the jackknife would be to omit the $$1^{st}, 2^{nd}, 3^{rd}$$ … observation from a sample of size $$n$$, computing $$n$$ leave-one-out averages, each based on $$n-1$$ observations. The variance of these re-sampled averages is an estimate of the variance of the original sample mean.
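The leave-one-out procedure above can be sketched in a few lines of Python (the function name and data are illustrative, not from the reference). For the sample mean, the jackknife variance estimate coincides exactly with the classical estimate $$s^2/n$$, which makes it easy to check:

```python
from statistics import mean, variance

def jackknife_var_of_mean(sample):
    """Jackknife estimate of the variance of the sample mean.

    Omit each observation in turn, average the remaining n-1 values,
    and combine the spread of those leave-one-out averages.
    """
    n = len(sample)
    loo_means = [mean(sample[:i] + sample[i + 1:]) for i in range(n)]
    grand = mean(loo_means)
    return (n - 1) / n * sum((m - grand) ** 2 for m in loo_means)

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]
print(jackknife_var_of_mean(data))   # matches variance(data) / len(data)
```

For statistics other than the mean (e.g. a correlation), the same leave-one-out recipe applies but no longer reduces to a closed-form answer, which is where the jackknife earns its keep.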

The bootstrap is a generalization of the jackknife that re-samples, with replacement, some number of times (say 1000), and computes the statistic of interest from these re-samples, thereby providing an estimate of the original sample’s variability. (For example, the $$95^{th}$$ percentile is estimated by the $$950^{th}$$ observation of the ordered 1000 re-samples.) The name, of course, comes from its apparent ability to pull itself up by its own bootstraps. (In Rudolf Erich Raspe’s tale, Baron Munchausen had fallen to the bottom of a deep lake, and just as he was about to succumb to his fate he thought to pull himself up by his own bootstraps.)
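A minimal sketch of this naive bootstrap, again with illustrative names and data: draw 1000 resamples with replacement, compute the statistic on each, sort, and read off the desired percentile as described above.

```python
import random

def bootstrap_stat(sample, stat, n_resamples=1000, seed=0):
    """Sorted bootstrap distribution of stat(sample).

    Each resample draws len(sample) observations with replacement.
    """
    rng = random.Random(seed)
    n = len(sample)
    return sorted(stat([rng.choice(sample) for _ in range(n)])
                  for _ in range(n_resamples))

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]
boot_means = bootstrap_stat(data, lambda s: sum(s) / len(s))

# The 95th percentile of the bootstrap distribution is the
# 950th of the 1000 ordered re-sampled means.
print(boot_means[949])
```

The `seed` argument is only there to make the sketch reproducible; in practice the number of resamples, not the seed, is the knob that matters.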

##### Cautions:

While this simple algorithm works quite well in many situations, it is termed a “naive” bootstrap because indiscriminate use can produce bogus results. For example, a simple resampling of a time series would ignore any auto-correlation and lead to spurious conclusions. If the data are in $$x, y$$ pairs, then a naive resampling draws from the joint distribution of $$x$$ and $$y$$, whereas for regression purposes we need the conditional distribution of $$y$$ given $$x$$. And it isn’t magic: if what you’re looking for isn’t in the original sample, then it cannot appear in any of the resamples.
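The regression caution above is often addressed by resampling residuals rather than $$(x, y)$$ pairs: fit the model, hold the $$x$$ values fixed, and add resampled residuals back onto the fitted values, thereby respecting the conditional distribution of $$y$$ given $$x$$. A sketch for a simple least-squares line (function names and data are illustrative, not from the reference):

```python
import random

def fit_line(xs, ys):
    # ordinary least squares for y = a + b*x
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def residual_bootstrap_slopes(xs, ys, n_resamples=1000, seed=0):
    """Sorted bootstrap distribution of the slope via residual resampling."""
    rng = random.Random(seed)
    a, b = fit_line(xs, ys)
    fitted = [a + b * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    slopes = []
    for _ in range(n_resamples):
        # keep the x's fixed; resample residuals with replacement
        y_star = [f + rng.choice(resid) for f in fitted]
        slopes.append(fit_line(xs, y_star)[1])
    return sorted(slopes)

xs = [1, 2, 3, 4, 5, 6]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
slopes = residual_bootstrap_slopes(xs, ys)
print(slopes[49], slopes[949])  # an approximate 90% interval for the slope
```

This fixes the conditional-distribution problem but not the others: residual resampling still assumes independent, identically distributed errors, so it is no cure for the time-series case.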

##### Reference:

Bradley Efron and Robert J. Tibshirani, *An Introduction to the Bootstrap*, Chapman and Hall, 1993.