## Bootstrap and Jackknife


*Bootstrap and Jackknife algorithms don’t really give you something for nothing. They give you something you previously ignored.*

The **jackknife** is an algorithm for re-sampling from an existing sample to get estimates of the behavior of the single sample’s statistics. An example of the jackknife would be to omit, in turn, the \(1^{st}, 2^{nd}, 3^{rd}, \ldots\) observation from a sample of size \(n\), computing \(n\) leave-one-out averages, each based on the remaining \(n-1\) observations. The variance of these re-sampled averages (suitably scaled) is an estimate of the variance of the original sample mean.
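As a minimal sketch of the leave-one-out procedure just described (NumPy is assumed, and `jackknife_var_of_mean` is an illustrative name, not a standard function), the usual jackknife variance formula scales the spread of the leave-one-out means by \((n-1)/n\):

```python
import numpy as np

def jackknife_var_of_mean(x):
    """Jackknife estimate of the variance of the sample mean.

    Omits each observation in turn, computes the n leave-one-out
    averages, and applies the standard (n-1)/n scaling.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # One average per omitted observation, each over n-1 values
    loo_means = np.array([np.delete(x, i).mean() for i in range(n)])
    # Jackknife variance: (n-1)/n * sum of squared deviations
    return (n - 1) / n * np.sum((loo_means - loo_means.mean()) ** 2)
```

For the sample mean this reproduces the familiar \(s^2/n\) exactly; its value lies in applying the same mechanical recipe to statistics with no closed-form variance.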

The **bootstrap** is a generalization of the jackknife that re-samples, with replacement, some number of times (say 1000), and computes the statistic of interest from these re-samples, thereby providing an estimate of the original sample’s variability. (*e.g.* The \(95^{th}\) percentile is estimated by the \(950^{th}\) observation of the ordered 1000 re-samples). The name, of course, comes from its apparent ability to pull itself up by its own bootstraps. (In Rudolph Erich Raspe’s tale, Baron Munchausen had fallen to the bottom of a deep lake and just as he was to succumb to his fate he thought to pull himself up by his own bootstraps.)
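The re-sample-with-replacement recipe above can be sketched as follows (NumPy is assumed; `bootstrap_percentile` and the choice of 1000 re-samples mirror the example in the text and are not a library API):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def bootstrap_percentile(x, stat=np.mean, n_resamples=1000, q=95):
    """Estimate the q-th percentile of a statistic's sampling distribution.

    Draws n_resamples samples of size len(x) with replacement,
    computes the statistic on each, and reads off the q-th
    percentile of the ordered results.
    """
    x = np.asarray(x, dtype=float)
    stats = np.array([
        stat(rng.choice(x, size=len(x), replace=True))
        for _ in range(n_resamples)
    ])
    return np.percentile(stats, q)
```

With 1000 ordered re-sampled means, `np.percentile(stats, 95)` corresponds to the \(950^{th}\) ordered value, as in the text.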

##### Cautions:

While this simple algorithm works quite well in many situations, it is termed a “naive” bootstrap because indiscriminate use can produce bogus results. For example, a simple resampling of a time series would ignore any auto-correlation and lead to spurious conclusions. If the data are in \(x, y\) pairs, then a naive resampling would be from the *joint distribution* of \(x\) and \(y\), when for regression purposes we need the *conditional distribution* of \(y\) given \(x\). And it isn’t magic: if what you’re looking for *isn’t* in the original sample, then it cannot appear in any of the resamples.

##### Reference:

Bradley Efron and Robert J. Tibshirani, *An Introduction to the Bootstrap*, Chapman and Hall, 1993.