Probability and Statistics

… are not the same

The differences are not even nuanced. They are apples and oranges.


Engineers know that stress and strain are not synonymous: the terms don’t mean the same thing, even though the popular press uses them interchangeably.

  • Stress is a force acting over a unit area.
  • Strain is the elongation per unit of original length.

One can be viewed as causing the other, and in many instances stress = proportionality constant \(\times\) strain.

Probability and Statistics are not the same either. They are related, but much more circuitously than Hooke’s Law (above) relates stress to strain.

  • Probability can be viewed either as the long-run frequency of occurrence or as a measure of the plausibility of an event given incomplete knowledge – but not both.
  • Statistics are functions of the observations (data) that often have useful and even surprising properties.

So what?

The sample mean, \(\bar X = \sum X_i / n\), is a statistic; the population mean, \(\mu\), is not. That is because a statistic is observable, being computed from the observations, while a population parameter, being a philosophical abstraction, is not observable and thus must be estimated. Statistics like \(\bar X\) are often used to estimate population parameters like \(\mu\). The fidelity of the estimate depends on the number of observations used in computing the statistic. Notice that the estimate changes slightly every time you take a sample, whereas the population parameter doesn’t.
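This is easy to see numerically. A minimal sketch, using a made-up population (Normal with \(\mu = 5\), \(\sigma = 2\) — these numbers are illustrative, not from the text): the statistic \(\bar X\) shifts from sample to sample, while \(\mu\) never moves.

```python
import random
import statistics

# Hypothetical population for illustration: Normal(mu=5, sigma=2).
# The population mean mu = 5 is fixed; it is not observable from data.
random.seed(1)
mu, sigma = 5.0, 2.0

def draw_sample(n):
    """Draw n observations from the population."""
    return [random.gauss(mu, sigma) for _ in range(n)]

# The statistic X-bar = sum(X_i)/n is computed from the observations...
xbar_1 = statistics.mean(draw_sample(10))
xbar_2 = statistics.mean(draw_sample(10))

# ...and it changes every time you take a sample, whereas mu does not.
print(xbar_1, xbar_2)
```

Both sample means land near 5, but they are not equal to 5 or to each other; that gap is exactly the estimation problem the paragraph describes.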

The population parameters are required to estimate probabilities, based on a probability density function, pdf (or probability mass function, pmf, if \(X\) is a discrete random variable).

So (finally) we see the relationship between probability and statistics:

From the observations we compute statistics that we use to estimate population parameters, which index the probability density, from which we can compute the probability of a future observation from that density.

(With convoluted thought processes like this, is it any wonder that statistics is not everyone’s favorite subject?)


Notice that estimating the population parameters is only half the battle. The density from which the observations were taken must also be known. For example, given these observations, what is the probability of a new observation being less than zero?

X: 0.10, 0.16, 0.23, 0.32, 0.43, 0.62, 1.0

If you estimate the mean and standard deviation in the usual way, and if you assume that the observations are from a normal density, you would compute that the probability is \(p=0.1\) that a new observation would be less than zero. (If you were paying attention to the very small sample size and used the t density, rather than the normal, you would have \(p=0.12\).)
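The normal-density calculation can be reproduced with nothing but the standard library (the t-density version needs a t CDF, which the standard library does not provide, so only the normal case is sketched here):

```python
import statistics

# The observations from the text.
x = [0.10, 0.16, 0.23, 0.32, 0.43, 0.62, 1.0]

# Estimate the population parameters in the usual way...
m = statistics.mean(x)   # about 0.409
s = statistics.stdev(x)  # about 0.314

# ...then, assuming a normal density indexed by those estimates,
# compute the probability of a new observation below zero.
p = statistics.NormalDist(m, s).cdf(0.0)
print(round(p, 2))  # about 0.1
```

The assumed normal density puts roughly 10% of its area below zero, just as the text says, even though every observation in hand is positive.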

But these observations are not from a normal density; rather, they are log-normal, something that a quantile-quantile plot would have suggested.

Thus the probability of a future observation being less than zero is \(p=0\): the log-normal density is defined only for \(X \gt 0\), because \(\log(X)\) spans \(-\infty \lt \log(X) \lt +\infty\), so \(X = e^{\log(X)}\) is always positive.
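The same point can be made by fitting the log-normal to these observations and simulating from it: a log-normal variate is the exponential of a normal one, and the exponential is strictly positive, so no draw can ever fall below zero. (The simulation size is arbitrary; the conclusion is exact, not approximate.)

```python
import math
import random
import statistics

# Fit the log-normal by taking logs and estimating mu, sigma there.
x = [0.10, 0.16, 0.23, 0.32, 0.43, 0.62, 1.0]
logs = [math.log(v) for v in x]
mu_hat = statistics.mean(logs)
sigma_hat = statistics.stdev(logs)

# A log-normal variate is exp(Normal), and exp is strictly positive,
# so no draw -- however many we take -- can fall below zero.
random.seed(0)
draws = [random.lognormvariate(mu_hat, sigma_hat) for _ in range(100_000)]
print(min(draws) > 0)  # True: P(X < 0) = 0
```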


In statistics, as with engineering, pay attention to the fine print.