## Probability and Statistics

## … are *not* the same

#### The differences are not even nuanced. They are Apples and Oranges.

*Stress* and *strain* are not synonymous: they don’t mean the same thing, even though the popular press uses the terms interchangeably.

(*Stress* is a force acting over a unit area. *Strain* is the elongation per unit of original length. One can be viewed as causing the other, and in many instances *stress* = *proportionality constant* \(\times\) *strain*.)

*Probability* and *Statistics* are not the same either. They are related, but much more circuitously than as Hooke’s Law (above) relates stress with strain.

*Probability* can be viewed **either** as the long-run frequency of occurrence **or** as a measure of the plausibility of an event given incomplete knowledge, **but not both**.

*Statistics* are functions of the observations (data) that often have useful and even surprising properties.

*So what?*

The sample mean, \(\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i\), is a statistic; the population mean, \(\mu\), is not. That is because a *statistic* is observable, being computed from the observations, while a *population parameter*, being a philosophical abstraction, is not observable, and thus must be *estimated*. Statistics, like \(\bar X\), are often used to estimate population parameters like \(\mu\). The fidelity of the estimate depends on the number of observations used in computing the statistic. Notice that the estimate changes slightly every time you take a new sample, whereas the population parameter doesn’t.
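To see the sample mean wander while \(\mu\) stays put, here is a minimal simulation sketch. The population values (`mu = 10.0`, `sigma = 2.0`), the normal population, and the helper name `sample_means` are all assumptions made up for illustration; none of them come from real data.

```python
import random
import statistics


def sample_means(mu, sigma, n, repeats, seed=0):
    """Draw `repeats` samples of size `n` from Normal(mu, sigma); return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(repeats)
    ]


# Hypothetical population parameters: fixed, and in practice unobservable.
mu, sigma = 10.0, 2.0

for n in (5, 50, 500):
    means = sample_means(mu, sigma, n, repeats=200)
    spread = statistics.stdev(means)  # how much the statistic varies across samples
    first_three = [round(m, 2) for m in means[:3]]
    print(f"n={n:4d}  first three sample means: {first_three}  spread: {spread:.3f}")
```

Each row prints different sample means for the same fixed `mu`, and the spread of those means shrinks as `n` grows, which is the sense in which fidelity depends on the number of observations.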

The population parameters are required to estimate probabilities, based on a probability density function, *pdf* (or probability mass function, *pmf*, if \(X\) is a discrete random variable).

So (finally) we see the relationship between probability and statistics:

From the *observations* we compute *statistics* that we use to *estimate population parameters*, which index the probability density, from which we can compute the *probability* of a future observation from that density.

(With convoluted thought processes like this, is it any wonder that statistics is not everyone’s favorite subject?)

##### Caveat:

Notice that estimating the population parameters is only half the battle. The density from which the observations were taken must also be known. For example, given these observations, what is the probability of a new observation being less than zero?

X: 0.10, 0.16, 0.23, 0.32, 0.43, 0.62, 1.0

If you estimate the mean and standard deviation in the usual way, and if you assume that the observations are from a normal density, you would compute that the probability is p=0.1 that a new observation would be less than zero. (If you were paying attention to the very small sample size and used the t density, rather than the normal, you would have p=0.12.)
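The arithmetic behind those two probabilities can be sketched with the Python standard library. This follows the simple approach described above: standardize zero by the sample mean and standard deviation, then read off the lower tail. The helper `t_cdf` is a hypothetical name; it integrates the Student-t density numerically with Simpson's rule because the standard library has no t distribution.

```python
import math
from statistics import NormalDist, mean, stdev

x = [0.10, 0.16, 0.23, 0.32, 0.43, 0.62, 1.0]

xbar = mean(x)          # statistic estimating mu
s = stdev(x)            # statistic estimating sigma (n - 1 denominator)
z = (0.0 - xbar) / s    # zero, expressed in estimated standard deviations

# Lower-tail probability if the observations were normal.
p_normal = NormalDist().cdf(z)


def t_cdf(t, df, steps=20000):
    """Student-t CDF via Simpson's rule on the density from 0 to |t| (steps must be even)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 + area if t >= 0 else 0.5 - area


# Same standardized value, but with n - 1 = 6 degrees of freedom.
p_t = t_cdf(z, df=len(x) - 1)

print(f"mean={xbar:.3f}  sd={s:.3f}  z={z:.3f}")
print(f"P(X < 0), normal: {p_normal:.2f}")
print(f"P(X < 0), t:      {p_t:.2f}")
```

Both numbers are only as good as the assumed density, which is the point of the caveat.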

But these observations are not from a normal density; they are log-normal, something that a *quantile-quantile* plot would have suggested.

Thus the probability of a future observation being less than zero is p=0, because the log-normal density is defined only for \(X > 0\): \(\log(x)\) exists, with \(-\infty \lt \log(x) \lt +\infty\), only for positive \(x\).
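A quantile-quantile plot is normally judged by eye, but a rough numeric stand-in is the correlation between the sorted observations and the matching standard-normal quantiles: the straighter the plot, the closer that correlation is to 1. The sketch below is one way to do this, not a formal test. The function name `qq_correlation` and the plotting positions \((i - 0.5)/n\) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist, mean


def qq_correlation(values):
    """Correlation between sorted values and standard-normal quantiles at (i - 0.5)/n."""
    ys = sorted(values)
    n = len(ys)
    qs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    my, mq = mean(ys), mean(qs)
    sxy = sum((y - my) * (q - mq) for y, q in zip(ys, qs))
    syy = sum((y - my) ** 2 for y in ys)
    sqq = sum((q - mq) ** 2 for q in qs)
    return sxy / math.sqrt(syy * sqq)


x = [0.10, 0.16, 0.23, 0.32, 0.43, 0.62, 1.0]

r_raw = qq_correlation(x)                         # straightness of Q-Q plot for X
r_log = qq_correlation([math.log(v) for v in x])  # straightness of Q-Q plot for log(X)

print(f"Q-Q correlation, raw X:  {r_raw:.3f}")
print(f"Q-Q correlation, log(X): {r_log:.3f}")
```

For these seven observations the log-scale correlation is noticeably closer to 1 than the raw-scale one, consistent with a log-normal density. With so few points this is a suggestion, not proof.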