Parameter Estimates are NOT Parameter Values

Statistical History Lesson

There is a profound difference between the mathematical behavior of a function whose parameter values are specified (e.g., the FORM/SORM paradigm) and the same function whose parameter values must be estimated from data. In the first instance it is not unreasonable to ask “What is the probability that a point is some number of (rotated and normalized) standard deviations from the given joint mean?”

In the second instance the question becomes “Given these experimental observations, what is the probability of some future observation being at least as large (or small) as some point of interest?” These are not synonymous interrogatives, and even some very famous statisticians had difficulty seeing the distinction.
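A small simulation makes the difference concrete. This is a minimal sketch under illustrative assumptions: standard normal data, a hypothetical sample size of n = 5, and the one-sided 5% normal multiplier k = 1.645. For each sample it estimates the mean and standard deviation, treats those estimates as if they were the true parameter values, and counts how often a fresh observation lands beyond the resulting "5%" point. The function name `exceedance_rate` and its defaults are hypothetical.

```python
import random
import statistics


def exceedance_rate(n=5, k=1.645, trials=50_000, seed=42):
    """Fraction of future draws that exceed mean_hat + k * sd_hat.

    Assumption for illustration: data come from a standard normal
    distribution. If mean_hat and sd_hat were the true parameter values,
    the rate would be about 0.05 for k = 1.645.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean_hat = statistics.fmean(sample)   # estimated, not known, mean
        sd_hat = statistics.stdev(sample)     # estimated, not known, sd
        threshold = mean_hat + k * sd_hat     # "5%" point built from estimates
        future = rng.gauss(0.0, 1.0)          # a new observation
        if future > threshold:
            exceed += 1
    return exceed / trials


rate = exceedance_rate()
print(f"observed exceedance rate: {rate:.3f} (nominal 0.05)")
```

With n = 5 the rate comes out near 0.10, roughly double the nominal 5%. For normal data the exact value follows from the predictive distribution: $$(X_{new}-\bar{x})/(s\sqrt{1+1/n})$$ has a Student t distribution with n - 1 degrees of freedom.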

Karl Pearson (1857-1936), who co-founded the prestigious statistical journal Biometrika and for whom the Pearson correlation coefficient is named, invented the Chi-square test. Nonetheless, he failed to appreciate that he was using the sample means (for a contingency table, the observed marginal proportions) as though they were the population means, treating them as known when they were only estimated from the data.

This led to a famous row with another luminary, R. A. Fisher (1890-1962), who wrote the first book on the design of experiments and who revolutionized statistics with the concept of likelihood and the estimation of parameters by maximizing the likelihood. Fisher pointed out that Pearson had misunderstood his own Chi-square test: each independent parameter estimated from the data costs one degree of freedom, so Pearson was referring his statistic to the wrong distribution and calculating its tail probabilities incorrectly.

The resulting acrimonious and vitriolic row lasted years and was finally resolved in 1926 in Fisher’s favor, based on data collected, ironically, by Egon Pearson, Karl Pearson’s son, who had published the results of 11,688 2×2 contingency tables observed under random sampling.
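The kind of evidence that settled the dispute can be mimicked with a minimal simulation sketch. It assumes rows and columns are independent (so the null hypothesis is true by construction) and uses arbitrary, hypothetical choices for the table size (n = 100), the marginal probabilities, and the random seed; only the table count, 11,688, comes from the text. For each table it computes Pearson's $$X^2 = \sum (O-E)^2/E$$ twice: once with expected counts built from the table's own margins, as in practice, and once with the true cell probabilities, which only a simulation knows. The function name `simulate_tables` is hypothetical.

```python
import random


def simulate_tables(num_tables=11_688, n=100, p_row=0.4, p_col=0.3, seed=1926):
    """Return (mean X^2 with estimated margins, mean X^2 with true probabilities).

    Each table cross-classifies n independent observations. p_row, p_col,
    n and seed are illustrative assumptions, not historical values.
    """
    rng = random.Random(seed)
    row_p = [p_row, 1 - p_row]  # true row probabilities
    col_p = [p_col, 1 - p_col]  # true column probabilities
    total_est = 0.0
    total_true = 0.0
    used = 0
    while used < num_tables:
        counts = [[0, 0], [0, 0]]
        for _ in range(n):
            i = 0 if rng.random() < p_row else 1
            j = 0 if rng.random() < p_col else 1
            counts[i][j] += 1
        rows = [counts[0][0] + counts[0][1], counts[1][0] + counts[1][1]]
        cols = [counts[0][0] + counts[1][0], counts[0][1] + counts[1][1]]
        if 0 in rows or 0 in cols:
            continue  # expected counts undefined; draw another table
        x2_est = 0.0
        x2_true = 0.0
        for i in range(2):
            for j in range(2):
                e_est = rows[i] * cols[j] / n       # margins estimated from data
                e_true = n * row_p[i] * col_p[j]    # margins treated as known
                x2_est += (counts[i][j] - e_est) ** 2 / e_est
                x2_true += (counts[i][j] - e_true) ** 2 / e_true
        total_est += x2_est
        total_true += x2_true
        used += 1
    return total_est / num_tables, total_true / num_tables


mean_est, mean_true = simulate_tables()
print(f"mean X^2, estimated margins: {mean_est:.3f}")
print(f"mean X^2, true probabilities: {mean_true:.3f}")
```

The first average lands close to 1 and the second close to 3. Since a chi-square variable's mean equals its degrees of freedom, estimating the two free marginal proportions from each table costs two degrees of freedom.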

If Karl Pearson had been correct, the observed mean value for Chi-square would have been three, the degrees of freedom he assigned to a 2×2 table. Fisher said it would be one, as it was (1.00001). As a result, the elder Pearson’s calculations would assign a five percent probability of occurrence ($$p=0.05$$) to something with a true probability of only one-half of one percent ($$p=0.005$$), an error of roughly 10x. His test was therefore far more conservative than he believed, which inflates the Type II error rate (failing to reject a false null hypothesis).
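The size of that error can be checked with standard-library Python alone, using the closed-form chi-square survival functions $$P(X>x)=\operatorname{erfc}(\sqrt{x/2})$$ for one degree of freedom and $$P(X>x)=\operatorname{erfc}(\sqrt{x/2})+\sqrt{2x/\pi}\,e^{-x/2}$$ for three. This sketch finds by bisection the 5% critical value under 3 degrees of freedom (about 7.81), then asks how often the correct 1-degree-of-freedom distribution exceeds it. The helper names are illustrative.

```python
import math


def chi2_sf_df1(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def chi2_sf_df3(x):
    """P(X > x) for a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def critical_value(sf, alpha, lo=0.0, hi=100.0, iters=200):
    """Find x with sf(x) = alpha by bisection (sf decreases as x grows)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sf(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# 5% critical value under Pearson's reference distribution (3 df for a 2x2 table).
crit = critical_value(chi2_sf_df3, 0.05)
# Tail probability of that same value under the correct reference (1 df).
true_p = chi2_sf_df1(crit)

print(f"critical value, 3 df: {crit:.3f}")
print(f"true tail probability, 1 df: {true_p:.4f}")
print(f"nominal / true: {0.05 / true_p:.1f}")
```

The true tail probability comes out at roughly 0.005, about a tenth of the nominal 0.05.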

The lesson here is that really smart people can make this mistake and the consequences can be severe.

References:

1. Alan Agresti, Categorical Data Analysis, 2nd ed., Wiley, 2002, sec. 16.2.
2. Joan Fisher Box, R. A. Fisher: The Life of a Scientist, Wiley, 1978.
3. Ronald A. Fisher, Statistical Methods for Research Workers, first published 1925. The 14th edition, ready for publication when Fisher died in 1962, was published by Oxford University Press in 1990 together with his works on experimental design and scientific inference; a 1993 printing incorporated corrections to the 1991 edition.