Goodness-of-Fit

… tests for Statistical Distributions

Of the many quantitative goodness-of-fit techniques (e.g., Kolmogorov-Smirnov, Anderson-Darling, Shapiro-Wilk, Cramér-von Mises), I prefer the Anderson-Darling test because it is more sensitive to deviations in the tails of the distribution than the older Kolmogorov-Smirnov test.

Note: The Anderson-Darling test (or Kolmogorov-Smirnov or Shapiro-Wilk) does not tell you that you do have a Normal density. It only tells you when the data make it unlikely that you do.\(^1\)

Anderson-Darling can be applied to any distribution, but finding tables of critical values isn’t so easy. Included here are two of the most useful tables: one for the normal and lognormal, and one for the Weibull, exponential, and Gumbel.

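For the normal and lognormal distributions, the test statistic \(A^2\) is calculated from

\[A^2=-n-\frac{1}{n}\sum_{i=1}^{n}{(2i-1)\big(\ln(w_i)+\ln(1-w_{n-i+1})\big)}  \tag{1} \]

where \(n\) is the sample size and \(w_i\) is the fitted cdf evaluated at the \(i\)-th smallest observation \(x_{(i)}\). For the normal, \(w_i = \Phi\big((x_{(i)}-\bar{x})/s\big)\), with \(\bar{x}\) and \(s\) the sample mean and standard deviation; for the lognormal, apply the same calculation to the logarithms of the data.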

This formula needs to be modified for small samples:

\[A^2 _m = A^2 \bigg(1+\frac{0.75}{n} + \frac{2.25}{n^2}\bigg)\]

and then compared to the critical value from the table below:

| \(\alpha\) | 0.1 | 0.05 | 0.025 | 0.01 |
|---|---|---|---|---|
| \(A^2 _{crit}\) | 0.631 | 0.752 | 0.873 | 1.035 |

Reference: D’Agostino and Stephens, Goodness-of-Fit Techniques, Marcel Dekker, New York, 1986, Table 4.7, p. 123. All of Chapter 4, pp. 97-193, deals with goodness-of-fit tests based on empirical distribution function (EDF) statistics.
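The normal-case recipe above (Equation 1, the small-sample modification, and the table lookup) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a definitive implementation: the function names and the two sample lists are made up for the example, and the mean and standard deviation are assumed to be estimated from the data, which is the case the table covers.

```python
import math
from statistics import mean, stdev

# Critical values copied from the normal/lognormal table above.
CRIT_NORMAL = {0.10: 0.631, 0.05: 0.752, 0.025: 0.873, 0.01: 1.035}


def normal_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def anderson_darling_normal(data):
    """Return (A2, A2_modified) for a normal fit with estimated mean and sd."""
    x = sorted(data)
    n = len(x)
    m, s = mean(x), stdev(x)  # sample mean and (n - 1) standard deviation
    w = [normal_cdf((xi - m) / s) for xi in x]
    # Equation 1: i runs 1..n; w[i - 1] is w_i and w[n - i] is w_{n-i+1}.
    total = sum(
        (2 * i - 1) * (math.log(w[i - 1]) + math.log(1.0 - w[n - i]))
        for i in range(1, n + 1)
    )
    a2 = -n - total / n
    a2_mod = a2 * (1.0 + 0.75 / n + 2.25 / n**2)  # small-sample modification
    return a2, a2_mod


def rejects_normal(data, alpha=0.05):
    """True when the modified statistic exceeds the tabulated critical value."""
    _, a2_mod = anderson_darling_normal(data)
    return a2_mod > CRIT_NORMAL[alpha]


# Made-up illustrative samples, not real measurements.
evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
for name, sample in [("evenly_spread", evenly_spread), ("with_outlier", with_outlier)]:
    a2, a2_mod = anderson_darling_normal(sample)
    print(name, round(a2, 3), round(a2_mod, 3), rejects_normal(sample))
```

To test for a lognormal fit with this sketch, pass the logarithms of the data instead of the raw values.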

The other popular family of distributions includes the Weibull, for distributions of minima, and the Gumbel, for distributions of maxima. The Gumbel variable \(X\) and the Weibull variable \(Y\) are related by \(X=\ln(1/Y)\). A Weibull distribution with the shape parameter equal to one reduces to the exponential distribution as a special case.
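Both statements can be checked from the two-parameter Weibull cdf, \(F_Y(y) = 1 - \exp\big(-(y/\eta)^\beta\big)\), with scale \(\eta\) and shape \(\beta\). A short derivation follows; the location \(\mu\) and scale \(\sigma\) are labels introduced here for illustration only:

\[
\begin{aligned}
P(X \le x) &= P\big(\ln(1/Y) \le x\big) = P\big(Y \ge e^{-x}\big) = 1 - F_Y\big(e^{-x}\big) \\
&= \exp\big(-(e^{-x}/\eta)^\beta\big) = \exp\big(-\exp(-\beta(x + \ln\eta))\big) \\
&= \exp\big(-\exp(-(x-\mu)/\sigma)\big), \qquad \mu = -\ln\eta, \quad \sigma = 1/\beta .
\end{aligned}
\]

The last line is the Gumbel cdf for maxima. Setting \(\beta = 1\) in \(F_Y\) gives \(1 - \exp(-y/\eta)\), the exponential distribution with mean \(\eta\).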

For the Weibull\(^2\) (and Gumbel) distributions, the test statistic \(A^2\) is again calculated from Equation 1, just as for the normal, but \(w\) is the cdf of the distribution under consideration. For the Weibull this is

\[w_i = F(x_i) = 1 - \exp\big(-(x_i/\eta)^\beta \big)\]

where \(\eta\) and \(\beta\) are the model scale and shape parameters, respectively.

The statistic \(A^2\) again needs to be modified for small samples,

\[A^2 _m = A^2 \bigg(1+\frac{0.2}{ \sqrt{n} } \bigg)\]

and then compared to an appropriate critical value from the table below.

| \(\alpha\) | 0.1 | 0.05 | 0.025 | 0.01 |
|---|---|---|---|---|
| \(A^2 _{crit}\) | 0.637 | 0.757 | 0.877 | 1.038 |

(Ref: D’Agostino and Stephens, 1986, Table 4.17, p.146)
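The Weibull version of the calculation might look like the sketch below. It is one possible implementation under stated assumptions, not the method of the reference: \(\eta\) and \(\beta\) are assumed to be estimated by maximum likelihood, the shape is found by plain bisection over an assumed bracket of 0.01 to 100, all data are assumed positive, and the function names and the synthetic sample are invented for the example.

```python
import math

# Critical values copied from the Weibull/Gumbel table above.
CRIT_WEIBULL = {0.10: 0.637, 0.05: 0.757, 0.025: 0.877, 0.01: 1.038}


def fit_weibull_mle(data, iters=100):
    """Maximum-likelihood (eta, beta) for positive data, by bisection on beta."""
    top = max(data)
    y = [x / top for x in data]  # rescale so powers of y cannot overflow
    logs = [math.log(v) for v in y]
    mean_log = sum(logs) / len(y)

    def score(beta):
        # Shape equation; it increases with beta and is zero at the MLE.
        p = [v**beta for v in y]
        return sum(pi * li for pi, li in zip(p, logs)) / sum(p) - 1.0 / beta - mean_log

    lo, hi = 0.01, 100.0  # assumed bracket for the shape parameter
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    eta = top * (sum(v**beta for v in y) / len(y)) ** (1.0 / beta)
    return eta, beta


def anderson_darling_weibull(data):
    """Return (A2, A2_modified, eta, beta) for a fitted two-parameter Weibull."""
    x = sorted(data)
    n = len(x)
    eta, beta = fit_weibull_mle(x)
    t = [(xi / eta) ** beta for xi in x]
    # With w_i = 1 - exp(-t_i): ln(w_i) = ln(-expm1(-t_i)) and ln(1 - w_j) = -t_j.
    log_w = [math.log(-math.expm1(-ti)) for ti in t]
    # Equation 1: log_w[i - 1] is ln(w_i) and -t[n - i] is ln(1 - w_{n-i+1}).
    total = sum((2 * i - 1) * (log_w[i - 1] - t[n - i]) for i in range(1, n + 1))
    a2 = -n - total / n
    a2_mod = a2 * (1.0 + 0.2 / math.sqrt(n))  # small-sample modification
    return a2, a2_mod, eta, beta


# Synthetic sample: Weibull quantiles (eta = 10, beta = 2) at (i - 0.5) / 20.
sample = [10.0 * (-math.log(1.0 - (i - 0.5) / 20)) ** 0.5 for i in range(1, 21)]
a2, a2_mod, eta, beta = anderson_darling_weibull(sample)
print(round(eta, 3), round(beta, 3), round(a2_mod, 3), a2_mod > CRIT_WEIBULL[0.05])
```

Writing \(\ln(1-w)\) as \(-(x/\eta)^\beta\) avoids taking the logarithm of a number that has rounded to zero in the upper tail. For Gumbel data, the relation \(X=\ln(1/Y)\) suggests transforming with \(Y = e^{-X}\) first and then applying the same function.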


Notes:

  1. The Anderson-Darling test does not tell you that you have a Normal density. It only tells you when the data make it unlikely that you do. Engineers (and I’m one) hate this kind of statistical double-talk. But the fact remains: any frequentist test is constructed to disprove something. A wet sidewalk might be caused by rain or by the sprinkler system, so it can’t prove that it rained, while a dry sidewalk is evidence that it did not.
  2. Although the Weibull, a distribution of “weakest-link” minima, is more widely known, it may not always be the best choice; its sister, the Gumbel, the asymptotic distribution of maxima, is sometimes more appropriate.
  3. Useful as the Anderson-Darling test is, good engineering practice requires use of the IntraOcular Trauma Test (plot the data and see whether the conclusion hits you between the eyes) to confirm goodness-of-fit and other preliminary findings.
  4. \(R^2\) is a common criterion for goodness-of-fit for regression models but it isn’t very good.