Generalized Linear Models

Generalized linear models link the independent variable with the probability of observing the dependent variable.


“Linear models” describe how the dependent variable (the response) is related, through some function, to the independent variable(s), which are the response-controlling variables. Ordinary least-squares (OLS) regression is the most common example of a linear model.\(^1\)

This idea is generalized with GLM to link the probability of a binary outcome through some function to the variables that control it. So it is the probability of the outcome, rather than the outcome itself, that matters.

When the data are binary, OLS is simply unworkable because two of its tenets are violated. OLS requires that the response, \(y\), be unbounded \((-\infty \lt y \lt \infty)\), and that the error variance be constant everywhere in that range.

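Binary and proportion data are confined to the unit interval: a binary outcome is either 0 or 1, and the underlying probability satisfies \(0 \lt p \lt 1\). The variance is not constant either, but depends on \(p\): for a single binary observation, \(\text{var}(y) = p(1-p)\).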

That doesn’t mean you can’t coerce a computer program to give you an answer using OLS with a binary or proportion response. It does mean, however, that the answer will be wrong.
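To make the point concrete, here is a minimal sketch that fits an ordinary least-squares line to a binary response. The hit/miss data are made up for illustration (they are not from any real inspection), and the helper name `ols_line` is hypothetical.

```python
# Hypothetical hit/miss data: x is a flaw size in arbitrary units,
# y is the inspection result (1 = hit, 0 = miss). Invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

def ols_line(x, y):
    """Return (intercept, slope) of the least-squares straight line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

b0, b1 = ols_line(x, y)

# "Probabilities" predicted by the straight line at each observed x.
fitted = [b0 + b1 * xi for xi in x]

# Any fitted value below 0 or above 1 cannot be a probability.
out_of_range = [round(f, 3) for f in fitted if f < 0.0 or f > 1.0]
print("intercept, slope:", round(b0, 3), round(b1, 3))
print("fitted values outside [0, 1]:", out_of_range)
```

With these invented data the program happily returns a line, but the fitted value at the smallest \(x\) is negative and the fitted value at the largest \(x\) exceeds one, so neither can be read as a probability.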

The most common method to describe a binary outcome is logistic regression, a special case of GLM whose link function is the logit, which is related to the logistic density.
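As a rough illustration of what a logistic regression fit involves, the sketch below maximizes the binomial likelihood for a single predictor using Newton-Raphson steps (equivalent to iteratively reweighted least squares for this model). The data are invented, the function name `fit_logistic` and the fixed iteration count are assumptions of this sketch, and real GLM software adds convergence checks and standard errors that are omitted here.

```python
import math

# Hypothetical hit/miss data (invented for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the score vector (g) and the information matrix (h).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)  # binomial variance, p(1 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system h * step = g explicitly.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_logistic(x, y)

# Fitted probabilities always stay strictly between 0 and 1.
probs = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

# x at which the fitted probability is 0.5.
x50 = -b0 / b1
print("b0, b1:", round(b0, 3), round(b1, 3))
print("x at 50% probability:", round(x50, 3))
```

Unlike a straight line fitted to the same data, every fitted value here is a legitimate probability, because the logit link maps the whole real line onto the interval between zero and one.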

Another common link function is the probit, related to the normal (Gaussian) density. Both the logit and the probit are symmetrical. But asymmetrical data would require an asymmetrical link, and data that reach neither zero on the left nor one on the right require special link functions.

MIL-HDBK-1823A makes extensive use of GLMs to describe the probability of detection for hit/miss inspections.
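The symmetry of the logit and probit links can be checked numerically. The sketch below compares them with the complementary log-log link, which is brought in here only as one well-known example of an asymmetrical link; the text above does not single out any particular one. It uses only the Python standard library (`statistics.NormalDist` requires Python 3.8 or later).

```python
import math
from statistics import NormalDist

def logit(p):
    """Logit link: log-odds, the inverse of the logistic CDF."""
    return math.log(p / (1.0 - p))

def probit(p):
    """Probit link: the inverse of the standard normal CDF."""
    return NormalDist().inv_cdf(p)

def cloglog(p):
    """Complementary log-log link, an example of an asymmetrical link."""
    return math.log(-math.log(1.0 - p))

# A symmetrical link satisfies g(p) = -g(1 - p), so g(p) + g(1 - p) = 0.
for name, g in (("logit", logit), ("probit", probit), ("cloglog", cloglog)):
    for p in (0.1, 0.2, 0.4):
        print(name, p, "g(p) + g(1-p) =", round(g(p) + g(1.0 - p), 6))
```

For the logit and the probit the printed sums are zero to within rounding, while for the complementary log-log link they are not, which is what makes it asymmetrical.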

GLM on a P/C spreadsheet

Nearly three decades ago, when using generalized linear models meant that you wrote your own software, I implemented a simple binary-response GLM on a P/C spreadsheet. There are easier methods available now. Still, working through the exercise was very illuminating to me, and I recommend it to anyone who would like to get a visceral feel for the machinations of GLMs. You can find step-by-step instructions here.


\(^1\) Linear models are linear in the model parameters, not necessarily linear in the independent variable. So a model like \(y=\beta_0 + \beta_1 x + \beta_2 x^2\) or even \(y=\beta_0 + \beta_1 \sin(x)\) is linear (in the model parameters, \(\beta\)), while a model like \(y=\beta_0 + \beta_1 e^{- \beta_2 x}\) is not a linear model.