## Joint, Marginal, and Conditional Distributions

*We engineers often ignore the distinctions between joint, marginal, and conditional probabilities – to our detriment.*

*Figure 1 – How the Joint, Marginal, and Conditional distributions are related.*

*Joint probability* is the probability of two or more things happening together: \(f(x, y \mid \theta)\), where \(f\) is the probability of \(x\) and \(y\) together as a pair, given the vector of distribution parameters \(\theta\). Often these events are not independent, and sadly this is often ignored. Furthermore, the correlation coefficient by itself does NOT adequately describe these interrelationships.
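The last point can be made concrete with a toy joint distribution. This is a minimal Python sketch using an example of our own choosing, not one from the original text: \(X\) is equally likely to be \(-1\), \(0\), or \(1\), and \(Y = X^2\). Exact `fractions.Fraction` arithmetic shows that the covariance, and therefore the correlation coefficient, is exactly zero, even though \(Y\) is completely determined by \(X\).

```python
from fractions import Fraction

# Illustrative joint pmf f(x, y): X uniform on {-1, 0, 1} and Y = X**2.
joint = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}

def expect(g):
    """Expected value of g(X, Y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))  # zero covariance => zero correlation

# Dependence check: knowing X = 0 pins down Y = 0 with certainty.
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0_given_x0 = joint[(0, 0)] / p_x0

print(cov_xy)               # 0
print(p_y0_given_x0, p_y0)  # 1 1/3
```

Zero correlation only rules out a *linear* relationship; here \(P(Y=0 \mid X=0) = 1\) while \(P(Y=0) = 1/3\), so the variables are strongly dependent.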

Consider first the idea of a **probability density** or **distribution**: \(f(x \mid \theta)\), where \(f\) is the probability density of \(x\), given the distribution parameters \(\theta\). For a normal distribution, \(\theta=(\mu, \sigma)^T\), where \(\mu\) is the mean and \(\sigma\) is the standard deviation. This is sometimes called a **probability density function**, or **pdf**. The integral of a **pdf** from \(-\infty\) up to \(x\) is a **cumulative distribution function**, or **cdf**, \(F(x \mid \theta) = \int_{-\infty}^{x} f(t \mid \theta)\, dt\). The area under the curve between two specified values \(a\) and \(b\) is then the probability that \(x\) falls between them, \(F(b \mid \theta) - F(a \mid \theta)\). For discrete \(f\), \(F\) is the corresponding summation.

A *joint probability density* of two or more variables is called a **multivariate distribution**. It is often summarized by a vector of parameters, which may or may not be sufficient to characterize the distribution completely. For example, the normal is summarized (sufficiently) by a mean vector and covariance matrix.

*Marginal probability*: \(f(x \mid \theta)\), where \(f\) is the probability density of \(x\), for all possible values of \(y\), given the distribution parameters, vector \(\theta\). The marginal probability is determined from the joint distribution of \(x\) and \(y\) by integrating over all values of \(y\), thereby "integrating out" the variable \(y\):

\[f(x \mid \theta) = \int f(x, y \mid \theta)\, dy\]

In applications of **Bayes's Theorem**, \(y\) is often a matrix of possible parameter values. Figure 1 illustrates joint, marginal, and conditional probability relationships.
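A minimal numerical sketch of these definitions, using only Python's standard library and a bivariate normal with illustrative parameters of our own choosing. The joint density is written in terms of the means \(\mu_x, \mu_y\), standard deviations \(\sigma_x, \sigma_y\), and correlation \(\rho\), which is equivalent to a mean vector plus the covariance matrix \(\begin{pmatrix}\sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2\end{pmatrix}\). Integrating the joint density over \(y\) should recover the marginal normal pdf of \(x\), and integrating that pdf up to \(x\) should recover the cdf. The helper names and the use of the trapezoidal rule are our own assumptions, not a prescribed method.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density f(x | mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """Closed-form normal cdf F(x | mu, sigma), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bivariate_normal_pdf(x, y, mu, sigma, rho):
    """Joint density f(x, y | theta), theta = (means, standard deviations, correlation)."""
    mx, my = mu
    sx, sy = sigma
    zx, zy = (x - mx) / sx, (y - my) / sy
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * sx * sy * math.sqrt(1.0 - rho * rho))

def integrate(g, a, b, n=4000):
    """Composite trapezoidal rule approximating the integral of g over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n))
    return total * h

# Hypothetical parameters: means (mu_x, mu_y), std devs (sigma_x, sigma_y), correlation.
mu, sigma, rho = (1.0, -2.0), (2.0, 0.5), 0.8
x = 1.7

# Marginal density of x: "integrate out" y from the joint density.
marginal_x = integrate(
    lambda y: bivariate_normal_pdf(x, y, mu, sigma, rho),
    mu[1] - 10 * sigma[1], mu[1] + 10 * sigma[1],
)
exact_marginal_x = normal_pdf(x, mu[0], sigma[0])

# cdf: area under the marginal pdf from far in the left tail up to x.
cdf_numeric = integrate(lambda t: normal_pdf(t, mu[0], sigma[0]), mu[0] - 10 * sigma[0], x)
cdf_exact = normal_cdf(x, mu[0], sigma[0])

print(marginal_x, exact_marginal_x)
print(cdf_numeric, cdf_exact)
```

The integrals run over 10 standard deviations on each side rather than the whole real line; at that width the truncated tail probability is negligible. Note that the marginal of \(x\) does not depend on \(\rho\): the correlation lives only in the joint distribution.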

*Conditional probability*: \(f(x \mid y, \theta)\), where \(f\) is the probability of \(x\) by itself, given a specific value of the variable \(y\) and the distribution parameters \(\theta\) (see Figure 1). If \(x\) and \(y\) represent events \(A\) and \(B\), then \(P(A \mid B) = n_{AB}/n_{B}\), where \(n_{AB}\) is the number of times both \(A\) and \(B\) occur, and \(n_B\) is the number of times \(B\) occurs. Equivalently, \(P(A \mid B) = P(AB)/P(B)\): with \(N\) the total number of trials, \(P(AB) = n_{AB}/N\) and \(P(B) = n_B/N\), so that

\[P(A \mid B) = \frac{n_{AB}/N}{n_B/N} = \frac{n_{AB}}{n_B}\]
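The counting argument can be checked by brute force. This is a minimal Python sketch that assumes two illustrative events on a roll of two fair dice (our own example, not from the text): \(A\) is "the dice sum to 8" and \(B\) is "the first die is even". Exact `Fraction` arithmetic avoids any rounding, so the two routes to \(P(A \mid B)\) can be compared for exact equality.

```python
from fractions import Fraction
from itertools import product

# All N = 36 equally likely outcomes (d1, d2) of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
N = len(outcomes)

def event_a(d1, d2):
    """Illustrative event A: the two dice sum to 8."""
    return d1 + d2 == 8

def event_b(d1, d2):
    """Illustrative event B: the first die shows an even number."""
    return d1 % 2 == 0

n_a = sum(1 for d in outcomes if event_a(*d))
n_b = sum(1 for d in outcomes if event_b(*d))
n_ab = sum(1 for d in outcomes if event_a(*d) and event_b(*d))

# P(A | B) from counts, and from the ratio of probabilities; N cancels.
p_a_given_b_counts = Fraction(n_ab, n_b)
p_a_given_b_probs = Fraction(n_ab, N) / Fraction(n_b, N)

# The reverse conditional, P(B | A), is different in general.
p_b_given_a = Fraction(n_ab, n_a)

print(n_ab, n_b, p_a_given_b_counts, p_a_given_b_probs)  # 3 18 1/6 1/6
print(p_b_given_a)                                       # 3/5
```

Here \(P(A \mid B) = 1/6\) but \(P(B \mid A) = 3/5\), a concrete instance of the asymmetry between the two conditionals.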

Note that in general the conditional probability of \(A\) given \(B\) is not the same as that of \(B\) given \(A\). The probability of both \(A\) and \(B\) together is \(P(AB)\), which can be factored either way: \(P(AB) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)\). If both \(P(A)\) and \(P(B)\) are non-zero, dividing through leads to a statement of **Bayes's Theorem**:

\[P(A \mid B) = P(B \mid A) \times P(A) / P(B)\] and \[P(B \mid A) = P(A \mid B) \times P(B) / P(A)\]
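A minimal sketch checking both forms of Bayes's Theorem with exact fractions. For illustration it assumes two fair dice with \(A\) = "the sum is 8" (5 of 36 outcomes) and \(B\) = "the first die is even" (18 of 36 outcomes), which occur together in 3 of 36 outcomes; the `bayes` helper is hypothetical, written only for this example.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Hypothetical helper: P(A | B) = P(B | A) * P(A) / P(B), requiring P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

# Illustrative probabilities from two fair dice (see the lead-in).
p_a = Fraction(5, 36)
p_b = Fraction(18, 36)
p_ab = Fraction(3, 36)

# Conditionals computed directly from the joint probability P(AB).
p_a_given_b = p_ab / p_b
p_b_given_a = p_ab / p_a

# Each conditional is recovered from the other via Bayes's Theorem.
recovered_a_given_b = bayes(p_b_given_a, p_a, p_b)
recovered_b_given_a = bayes(p_a_given_b, p_b, p_a)

print(recovered_a_given_b, p_a_given_b)  # 1/6 1/6
print(recovered_b_given_a, p_b_given_a)  # 3/5 3/5
```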

Conditional probability is also the basis for *statistical dependence* and *statistical independence*.

*Independence*: Two variables, \(A\) and \(B\), are independent if their conditional probability is equal to their unconditional probability. In other words, \(A\) and \(B\) are independent *if, and only if*, \(P(A \mid B)=P(A)\) and \(P(B \mid A)=P(B)\). In engineering terms, \(A\) and \(B\) are independent if knowing something about one tells nothing about the other. This is the origin of the familiar, but often misused, formula \(P(AB) = P(A) \times P(B)\), which is true only when \(A\) and \(B\) are independent.
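The misuse of the product rule is easy to demonstrate by enumeration. This minimal Python sketch assumes illustrative events on two fair dice (our own examples, not from the text): "first die even" and "second die even" are independent, while "first die even" and "sum is 8" are not, and for the latter pair \(P(A) \times P(B)\) gives the wrong joint probability.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes (d1, d2) of rolling two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate on an outcome (d1, d2)."""
    return Fraction(sum(1 for d in OUTCOMES if event(d)), len(OUTCOMES))

def both(e1, e2):
    """The event 'e1 and e2'."""
    return lambda d: e1(d) and e2(d)

def independent(e1, e2):
    """True iff P(e1 | e2) == P(e1); assumes P(e1), P(e2) > 0, in which case
    this also implies P(e2 | e1) == P(e2)."""
    return prob(both(e1, e2)) / prob(e2) == prob(e1)

# Illustrative events (hypothetical examples).
def first_even(d):
    return d[0] % 2 == 0

def second_even(d):
    return d[1] % 2 == 0

def sum_is_8(d):
    return d[0] + d[1] == 8

print(independent(first_even, second_even))  # True
print(independent(first_even, sum_is_8))     # False

# For the dependent pair, the product rule gives the wrong joint probability.
print(prob(both(first_even, sum_is_8)))      # 1/12 (true value)
print(prob(first_even) * prob(sum_is_8))     # 5/72 (product rule, wrong here)
```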

*Conditional independence*: \(A\) and \(B\) are conditionally independent, given \(C\), if

\[Prob(A=a, B=b \mid C=c) = Prob(A=a \mid C=c) \times Prob(B=b \mid C=c)\] whenever \(Prob(C=c) > 0\).

So the joint probability of \(A\), \(B\), and \(C\), when \(A\) and \(B\) are conditionally independent given \(C\), is

\[Prob(A, B, C) = Prob(C) \times Prob(A \mid C) \times Prob(B \mid C)\]

A directed graph illustrating this conditional independence is \(A \leftarrow C \rightarrow B\).
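The factorization can be checked on a small discrete example. This minimal Python sketch is a hypothetical model of \(A \leftarrow C \rightarrow B\), our own illustration rather than one from the text: \(C\) selects one of two biased coins with equal probability, and \(A\) and \(B\) are two flips of the selected coin. It builds the joint distribution from \(Prob(C) \times Prob(A \mid C) \times Prob(B \mid C)\), confirms the conditional factorization, and then sums out variables to show that \(A\) and \(B\) are still dependent when \(C\) is not observed.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model for A <- C -> B: C picks one of two biased coins with equal
# probability; A and B are two flips of that coin, independent given C.
P_C = {0: Fraction(1, 2), 1: Fraction(1, 2)}
P_HEADS_GIVEN_C = {0: Fraction(1, 4), 1: Fraction(3, 4)}

def p_flip_given_c(outcome, c):
    """Prob(flip = outcome | C = c), where outcome 1 = heads and 0 = tails."""
    h = P_HEADS_GIVEN_C[c]
    return h if outcome == 1 else 1 - h

# Joint Prob(A=a, B=b, C=c) = Prob(C=c) * Prob(A=a | C=c) * Prob(B=b | C=c).
joint = {
    (a, b, c): P_C[c] * p_flip_given_c(a, c) * p_flip_given_c(b, c)
    for a, b, c in product((0, 1), repeat=3)
}

def marginal(keep):
    """Sum the joint over every variable whose index (0=A, 1=B, 2=C) is not in keep."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, Fraction(0)) + p
    return out

p_ac, p_bc = marginal((0, 2)), marginal((1, 2))
p_ab, p_a, p_b = marginal((0, 1)), marginal((0,)), marginal((1,))

# Given C, A and B factorize: Prob(A, B | C) = Prob(A | C) * Prob(B | C).
conditionally_independent = all(
    joint[(a, b, c)] / P_C[c] == (p_ac[(a, c)] / P_C[c]) * (p_bc[(b, c)] / P_C[c])
    for (a, b, c) in joint
)
# Without conditioning on C they are dependent: heads on A raises belief in the
# heads-biased coin, and hence in heads on B.
marginally_independent = all(
    p_ab[(a, b)] == p_a[(a,)] * p_b[(b,)] for (a, b) in p_ab
)

print(conditionally_independent, marginally_independent)  # True False
print(p_ab[(1, 1)], p_a[(1,)] * p_b[(1,)])                 # 5/16 1/4
```

The same `marginal` helper is the discrete counterpart of "integrating out" a variable: it sums the joint over the variables being removed.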