
9.5: Continuous Random Vectors


    Following the template used to extend discrete random variables to discrete random vectors, we now introduce the concept of continuous random vectors. Let \(X=\left(X_{1}, X_{2}\right)\) be a random vector with \[\begin{aligned} &a_{1} \leq X_{1} \leq b_{1} \\ &a_{2} \leq X_{2} \leq b_{2} . \end{aligned}\] The probability density function (pdf) is now a function over the rectangle \[R \equiv\left[a_{1}, b_{1}\right] \times\left[a_{2}, b_{2}\right]\] and is denoted by \[\left.f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right) \quad \text { (or, more concisely, } f_{X}\left(x_{1}, x_{2}\right)\right) .\] The pdf must satisfy the following conditions: \[\begin{aligned} f_{X}\left(x_{1}, x_{2}\right) & \geq 0, \quad \forall\left(x_{1}, x_{2}\right) \in R, \\ \int_{a_{1}}^{b_{1}} \int_{a_{2}}^{b_{2}} f_{X}\left(x_{1}, x_{2}\right) d x_{2} d x_{1} &=1 . \end{aligned}\] The value of the pdf can be interpreted as a probability per unit area, in the sense that \[P\left(x_{1} \leq X_{1} \leq x_{1}+d x_{1}, x_{2} \leq X_{2} \leq x_{2}+d x_{2}\right)=f_{X}\left(x_{1}, x_{2}\right) d x_{1} d x_{2},\] and \[P(X \in D)=\iint_{D} f_{X}\left(x_{1}, x_{2}\right) d x_{1} d x_{2},\] where \(\iint_{D}\) denotes the integral over a subset \(D \subset R\).
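
    As a quick numerical check of these two conditions and of \(P(X \in D)\), the sketch below uses a hypothetical pdf \(f_{X}\left(x_{1}, x_{2}\right)=x_{1}+x_{2}\) on the unit square (an illustrative choice, not a distribution discussed in this section) together with SciPy quadrature:

```python
from scipy.integrate import dblquad

# Hypothetical pdf on the rectangle R = [0, 1] x [0, 1]: f_X(x1, x2) = x1 + x2.
# It is nonnegative on R and integrates to 1, so it satisfies both conditions.
def f_X(x1, x2):
    return x1 + x2

# Normalization: dblquad integrates func(y, x) with y (here x2) as the inner variable.
total, _ = dblquad(lambda x2, x1: f_X(x1, x2), 0.0, 1.0,
                   lambda x1: 0.0, lambda x1: 1.0)
print(total)  # ~1.0

# P(X in D) for the sub-rectangle D = [0, 0.5] x [0, 0.5]
p_D, _ = dblquad(lambda x2, x1: f_X(x1, x2), 0.0, 0.5,
                 lambda x1: 0.0, lambda x1: 0.5)
print(p_D)  # ~0.125
```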

    Let us now revisit key concepts used to characterize discrete joint distributions in the continuous setting. First, the marginal density function of \(X_{1}\) is given by \[f_{X_{1}}\left(x_{1}\right)=\int_{a_{2}}^{b_{2}} f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right) d x_{2} .\] Recall that the marginal density of \(X_{1}\) describes the probability distribution of \(X_{1}\) disregarding the state of \(X_{2}\). Similarly, the marginal density function of \(X_{2}\) is \[f_{X_{2}}\left(x_{2}\right)=\int_{a_{1}}^{b_{1}} f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right) d x_{1} .\] As in the discrete case, the marginal densities are also valid probability distributions.

    The conditional probability density function of \(X_{1}\) given \(X_{2}\) is \[f_{X_{1} \mid X_{2}}\left(x_{1} \mid x_{2}\right)=\frac{f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right)}{f_{X_{2}}\left(x_{2}\right)} .\] Similar to the discrete case, the marginal and conditional probabilities are related by \[f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right)=f_{X_{1} \mid X_{2}}\left(x_{1} \mid x_{2}\right) \cdot f_{X_{2}}\left(x_{2}\right),\] or \[f_{X_{1}}\left(x_{1}\right)=\int_{a_{2}}^{b_{2}} f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right) d x_{2}=\int_{a_{2}}^{b_{2}} f_{X_{1} \mid X_{2}}\left(x_{1} \mid x_{2}\right) \cdot f_{X_{2}}\left(x_{2}\right) d x_{2} .\] In words, the marginal probability density function of \(X_{1}\) is equal to the integral of the conditional probability density \(f_{X_{1} \mid X_{2}}\) weighted by the probability density of \(X_{2}\).
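
    Continuing the same hypothetical pdf as above, the sketch below computes the marginal and conditional densities by numerical integration and verifies the relation just stated: integrating the conditional density weighted by the marginal of \(X_{2}\) recovers the marginal of \(X_{1}\):

```python
from scipy.integrate import quad

# Same hypothetical joint pdf as before: f(x1, x2) = x1 + x2 on [0, 1] x [0, 1].
def f_joint(x1, x2):
    return x1 + x2

def f_X1(x1):
    # Marginal of X1: integrate the joint density over x2 (analytically x1 + 1/2).
    return quad(lambda x2: f_joint(x1, x2), 0.0, 1.0)[0]

def f_X2(x2):
    # Marginal of X2 (analytically x2 + 1/2).
    return quad(lambda x1: f_joint(x1, x2), 0.0, 1.0)[0]

def f_X1_given_X2(x1, x2):
    # Conditional density of X1 given X2.
    return f_joint(x1, x2) / f_X2(x2)

# Integrating the conditional density weighted by the marginal of X2 over x2
# recovers the marginal of X1.
x1 = 0.3
recovered = quad(lambda x2: f_X1_given_X2(x1, x2) * f_X2(x2), 0.0, 1.0)[0]
print(recovered, f_X1(x1))  # both ~0.8
```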

    Two continuous random variables are said to be independent if their joint probability density function satisfies \[f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right)=f_{X_{1}}\left(x_{1}\right) \cdot f_{X_{2}}\left(x_{2}\right) .\] In terms of conditional probability, independence means that \[f_{X_{1} \mid X_{2}}\left(x_{1} \mid x_{2}\right)=\frac{f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right)}{f_{X_{2}}\left(x_{2}\right)}=\frac{f_{X_{1}}\left(x_{1}\right) \cdot f_{X_{2}}\left(x_{2}\right)}{f_{X_{2}}\left(x_{2}\right)}=f_{X_{1}}\left(x_{1}\right) .\] In words, knowing the outcome of \(X_{2}\) does not add any new knowledge about the probability distribution of \(X_{1}\).

    The covariance of \(X_{1}\) and \(X_{2}\) in the continuous case is defined as \[\operatorname{Cov}\left(X_{1}, X_{2}\right)=E\left[\left(X_{1}-\mu_{1}\right)\left(X_{2}-\mu_{2}\right)\right],\] and the correlation is given by \[\rho_{X_{1} X_{2}}=\frac{\operatorname{Cov}\left(X_{1}, X_{2}\right)}{\sigma_{X_{1}} \sigma_{X_{2}}} .\] Recall that the correlation takes on a value between \(-1\) and \(1\) and indicates how strongly the outcomes of two random events are related. In particular, if the random variables are independent, then their correlation evaluates to zero. This is easily seen from \[\begin{aligned} \operatorname{Cov}\left(X_{1}, X_{2}\right) &=E\left[\left(X_{1}-\mu_{1}\right)\left(X_{2}-\mu_{2}\right)\right]=\int_{a_{2}}^{b_{2}} \int_{a_{1}}^{b_{1}}\left(x_{1}-\mu_{1}\right)\left(x_{2}-\mu_{2}\right) f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right) d x_{1} d x_{2} \\ &=\int_{a_{2}}^{b_{2}} \int_{a_{1}}^{b_{1}}\left(x_{1}-\mu_{1}\right)\left(x_{2}-\mu_{2}\right) f_{X_{1}}\left(x_{1}\right) f_{X_{2}}\left(x_{2}\right) d x_{1} d x_{2} \\ &=\left[\int_{a_{2}}^{b_{2}}\left(x_{2}-\mu_{2}\right) f_{X_{2}}\left(x_{2}\right) d x_{2}\right] \cdot\left[\int_{a_{1}}^{b_{1}}\left(x_{1}-\mu_{1}\right) f_{X_{1}}\left(x_{1}\right) d x_{1}\right] \\ &=0 . \end{aligned}\] Note that the last step follows from the definition of the mean.
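
    As a sketch, again for the hypothetical pdf \(f\left(x_{1}, x_{2}\right)=x_{1}+x_{2}\) used above, the covariance and correlation can be evaluated by quadrature; here they come out nonzero precisely because \(X_{1}\) and \(X_{2}\) are not independent under that pdf:

```python
from scipy.integrate import dblquad

# Hypothetical joint pdf from the earlier sketches: f(x1, x2) = x1 + x2 on [0, 1]^2.
def f_joint(x1, x2):
    return x1 + x2

def expect(g):
    # E[g(X1, X2)]: double integral of g times the joint density over the rectangle.
    return dblquad(lambda x2, x1: g(x1, x2) * f_joint(x1, x2),
                   0.0, 1.0, lambda x1: 0.0, lambda x1: 1.0)[0]

mu1 = expect(lambda x1, x2: x1)                        # 7/12
mu2 = expect(lambda x1, x2: x2)                        # 7/12
cov = expect(lambda x1, x2: (x1 - mu1) * (x2 - mu2))   # -1/144
var1 = expect(lambda x1, x2: (x1 - mu1) ** 2)          # 11/144
var2 = expect(lambda x1, x2: (x2 - mu2) ** 2)          # 11/144
rho = cov / (var1 ** 0.5 * var2 ** 0.5)                # -1/11
print(cov, rho)  # nonzero because X1 and X2 are not independent under this pdf
```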

    Example 9.5.1 Bivariate uniform distribution

    A bivariate uniform distribution is defined by two sets of parameters \(\left[a_{1}, b_{1}\right]\) and \(\left[a_{2}, b_{2}\right]\) that specify the range that \(X_{1}\) and \(X_{2}\) take on, respectively. The probability density function of \(\left(X_{1}, X_{2}\right)\) is \[f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right)=\frac{1}{\left(b_{1}-a_{1}\right)\left(b_{2}-a_{2}\right)}, \quad\left(x_{1}, x_{2}\right) \in R .\] Note here \(X_{1}\) and \(X_{2}\) are independent, so \[f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right)=f_{X_{1}}\left(x_{1}\right) \cdot f_{X_{2}}\left(x_{2}\right),\] where \[f_{X_{1}}\left(x_{1}\right)=\frac{1}{b_{1}-a_{1}} \quad \text { and } \quad f_{X_{2}}\left(x_{2}\right)=\frac{1}{b_{2}-a_{2}} .\] As for the univariate case, we have \[P(X \in D)=\frac{A_{D}}{A_{R}},\] where \(A_{D}\) is the area of some arbitrary region \(D\) and \(A_{R}\) is the area of the rectangle \(R\). In words, the probability that a uniform random vector (a random "dart") lands in \(D\) is simply the ratio of \(A_{D}\) to the total area of the dartboard, \(A_{R}\). This relationship, together with our binomial distribution, will be the key ingredient for our Monte Carlo methods for area calculation.
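
    The sketch below illustrates this "dart" interpretation for an assumed region \(D\): a disc of radius 1 inside the square \(R=[0,2] \times[0,2]\), for which \(A_{D} / A_{R}=\pi / 4\). The region, sample size, and seed are illustrative choices only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dartboard R = [0, 2] x [0, 2]; region D = disc of radius 1 centered at (1, 1),
# so the exact answer is A_D / A_R = pi / 4.
a1, b1, a2, b2 = 0.0, 2.0, 0.0, 2.0
n = 100_000

# Uniform "darts" on R
x1 = a1 + (b1 - a1) * rng.random(n)
x2 = a2 + (b2 - a2) * rng.random(n)

in_D = (x1 - 1.0) ** 2 + (x2 - 1.0) ** 2 <= 1.0
print(in_D.mean(), np.pi / 4)  # estimate of A_D / A_R vs. exact value ~0.785
```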

    Note also that if \(D\) is itself a rectangle aligned with the coordinate directions, \(D \equiv\left\{c_{1} \leq x_{1} \leq d_{1}, c_{2} \leq x_{2} \leq d_{2}\right\}\), then \(P(X \in D)\) simplifies to the product of the length of \(D\) in \(x_{1}\), \(\left(d_{1}-c_{1}\right)\), divided by \(b_{1}-a_{1}\), and the length of \(D\) in \(x_{2}\), \(\left(d_{2}-c_{2}\right)\), divided by \(b_{2}-a_{2}\). Independence is manifested as a normalized product of lengths, or equivalently as the AND or intersection (not OR or union) of the two "event" rectangles \(c_{1} \leq x_{1} \leq d_{1}, a_{2} \leq x_{2} \leq b_{2}\) and \(a_{1} \leq x_{1} \leq b_{1}, c_{2} \leq x_{2} \leq d_{2}\).
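
    Written out, this simplification reads \[P(X \in D)=\frac{A_{D}}{A_{R}}=\frac{\left(d_{1}-c_{1}\right)\left(d_{2}-c_{2}\right)}{\left(b_{1}-a_{1}\right)\left(b_{2}-a_{2}\right)}=\frac{d_{1}-c_{1}}{b_{1}-a_{1}} \cdot \frac{d_{2}-c_{2}}{b_{2}-a_{2}}=P\left(c_{1} \leq X_{1} \leq d_{1}\right) \cdot P\left(c_{2} \leq X_{2} \leq d_{2}\right),\] i.e. the joint probability factors into the product of the two marginal (interval) probabilities.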

    To generate a realization of \(X=\left(X_{1}, X_{2}\right)\), we express the vector as a function of two independent (scalar) uniform distributions. Namely, let us consider \(U_{1} \sim \mathcal{U}(0,1)\) and \(U_{2} \sim \mathcal{U}(0,1)\). Then, we can express the random vector as \[\begin{aligned} X_{1} &=a_{1}+\left(b_{1}-a_{1}\right) U_{1} \\ X_{2} &=a_{2}+\left(b_{2}-a_{2}\right) U_{2} \\ X &=\left(X_{1}, X_{2}\right) . \end{aligned}\] We stress that \(U_{1}\) and \(U_{2}\) must be independent in order for \(X_{1}\) and \(X_{2}\) to be independent.
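
    A minimal sketch of this construction follows; the function name and parameter values are illustrative, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_bivariate_uniform(a1, b1, a2, b2, n, rng):
    # Two independent U(0,1) draws per realization; the independence of U1 and U2
    # is what makes X1 and X2 independent.
    u1 = rng.random(n)
    u2 = rng.random(n)
    x1 = a1 + (b1 - a1) * u1
    x2 = a2 + (b2 - a2) * u2
    return np.column_stack((x1, x2))

X = sample_bivariate_uniform(-1.0, 2.0, 0.0, 5.0, 500, rng)
print(X.min(axis=0), X.max(axis=0))  # all realizations lie in [-1, 2] x [0, 5]
```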

    Advanced Material

    Example 9.5.2 Bivariate normal distribution

    Let \(\left(X_{1}, X_{2}\right)\) be a bivariate normal random vector. The probability density function of \(\left(X_{1}, X_{2}\right)\) is of the form \[\begin{aligned} &f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right)=f^{\text {bi-normal }}\left(x_{1}, x_{2} ; \mu_{1}, \mu_{2}, \sigma_{1}, \sigma_{2}, \rho\right) \\ &\equiv \frac{1}{2 \pi \sigma_{1} \sigma_{2} \sqrt{1-\rho^{2}}} \exp \left\{-\frac{1}{2\left(1-\rho^{2}\right)}\left[\frac{\left(x_{1}-\mu_{1}\right)^{2}}{\sigma_{1}^{2}}+\frac{\left(x_{2}-\mu_{2}\right)^{2}}{\sigma_{2}^{2}}-\frac{2 \rho\left(x_{1}-\mu_{1}\right)\left(x_{2}-\mu_{2}\right)}{\sigma_{1} \sigma_{2}}\right]\right\}, \end{aligned}\] where \(\left(\mu_{1}, \mu_{2}\right)\) are the means, \(\left(\sigma_{1}^{2}, \sigma_{2}^{2}\right)\) are the variances, and \(\rho\) is the correlation. The pairs \(\left\{\mu_{1}, \sigma_{1}^{2}\right\}\) and \(\left\{\mu_{2}, \sigma_{2}^{2}\right\}\) describe the marginal distributions of \(X_{1}\) and \(X_{2}\), respectively. The correlation coefficient must satisfy \[-1<\rho<1\] and, if \(\rho=0\), then \(X_{1}\) and \(X_{2}\) are uncorrelated. For a joint normal distribution, uncorrelated implies independence (this is not true for a general distribution).
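
    A direct transcription of this density into code is convenient for plotting and for checking later results; the sketch below is an illustration, with the function name chosen here:

```python
import numpy as np

def f_bi_normal(x1, x2, mu1, mu2, sigma1, sigma2, rho):
    # Direct transcription of the bivariate normal density above.
    z1 = (x1 - mu1) / sigma1
    z2 = (x2 - mu2) / sigma2
    c = 1.0 / (2.0 * np.pi * sigma1 * sigma2 * np.sqrt(1.0 - rho ** 2))
    q = (z1 ** 2 + z2 ** 2 - 2.0 * rho * z1 * z2) / (1.0 - rho ** 2)
    return c * np.exp(-0.5 * q)

# Parameters of Figure 9.15: mu1 = mu2 = 0, sigma1 = 3, sigma2 = 2, rho = 1/2.
print(f_bi_normal(0.0, 0.0, 0.0, 0.0, 3.0, 2.0, 0.5))  # peak density, ~0.0306
```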

    Figure 9.15: A bivariate normal distribution with \(\mu_{1}=\mu_{2}=0, \sigma_{1}=3, \sigma_{2}=2\), and \(\rho=1 / 2\).

    The probability density function for the bivariate normal distribution with \(\mu_{1}=\mu_{2}=0\), \(\sigma_{1}=3, \sigma_{2}=2\), and \(\rho=1 / 2\) is shown in Figure 9.15. The lines shown are the lines of equal density. In particular, the solid line corresponds to the \(1 \sigma\) line, and the dashed lines are for \(\sigma / 2\) and \(2 \sigma\) as indicated. 500 realizations of the distribution are also shown as red dots. For a bivariate normal distribution, the chances are \(11.8 \%\), \(39.4 \%\), and \(86.5 \%\) that \(\left(X_{1}, X_{2}\right)\) takes on a value within the \(\sigma / 2\), \(1 \sigma\), and \(2 \sigma\) contours, respectively. The realizations shown confirm this trend, as only a small fraction of the red dots fall outside of the \(2 \sigma\) contour. This particular bivariate normal distribution has a weak positive correlation, i.e., given that \(X_{2}\) is greater than its mean \(\mu_{X_{2}}\), there is a higher probability that \(X_{1}\) is also greater than its mean, \(\mu_{X_{1}}\).
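
    These percentages can be checked by Monte Carlo if we interpret the \(k \sigma\) lines as contours of equal density, i.e., of constant Mahalanobis distance \(k\); the sample size and seed below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)

mu = np.array([0.0, 0.0])
sigma1, sigma2, rho = 3.0, 2.0, 0.5
Sigma = np.array([[sigma1 ** 2, rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2 ** 2]])

n = 200_000
X = rng.multivariate_normal(mu, Sigma, size=n)

# A k-sigma contour is a contour of equal density, i.e. of constant Mahalanobis
# distance: (x - mu)^T Sigma^{-1} (x - mu) = k^2.
d2 = np.einsum('ij,jk,ik->i', X - mu, np.linalg.inv(Sigma), X - mu)
for k in (0.5, 1.0, 2.0):
    print(k, (d2 <= k ** 2).mean())  # ~0.118, ~0.394, ~0.865
```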

    To understand the behavior of bivariate normal distributions in more detail, let us consider the marginal distributions of \(X_{1}\) and \(X_{2}\). The marginal distribution of \(X_{1}\) of a bivariate normal distribution characterized by \(\left\{\mu_{1}, \mu_{2}, \sigma_{1}^{2}, \sigma_{2}^{2}, \rho\right\}\) is a univariate normal distribution with the mean \(\mu_{1}\) and the variance \(\sigma_{1}^{2}\), i.e. \[f_{X_{1}}\left(x_{1}\right) \equiv \int_{x_{2}=-\infty}^{\infty} f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right) d x_{2}=f^{\mathrm{normal}}\left(x_{1} ; \mu_{1}, \sigma_{1}^{2}\right) .\] In words, if we look at the samples of the binormal random variable \(\left(X_{1}, X_{2}\right)\) and focus on the behavior of \(X_{1}\) only (i.e. disregard \(X_{2}\)), then we will observe that \(X_{1}\) is normally distributed. Similarly, the marginal density of \(X_{2}\) is \[f_{X_{2}}\left(x_{2}\right) \equiv \int_{x_{1}=-\infty}^{\infty} f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right) d x_{1}=f^{\mathrm{normal}}\left(x_{2} ; \mu_{2}, \sigma_{2}^{2}\right) .\] This rather surprising result is one of the properties of the binormal distribution, which in fact extends to higher-dimensional multivariate normal distributions.

    Proof. For convenience, we will first rewrite the probability density function as \[f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right)=\frac{1}{2 \pi \sigma_{1} \sigma_{2} \sqrt{1-\rho^{2}}} \exp \left(-\frac{1}{2} q\left(x_{1}, x_{2}\right)\right)\] where the quadratic term is \[q\left(x_{1}, x_{2}\right)=\frac{1}{1-\rho^{2}}\left[\frac{\left(x_{1}-\mu_{1}\right)^{2}}{\sigma_{1}^{2}}+\frac{\left(x_{2}-\mu_{2}\right)^{2}}{\sigma_{2}^{2}}-\frac{2 \rho\left(x_{1}-\mu_{1}\right)\left(x_{2}-\mu_{2}\right)}{\sigma_{1} \sigma_{2}}\right] .\] We can manipulate the quadratic term to yield \[\begin{aligned} q\left(x_{1}, x_{2}\right) &=\frac{\left(x_{1}-\mu_{1}\right)^{2}}{\sigma_{1}^{2}}+\frac{1}{1-\rho^{2}}\left[\frac{\rho^{2}\left(x_{1}-\mu_{1}\right)^{2}}{\sigma_{1}^{2}}+\frac{\left(x_{2}-\mu_{2}\right)^{2}}{\sigma_{2}^{2}}-\frac{2 \rho\left(x_{1}-\mu_{1}\right)\left(x_{2}-\mu_{2}\right)}{\sigma_{1} \sigma_{2}}\right] \\ &=\frac{\left(x_{1}-\mu_{1}\right)^{2}}{\sigma_{1}^{2}}+\frac{1}{1-\rho^{2}}\left[\frac{\rho\left(x_{1}-\mu_{1}\right)}{\sigma_{1}}-\frac{x_{2}-\mu_{2}}{\sigma_{2}}\right]^{2} \\ &=\frac{\left(x_{1}-\mu_{1}\right)^{2}}{\sigma_{1}^{2}}+\frac{1}{\sigma_{2}^{2}\left(1-\rho^{2}\right)}\left[x_{2}-\left(\mu_{2}+\rho \frac{\sigma_{2}}{\sigma_{1}}\left(x_{1}-\mu_{1}\right)\right)\right]^{2} \end{aligned}\] Substitution of the expression into the probability density function yields \[\begin{aligned} f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right)=& \frac{1}{2 \pi \sigma_{1} \sigma_{2} \sqrt{1-\rho^{2}}} \exp \left(-\frac{1}{2} q\left(x_{1}, x_{2}\right)\right) \\ =& \frac{1}{\sqrt{2 \pi} \sigma_{1}} \exp \left(-\frac{1}{2} \frac{\left(x_{1}-\mu_{1}\right)^{2}}{\sigma_{1}^{2}}\right) \\ & \times \frac{1}{\sqrt{2 \pi} \sigma_{2} \sqrt{1-\rho^{2}}} \exp \left(-\frac{1}{2} \frac{\left(x_{2}-\left(\mu_{2}+\rho\left(\sigma_{2} / \sigma_{1}\right)\left(x_{1}-\mu_{1}\right)\right)\right)^{2}}{\sigma_{2}^{2}\left(1-\rho^{2}\right)}\right) \\ =& f^{\mathrm{normal}}\left(x_{1} ; \mu_{1}, \sigma_{1}^{2}\right) \cdot f^{\text {normal }}\left(x_{2} ; \mu_{2}+\rho \frac{\sigma_{2}}{\sigma_{1}}\left(x_{1}-\mu_{1}\right), \sigma_{2}^{2}\left(1-\rho^{2}\right)\right) \end{aligned}\] Note that we have expressed the joint probability as the product of two univariate Gaussian functions. We caution that this does not imply independence, because the mean of the second distribution is dependent on the value of \(x_{1}\). Applying the definition of marginal density of \(X_{1}\) and integrating out the \(x_{2}\) term, we obtain \[\begin{aligned} f_{X_{1}}\left(x_{1}\right) &=\int_{x_{2}=-\infty}^{\infty} f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right) d x_{2} \\ &=\int_{x_{2}=-\infty}^{\infty} f^{\mathrm{normal}}\left(x_{1} ; \mu_{1}, \sigma_{1}^{2}\right) \cdot f^{\mathrm{normal}}\left(x_{2} ; \mu_{2}+\rho \frac{\sigma_{2}}{\sigma_{1}}\left(x_{1}-\mu_{1}\right), \sigma_{2}^{2}\left(1-\rho^{2}\right)\right) d x_{2} \\ &=f^{\mathrm{normal}}\left(x_{1} ; \mu_{1}, \sigma_{1}^{2}\right) \cdot \int_{x_{2}=-\infty}^{\infty} f^{\mathrm{normal}}\left(x_{2} ; \mu_{2}+\rho \frac{\sigma_{2}}{\sigma_{1}}\left(x_{1}-\mu_{1}\right), \sigma_{2}^{2}\left(1-\rho^{2}\right)\right) d x_{2} \\ &=f^{\mathrm{normal}}\left(x_{1} ; \mu_{1}, \sigma_{1}^{2}\right) . \end{aligned}\] The integral of the second function evaluates to unity because it is a probability density function. Thus, the marginal density of \(X_{1}\) is simply the univariate normal distribution with parameters \(\mu_{1}\) and \(\sigma_{1}\). 
The proof for the marginal density of \(X_{2}\) is identical due to the symmetry of the joint probability density function.
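
    The result can also be verified numerically: integrating the joint density over \(x_{2}\) at a few values of \(x_{1}\) reproduces \(f^{\mathrm{normal}}\left(x_{1} ; \mu_{1}, \sigma_{1}^{2}\right)\). The sketch below assumes the parameters of Figure 9.15:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Parameters of Figure 9.15 (assumed here for illustration).
mu1, mu2, sigma1, sigma2, rho = 0.0, 0.0, 3.0, 2.0, 0.5

def f_bi_normal(x1, x2):
    z1 = (x1 - mu1) / sigma1
    z2 = (x2 - mu2) / sigma2
    c = 1.0 / (2.0 * np.pi * sigma1 * sigma2 * np.sqrt(1.0 - rho ** 2))
    q = (z1 ** 2 + z2 ** 2 - 2.0 * rho * z1 * z2) / (1.0 - rho ** 2)
    return c * np.exp(-0.5 * q)

# Integrate out x2 at a few values of x1 and compare with the N(mu1, sigma1^2) pdf.
for x1 in (-3.0, 0.0, 2.0):
    marginal = quad(lambda x2: f_bi_normal(x1, x2), -np.inf, np.inf)[0]
    print(x1, marginal, norm.pdf(x1, loc=mu1, scale=sigma1))  # the two columns agree
```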

    Figure 9.16 shows the marginal densities \(f_{X_{1}}\) and \(f_{X_{2}}\) along with the \(\sigma=1\)- and \(\sigma=2\)-contours of the joint probability density. The dots superimposed on the joint density are 500 realizations of \(\left(X_{1}, X_{2}\right)\). The histogram on the top summarizes the relative frequency of \(X_{1}\) taking on a value within the bins for the 500 realizations. Similarly, the histogram on the right summarizes the relative frequency of the values that \(X_{2}\) takes on. The histograms closely match the theoretical marginal distributions \(\mathcal{N}\left(\mu_{1}, \sigma_{1}^{2}\right)\) and \(\mathcal{N}\left(\mu_{2}, \sigma_{2}^{2}\right)\). In particular, we note that the marginal densities are independent of the correlation coefficient \(\rho\).

    Figure 9.16: Illustration of marginal densities for a bivariate normal distribution ( \(\mu_{1}=\mu_{2}=0\), \(\left.\sigma_{1}=3, \sigma_{2}=2, \rho=3 / 4\right)\).

    Having studied the marginal densities of the bivariate normal distribution, let us now consider conditional probabilities. Combining the definition of conditional density and the expression for the joint and marginal densities, we obtain \[\begin{aligned} f_{X_{1} \mid X_{2}}\left(x_{1} \mid x_{2}\right) &=\frac{f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right)}{f_{X_{2}}\left(x_{2}\right)}=f^{\text {normal }}\left(x_{1} ; \mu_{1}+\rho \frac{\sigma_{1}}{\sigma_{2}}\left(x_{2}-\mu_{2}\right),\left(1-\rho^{2}\right) \sigma_{1}^{2}\right) \\ &=\frac{1}{\sqrt{2 \pi} \sigma_{1} \sqrt{1-\rho^{2}}} \exp \left(-\frac{1}{2} \frac{\left(x_{1}-\left(\mu_{1}+\rho\left(\sigma_{1} / \sigma_{2}\right)\left(x_{2}-\mu_{2}\right)\right)\right)^{2}}{\sigma_{1}^{2}\left(1-\rho^{2}\right)}\right) . \end{aligned}\] Similarly, the conditional density of \(X_{2}\) given \(X_{1}\) is \[f_{X_{2} \mid X_{1}}\left(x_{2} \mid x_{1}\right)=\frac{f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right)}{f_{X_{1}}\left(x_{1}\right)}=f^{\text {normal }}\left(x_{2} ; \mu_{2}+\rho \frac{\sigma_{2}}{\sigma_{1}}\left(x_{1}-\mu_{1}\right),\left(1-\rho^{2}\right) \sigma_{2}^{2}\right) .\] Note that, unlike the marginal probabilities, the conditional probabilities are functions of the correlation coefficient \(\rho\). In particular, the standard deviation of the conditional distribution (i.e., its spread about its mean) decreases as \(|\rho|\) increases and vanishes as \(\rho \rightarrow \pm 1\). In words, if the correlation is high, then we can deduce with a high probability the state of \(X_{1}\) given the value that \(X_{2}\) takes. We also note that a positive correlation \((\rho>0)\) shifts the mean of the conditional density of \(X_{1} \mid X_{2}\) in the direction of the deviation of \(x_{2}\) from its mean. That is, if \(X_{2}\) takes on a value higher than its mean, then it is more likely than not that \(X_{1}\) takes on a value higher than its mean.

    Proof. Starting with the definition of conditional probability and substituting the joint and marginal probability density functions, \[\begin{aligned} f_{X_{1} \mid X_{2}}\left(x_{1} \mid x_{2}\right)=& \frac{f_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right)}{f_{X_{2}}\left(x_{2}\right)} \\ =& \frac{1}{2 \pi \sigma_{1} \sigma_{2} \sqrt{1-\rho^{2}}} \exp \left\{-\frac{1}{2\left(1-\rho^{2}\right)}\left[\frac{\left(x_{1}-\mu_{1}\right)^{2}}{\sigma_{1}^{2}}+\frac{\left(x_{2}-\mu_{2}\right)^{2}}{\sigma_{2}^{2}}-\frac{2 \rho\left(x_{1}-\mu_{1}\right)\left(x_{2}-\mu_{2}\right)}{\sigma_{1} \sigma_{2}}\right]\right\} \\ & \times \frac{\sqrt{2 \pi} \sigma_{2}}{1} \exp \left(\frac{1}{2} \frac{\left(x_{2}-\mu_{2}\right)^{2}}{\sigma_{2}^{2}}\right) \\ =& \frac{1}{\sqrt{2 \pi} \sigma_{1} \sqrt{1-\rho^{2}}} \exp \left\{-\frac{1}{2} s\left(x_{1}, x_{2}\right)\right\} \end{aligned}\] where \[s\left(x_{1}, x_{2}\right)=\frac{1}{1-\rho^{2}}\left[\frac{\left(x_{1}-\mu_{1}\right)^{2}}{\sigma_{1}^{2}}+\frac{\left(x_{2}-\mu_{2}\right)^{2}}{\sigma_{2}^{2}}-\frac{2 \rho\left(x_{1}-\mu_{1}\right)\left(x_{2}-\mu_{2}\right)}{\sigma_{1} \sigma_{2}}-\left(1-\rho^{2}\right) \frac{\left(x_{2}-\mu_{2}\right)^{2}}{\sigma_{2}^{2}}\right] .\] Rearrangement of the quadratic term \(s\left(x_{1}, x_{2}\right)\) yields \[\begin{aligned} s\left(x_{1}, x_{2}\right) &=\frac{1}{1-\rho^{2}}\left[\frac{\left(x_{1}-\mu_{1}\right)^{2}}{\sigma_{1}^{2}}-\frac{2 \rho\left(x_{1}-\mu_{1}\right)\left(x_{2}-\mu_{2}\right)}{\sigma_{1} \sigma_{2}}+\frac{\rho^{2}\left(x_{2}-\mu_{2}\right)^{2}}{\sigma_{2}^{2}}\right] \\ &=\frac{1}{1-\rho^{2}}\left[\frac{x_{1}-\mu_{1}}{\sigma_{1}}-\frac{\rho\left(x_{2}-\mu_{2}\right)}{\sigma_{2}}\right]^{2} \\ &=\frac{1}{\sigma_{1}^{2}\left(1-\rho^{2}\right)}\left[x_{1}-\left(\mu_{1}+\rho \frac{\sigma_{1}}{\sigma_{2}}\left(x_{2}-\mu_{2}\right)\right)\right]^{2} . \end{aligned}\] Substitution of the quadratic term into the conditional probability density function yields \[\begin{aligned} f_{X_{1} \mid X_{2}}\left(x_{1} \mid x_{2}\right) &=\frac{1}{\sqrt{2 \pi} \sigma_{1} \sqrt{1-\rho^{2}}} \exp \left\{-\frac{1}{2} \frac{1}{\sigma_{1}^{2}\left(1-\rho^{2}\right)}\left[x_{1}-\left(\mu_{1}+\rho \frac{\sigma_{1}}{\sigma_{2}}\left(x_{2}-\mu_{2}\right)\right)\right]^{2}\right\} \\ &=f^{\text {normal }}\left(x_{1} ; \mu_{1}+\rho \frac{\sigma_{1}}{\sigma_{2}}\left(x_{2}-\mu_{2}\right),\left(1-\rho^{2}\right) \sigma_{1}^{2}\right) \end{aligned}\] where the last equality follows from recognizing the univariate normal probability distribution function.
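
    As a sketch of how this conditional density can be used, the helper below (a name chosen here, not from the text) returns the conditional mean and standard deviation; drawing \(X_{2}\) from its marginal and then \(X_{1}\) from the conditional density is one way to generate realizations of the pair:

```python
import numpy as np

def conditional_X1_given_X2(x2, mu1, mu2, sigma1, sigma2, rho):
    # Mean and standard deviation of X1 | X2 = x2, per the conditional density above.
    mean = mu1 + rho * (sigma1 / sigma2) * (x2 - mu2)
    std = sigma1 * np.sqrt(1.0 - rho ** 2)
    return mean, std

# Parameters of Figure 9.17: mu1 = mu2 = 0, sigma1 = 3, sigma2 = 2, rho = 3/4.
mean, std = conditional_X1_given_X2(-2.0, 0.0, 0.0, 3.0, 2.0, 0.75)
print(mean, std)  # mean = -2.25 (shifted negative), std ~1.98 < sigma1 = 3

# Drawing X2 from its marginal and then X1 from the conditional density reproduces
# the joint behavior; the sample correlation comes out close to rho.
rng = np.random.default_rng(3)
x2 = rng.normal(0.0, 2.0, size=1000)
m, s = conditional_X1_given_X2(x2, 0.0, 0.0, 3.0, 2.0, 0.75)
x1 = rng.normal(m, s)
print(np.corrcoef(x1, x2)[0, 1])  # ~0.75
```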

    Figure 9.17 shows the conditional densities \(f_{X_{1} \mid X_{2}}\left(x_{1} \mid x_{2}=-2\right)\) and \(f_{X_{2} \mid X_{1}}\left(x_{2} \mid x_{1}=3\right)\) for a bivariate normal distribution \(\left(\mu_{1}=\mu_{2}=0, \sigma_{1}=3, \sigma_{2}=2, \rho=3 / 4\right)\). The histograms are constructed by counting the relative frequency of occurrence for those realizations that fall near the conditioning values \(x_{2}=-2\) and \(x_{1}=3\), respectively. Clearly, the means of the conditional probability densities are shifted relative to the respective marginal densities. As \(\rho=3 / 4>0\) and \(x_{2}-\mu_{2}=-2<0\), the mean for \(X_{1} \mid X_{2}\) is shifted in the negative direction. Conversely, \(\rho>0\) and \(x_{1}-\mu_{1}=3>0\) shifts the mean for \(X_{2} \mid X_{1}\) in the positive direction. We also note that the conditional probability densities are tighter than the respective marginal densities; due to the relatively strong correlation of \(\rho=3 / 4\), we have better knowledge of one state when we know the value of the other.

    Finally, to solidify the idea of correlation, let us consider the \(1 \sigma\)-contour for bivariate normal distributions with several different values of \(\rho\), shown in Figure 9.18. A stronger (positive) correlation implies that there is a high chance that a positive value of \(x_{2}\) implies a positive value of \(x_{1}\).

    Figure 9.17: Illustration of conditional densities \(f_{X_{1} \mid X_{2}}\left(x_{1} \mid x_{2}=-2\right)\) and \(f_{X_{2} \mid X_{1}}\left(x_{2} \mid x_{1}=3\right)\) for a bivariate normal distribution ( \(\mu_{1}=\mu_{2}=0, \sigma_{1}=3, \sigma_{2}=2, \rho=3 / 4\) ).

    Conversely, a strong negative correlation implies that there is a high chance that a positive value of \(x_{2}\) implies a negative value of \(x_{1}\). Zero correlation, which implies independence for normal distributions, means that we gain no additional information about the value that \(X_{1}\) takes on by knowing the value of \(X_{2}\); thus, the contour of equal probability density is not tilted.

    Figure 9.18: Bivariate normal distributions with \(\mu_{1}=\mu_{2}=0, \sigma_{1}=3, \sigma_{2}=2\), and several values of \(\rho\).

    Advanced Material

