4.3: Singular Value Decomposition
Before we discuss the singular value decomposition of matrices, we begin with some matrix facts and definitions.
Some Matrix Facts:
- A matrix \(U \in C^{n \times n}\) is unitary if \(U^{\prime}U = UU^{\prime}= I\). Here, as in Matlab, the superscript \(^{\prime}\) denotes the (entry-by-entry) complex conjugate of the transpose, which is also called the Hermitian transpose or conjugate transpose.
- A matrix \(U \in R^{n \times n}\) is orthogonal if \(U^{T}U = UU^{T}= I\), where the superscript \(^{T}\) denotes the transpose.
- Property: If \(U\) is unitary, then \(\|U x\|_{2}=\|x\|_{2}\).
- If \(S = S^{\prime}\) (i.e., \(S\) equals its Hermitian transpose, in which case we say \(S\) is Hermitian), then there exists a unitary matrix \(U\) such that \(U^{\prime}SU\) is a diagonal matrix.
- For any matrix \(A\), both \(A^{\prime}A\) and \(AA^{\prime}\) are Hermitian, and thus can always be diagonalized by unitary matrices.
- For any matrix \(A\), the eigenvalues of \(A^{\prime}A\) and \(AA^{\prime}\) are always real and non-negative: if \(A^{\prime}Ax=\lambda x\) with \(x \neq 0\), then \(\lambda\|x\|_{2}^{2}=x^{\prime}A^{\prime}Ax=\|Ax\|_{2}^{2} \geq 0\).
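The facts above can be checked numerically. The following sketch (assuming NumPy; the random matrix is purely illustrative) verifies that \(A^{\prime}A\) and \(AA^{\prime}\) are Hermitian with real non-negative eigenvalues, and that a unitary matrix preserves the 2-norm.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))  # arbitrary complex A

AhA = A.conj().T @ A                        # A'A, with ' the Hermitian transpose
AAh = A @ A.conj().T                        # AA'

# Both products equal their own Hermitian transpose
assert np.allclose(AhA, AhA.conj().T)
assert np.allclose(AAh, AAh.conj().T)

# Their eigenvalues are real and non-negative (up to roundoff)
assert np.all(np.linalg.eigvalsh(AhA) >= -1e-10)
assert np.all(np.linalg.eigvalsh(AAh) >= -1e-10)

# A unitary matrix preserves the 2-norm: ||Ux||_2 = ||x||_2
U, _, _ = np.linalg.svd(A)                  # U from the SVD is unitary
x = rng.standard_normal(3)
assert np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x))
```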
Theorem 4.1 (Singular Value Decomposition, or SVD)
Given any matrix \(A \in C^{m \times n}\), \(A\) can be written as
\[A=\stackrel{m \times m}{U} \ \stackrel{m \times n}{\Sigma} \ \stackrel{n \times n}{V^{\prime}}, \ \tag{4.14}\]
where \(U^{\prime}U = I\), \(V^{\prime}V = I\) (i.e., \(U\) and \(V\) are unitary),
\[\Sigma=\left[\begin{array}{ccc|c}
\sigma_{1} & & & \\
& \ddots & & 0 \\
& & \sigma_{r} & \\
\hline & & & \\
& 0 & & 0
\end{array}\right]\ \tag{4.15}\]
and \(\sigma_{i}=\sqrt{i \text{th nonzero eigenvalue of } A^{\prime} A}\). The \(\sigma_{i}\) are termed the singular values of \(A\), and are arranged in order of descending magnitude, i.e.,
\[\sigma_{1} \geq \sigma_{2} \geq \cdots \geq \sigma_{r}>0\nonumber\]
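Theorem 4.1 can be verified numerically. The sketch below (assuming NumPy; the \(4 \times 6\) random matrix is an arbitrary illustration with \(m=4\), \(n=6\)) checks the factorization (4.14), the unitarity of \(U\) and \(V\), the descending ordering of the singular values, and their relation to the eigenvalues of \(A^{\prime}A\).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))            # an arbitrary 4 x 6 real matrix (m = 4, n = 6)

U, s, Vh = np.linalg.svd(A)                # full SVD: U is m x m, Vh = V' is n x n
Sigma = np.zeros((4, 6))
Sigma[:4, :4] = np.diag(s)                 # embed the singular values in an m x n matrix

assert np.allclose(U @ Sigma @ Vh, A)      # A = U Sigma V'    (4.14)
assert np.allclose(U.T @ U, np.eye(4))     # U'U = I
assert np.allclose(Vh @ Vh.T, np.eye(6))   # V'V = I
assert np.all(np.diff(s) <= 1e-12)         # sigma_1 >= sigma_2 >= ...

# sigma_i^2 are the nonzero eigenvalues of A'A (the remaining two are zero)
assert np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A.T @ A))[-4:])
```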
- Proof
We will prove this theorem for the case \(rank(A) = m\); the general case involves very little more than what is required for this case. The matrix \(AA^{\prime}\) is Hermitian, and it can therefore be diagonalized by a unitary matrix \(U \in C^{m \times m}\), so that
\[U \Lambda_{1} U^{\prime}=A A^{\prime}.\nonumber \]
Note that \(\Lambda_{1}=\operatorname{diag}\left(\lambda_{1}, \lambda_{2}, \ldots, \lambda_{m}\right)\) has real positive diagonal entries \(\lambda_{i}\) due to the fact that \(AA^{\prime}\) is positive definite. We can write \(\Lambda_{1}=\Sigma_{1}^{2}=\operatorname{diag}\left(\sigma_{1}^{2}, \sigma_{2}^{2}, \ldots, \sigma_{m}^{2}\right)\). Define \(V_{1}^{\prime}=\Sigma_{1}^{-1} U^{\prime} A \in C^{m \times n}\); it has orthonormal rows, as can be seen from the following calculation: \(V_{1}^{\prime} V_{1}=\Sigma_{1}^{-1} U^{\prime} A A^{\prime} U \Sigma_{1}^{-1}=I\). Choose the matrix \(V_{2}^{\prime}\) in such a way that
\[V^{\prime}=\left[\begin{array}{c}
V_{1}^{\prime} \\
\hline
V_{2}^{\prime}
\end{array}\right]\nonumber\]
is in \(C^{n \times n}\) and unitary. Define the \(m \times n\) matrix \(\Sigma=\left[\Sigma_{1} \mid 0\right].\) This implies that
\[\Sigma V^{\prime}=\Sigma_{1} V_{1}^{\prime}=U^{\prime} A\nonumber\]
In other words, we have \(A=U \Sigma V^{\prime}\).
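The construction in the proof can be carried out numerically. The following sketch (assuming NumPy; the random \(3 \times 5\) matrix is illustrative and has \(rank(A)=m=3\) almost surely) builds \(U\), \(\Sigma_{1}\), and \(V_{1}^{\prime}\) exactly as above and confirms the factorization.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5))            # rank(A) = m = 3 (almost surely)

# Diagonalize the Hermitian (here symmetric) matrix A A': U Lambda_1 U' = A A'
lam, U = np.linalg.eigh(A @ A.T)
lam, U = lam[::-1], U[:, ::-1]             # reorder to descending eigenvalues
Sigma1 = np.diag(np.sqrt(lam))             # Lambda_1 = Sigma_1^2

# V_1' = Sigma_1^{-1} U' A has orthonormal rows
V1h = np.linalg.inv(Sigma1) @ U.T @ A
assert np.allclose(V1h @ V1h.T, np.eye(3))

# Since Sigma V' = Sigma_1 V_1' = U' A, we recover A = U Sigma_1 V_1'
assert np.allclose(U @ Sigma1 @ V1h, A)
```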
Example 4.2
For the matrix A given at the beginning of this lecture, the SVD - computed easily in Matlab by writing \([u, s, v] = svd(A)\) - is
\[A=\left(\begin{array}{cc}
.7068 & .7075 \\
.7075 & -.7068
\end{array}\right)\left(\begin{array}{cc}
200.1 & 0 \\
0 & 0.1
\end{array}\right)\left(\begin{array}{cc}
.7075 & .7068 \\
-.7068 & .7075
\end{array}\right) \ \tag{4.16}\]
Solution
Observations:
i) \[\begin{aligned}
A A^{\prime} &=U \Sigma V^{\prime} V \Sigma^{T} U^{\prime} \\
&=U \Sigma \Sigma^{T} U^{\prime}
\end{aligned}\nonumber\]
\[=U\left[\begin{array}{ccc|c}
\sigma_{1}^{2} & & & \\
& \ddots & & 0 \\
& & \sigma_{r}^{2} & \\
\hline & & & \\
& 0 & & 0
\end{array}\right] U^{\prime}\ \tag{4.17}\]
which tells us \(U\) diagonalizes \(AA^{\prime}\);
ii) \[\begin{aligned}
A^{\prime} A &=V \Sigma^{T} U^{\prime} U \Sigma V^{\prime} \\
&=V \Sigma^{T} \Sigma V^{\prime}
\end{aligned}\nonumber\]
\[=V\left[\begin{array}{ccc|c}
\sigma_{1}^{2} & & & \\
& \ddots & & 0 \\
& & \sigma_{r}^{2} & \\
\hline & & & \\
& 0 & & 0
\end{array}\right] V^{\prime}\ \tag{4.18}\]
which tells us \(V\) diagonalizes \(A^{\prime}A\);
iii) If \(U\) and \(V\) are expressed in terms of their columns, i.e.,
\[\begin{array}{l}
U=\left[\begin{array}{llll}
u_{1} & u_{2} & \dots & u_{m}
\end{array}\right] \end{array}\nonumber\]
and
\[\begin{array}{l}
V=\left[\begin{array}{llll}
v_{1} & v_{2} & \dots & v_{n}
\end{array}\right] \end{array}\nonumber\]
then
\[A=\sum_{i=1}^{r} \sigma_{i} u_{i} v_{i}^{\prime}\ \tag{4.19}\]
which is another way to write the SVD. The \(u_{i}\) are termed the left singular vectors of \(A\), and the \(v_{i}\) are its right singular vectors. From this we see that we can alternately interpret \(Ax\) as
\[A x=\sum_{i=1}^{r} \sigma_{i} u_{i} \underbrace{\left(v_{i}^{\prime} x\right)}_{\text {projection }},\ \tag{4.20}\]
which is a weighted sum of the \(u_{i}\), where the weights are the products of the singular values and the projections of \(x\) onto the \(v_{i}\).
Observation (iii) tells us that \(\mathcal{R}a(A)=\operatorname{span}\left\{u_{1}, \ldots, u_{r}\right\}\) (because \(A x=\sum_{i=1}^{r} c_{i} u_{i}\), where the \(c_{i}\) are scalar weights). Since the columns of \(U\) are independent, \(\operatorname{dim} \mathcal{R}a(A)=r=rank(A)\), and \(\left\{u_{1}, \ldots, u_{r}\right\}\) constitute a basis for the range space of \(A\). The null space of \(A\) is given by \(\operatorname{span}\left\{v_{r+1}, \ldots, v_{n}\right\}\). To see this:
\[\begin{aligned}
U \Sigma V^{\prime} x=0 & \Longleftrightarrow \Sigma V^{\prime} x=0 \\
\Longleftrightarrow &\left[\begin{array}{c}
\sigma_{1} v_{1}^{\prime} x \\
\vdots \\
\sigma_{r} v_{r}^{\prime} x
\end{array}\right]=0 \\
\Longleftrightarrow & v_{i}^{\prime} x=0, \quad i=1, \ldots, r \\
\Longleftrightarrow & x \in \operatorname{span}\left\{v_{r+1}, \ldots, v_{n}\right\}
\end{aligned}\nonumber\]
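These observations can be checked numerically. The sketch below (assuming NumPy; the rank-deficient \(3 \times 3\) matrix is chosen for illustration) verifies the rank-one expansion (4.19), that \(\left\{u_{1}, \ldots, u_{r}\right\}\) spans the range, and that \(\left\{v_{r+1}, \ldots, v_{n}\right\}\) spans the null space.

```python
import numpy as np

# Rank-deficient example: the third row is the sum of the first two, so rank(A) = 2
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])
U, s, Vh = np.linalg.svd(A)                # rows of Vh are the v_i'
r = int(np.sum(s > 1e-10))                 # numerical rank, here r = 2

# Equation (4.19): A is the sum of r rank-one terms sigma_i u_i v_i'
A_sum = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(r))
assert np.allclose(A_sum, A)

# {u_1, ..., u_r} spans the range of A: projecting A x onto that span recovers it
x = np.array([1.0, -2.0, 0.5])
assert np.allclose(U[:, :r] @ (U[:, :r].T @ (A @ x)), A @ x)

# {v_{r+1}, ..., v_n} spans the null space: A v_i = 0 for i > r
for i in range(r, A.shape[1]):
    assert np.allclose(A @ Vh[i, :], 0)
```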
Example 4.3
One application of singular value decomposition is to the solution of a system of algebraic equations. Suppose \(A\) is an \(m \times n\) complex matrix and \(b\) is a vector in \(C^{m}\). Assume that the rank of \(A\) is equal to \(k\), with \(k < m\). We are looking for a solution of the linear system \(Ax = b\). By applying the singular value decomposition procedure to \(A\), we get
\[\begin{aligned}
A &=U \Sigma V^{\prime} \\
&=U\left[\begin{array}{c|c}
\Sigma_{1} & 0 \\
0 & 0
\end{array}\right] V^{\prime}
\end{aligned}\nonumber\]
where \(\Sigma_{1}\) is a \(k \times k\) non-singular diagonal matrix. We will express the unitary matrices \(U\) and \(V\) columnwise as
\[\begin{array}{l}
U=\left[\begin{array}{llll}
u_{1} & u_{2} & \dots & u_{m}
\end{array}\right] \\
V=\left[\begin{array}{llll}
v_{1} & v_{2} & \dots & v_{n}
\end{array}\right]
\end{array}\nonumber\]
Solution
A necessary and sufficient condition for the solvability of this system of equations is that \(u^{\prime}_{i}b=0\) for all \(i\) satisfying \(k < i \leq m\). Otherwise, the system of equations is inconsistent. This condition means that the vector \(b\) must be orthogonal to the last \(m - k\) columns of \(U\). Therefore the system of linear equations can be written as
\[\left[\begin{array}{c|l}
\Sigma_{1} & 0 \\
\hline & \\
0 & 0
\end{array}\right] V^{\prime} x=U^{\prime} b\nonumber\]
\[\left[\begin{array}{c|l}
\Sigma_{1} & 0 \\
\hline & \\
0 & 0
\end{array}\right] V^{\prime} x=\left[\begin{array}{c}
u_{1}^{\prime} b \\
u_{2}^{\prime} b \\
\vdots \\
\vdots \\
u_{m}^{\prime} b
\end{array}\right]=\left[\begin{array}{c}
u_{1}^{\prime} b \\
\vdots \\
u_{k}^{\prime} b \\
0 \\
\vdots \\
0
\end{array}\right]\nonumber\]
Using the above equation and the invertibility of \(\Sigma_{1}\), we can rewrite the system of equations as
\[\left[\begin{array}{c}
v_{1}^{\prime} \\
v_{2}^{\prime} \\
\vdots \\
v_{k}^{\prime}
\end{array}\right] x=\left[\begin{array}{c}
\frac{1}{\sigma_{1}} u_{1}^{\prime} b \\
\frac{1}{\sigma_{2}} u_{2}^{\prime} b \\
\vdots \\
\frac{1}{\sigma_{k}} u_{k}^{\prime} b
\end{array}\right]\nonumber\]
By using the fact that
\[\left[\begin{array}{c}
v_{1}^{\prime} \\
v_{2}^{\prime} \\
\vdots \\
v_{k}^{\prime}
\end{array}\right]\left[\begin{array}{llll}
v_{1} & v_{2} & \ldots & v_{k}
\end{array}\right]=I\nonumber\]
we obtain a solution of the form
\[x=\sum_{i=1}^{k} \frac{1}{\sigma_{i}}\left(u_{i}^{\prime} b\right) v_{i}\nonumber\]
From the observations that were made earlier, we know that the vectors \(v_{k+1}, v_{k+2}, \dots, v_{n}\) span the kernel of \(A\), and therefore a general solution of the system of linear equations is given by
\[x=\sum_{i=1}^{k} \frac{1}{\sigma_{i}}\left(u_{i}^{\prime} b\right) v_{i}+\sum_{i=k+1}^{n} \beta_{i} v_{i}\nonumber\]
where the coefficients \(\beta_{i}\), with \(i\) in the interval \(k+1 \leq i \leq n\), are arbitrary complex numbers.
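The solution procedure of Example 4.3 can be sketched numerically as follows (assuming NumPy; the rank-2 matrix and the right-hand side, built to be consistent, are illustrative): check the solvability condition \(u_{i}^{\prime}b=0\) for \(i>k\), form the particular solution, and shift it by a null-space vector.

```python
import numpy as np

# Rank-deficient system: A is 3 x 3 with rank k = 2
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])
b = A @ np.array([1.0, 1.0, 1.0])          # consistent right-hand side by construction

U, s, Vh = np.linalg.svd(A)
k = int(np.sum(s > 1e-10))                 # k = 2

# Solvability: b must be orthogonal to u_{k+1}, ..., u_m
assert np.allclose(U[:, k:].T @ b, 0)

# Particular solution x = sum_{i<=k} (u_i' b / sigma_i) v_i
x_p = sum((U[:, i] @ b) / s[i] * Vh[i, :] for i in range(k))
assert np.allclose(A @ x_p, b)

# Adding any combination of v_{k+1}, ..., v_n gives another solution
x_g = x_p + 2.5 * Vh[k, :]
assert np.allclose(A @ x_g, b)
```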