
16.2: Matrix Operations


    Interpretation of Matrices

    Recall that a matrix \(A \in \mathbb{R}^{m \times n}\) consists of \(m\) rows and \(n\) columns for a total of \(m \cdot n\) entries, \[A=\left(\begin{array}{cccc} A_{11} & A_{12} & \cdots & A_{1 n} \\ A_{21} & A_{22} & \cdots & A_{2 n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m 1} & A_{m 2} & \cdots & A_{m n} \end{array}\right)\] This matrix can be interpreted in a column-centric manner as a set of \(n\) column \(m\)-vectors. Alternatively, the matrix can be interpreted in a row-centric manner as a set of \(m\) row \(n\)-vectors. Each of these interpretations is useful for understanding matrix operations, which are covered next.

    Matrix Operations

    The first matrix operation we consider is multiplication of a matrix \(A \in \mathbb{R}^{m_{1} \times n_{1}}\) by a scalar \(\alpha \in \mathbb{R}\). The operation yields \[B=\alpha A,\] where each entry of \(B \in \mathbb{R}^{m_{1} \times n_{1}}\) is given by \[B_{i j}=\alpha A_{i j}, \quad i=1, \ldots, m_{1}, j=1, \ldots, n_{1} .\] Similar to the multiplication of a vector by a scalar, the multiplication of a matrix by a scalar scales each entry of the matrix.

    The second operation we consider is addition of two matrices \(A \in \mathbb{R}^{m_{1} \times n_{1}}\) and \(B \in \mathbb{R}^{m_{2} \times n_{2}}\). The addition yields \[C=A+B,\] where each entry of \(C \in \mathbb{R}^{m_{1} \times n_{1}}\) is given by \[C_{i j}=A_{i j}+B_{i j}, \quad i=1, \ldots, m_{1}, j=1, \ldots, n_{1} .\] In order for addition of two matrices to make sense, the matrices must have the same dimensions: \(m_{1}=m_{2}\) and \(n_{1}=n_{2}\).

    We can combine the scalar scaling and addition operation. Let \(A \in \mathbb{R}^{m_{1} \times n_{1}}, B \in \mathbb{R}^{m_{1} \times n_{1}}\), and \(\alpha \in \mathbb{R}\). Then, the operation \[C=A+\alpha B\] yields a matrix \(C \in \mathbb{R}^{m_{1} \times n_{1}}\) whose entries are given by \[C_{i j}=A_{i j}+\alpha B_{i j}, \quad i=1, \ldots, m_{1}, j=1, \ldots, n_{1} .\] Note that the scalar-matrix multiplication and matrix-matrix addition operations treat the matrices as arrays of numbers, operating entry by entry. This is unlike the matrix-matrix product, which is introduced next after an example of matrix scaling and addition.

    Example 16.2.1 matrix scaling and addition

    Consider the following matrices and scalar, \[A=\left(\begin{array}{cc} 1 & \sqrt{3} \\ -4 & 9 \\ \pi & -3 \end{array}\right), \quad B=\left(\begin{array}{cc} 0 & 2 \\ 2 & -3 \\ \pi & -4 \end{array}\right), \quad \text { and } \quad \alpha=2\] Then, \[C=A+\alpha B=\left(\begin{array}{cc} 1 & \sqrt{3} \\ -4 & 9 \\ \pi & -3 \end{array}\right)+2 \cdot\left(\begin{array}{cc} 0 & 2 \\ 2 & -3 \\ \pi & -4 \end{array}\right)=\left(\begin{array}{cc} 1 & \sqrt{3}+4 \\ 0 & 3 \\ 3 \pi & -11 \end{array}\right)\]
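    The entry-by-entry nature of scaling and addition is easy to check numerically; a minimal NumPy sketch reproducing Example 16.2.1 follows.

```python
import numpy as np

# Matrices and scalar from Example 16.2.1
A = np.array([[1.0, np.sqrt(3.0)],
              [-4.0, 9.0],
              [np.pi, -3.0]])
B = np.array([[0.0, 2.0],
              [2.0, -3.0],
              [np.pi, -4.0]])
alpha = 2.0

# Scalar-matrix multiplication and matrix-matrix addition act entry by entry
C = A + alpha * B
print(C)
# [[ 1.          5.73205081]    first row: (1, sqrt(3) + 4)
#  [ 0.          3.        ]
#  [ 9.42477796 -11.       ]]   third row: (3*pi, -11)
```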

    Matrix-Matrix Product

    Let us consider two matrices \(A \in \mathbb{R}^{m_{1} \times n_{1}}\) and \(B \in \mathbb{R}^{m_{2} \times n_{2}}\) with \(n_{1}=m_{2}\). The matrix-matrix product of the matrices results in \[C=A B\] with \[C_{i j}=\sum_{k=1}^{n_{1}} A_{i k} B_{k j}, \quad i=1, \ldots, m_{1}, j=1, \ldots, n_{2} .\] Because the summation applies to the second index of \(A\) and the first index of \(B\), the number of columns of \(A\) must match the number of rows of \(B: n_{1}=m_{2}\) must be true. Let us consider a few examples.

    Example 16.2.2 matrix-matrix product

    Let us consider matrices \(A \in \mathbb{R}^{3 \times 2}\) and \(B \in \mathbb{R}^{2 \times 3}\) with \[A=\left(\begin{array}{cc} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{array}\right) \quad \text { and } B=\left(\begin{array}{ccc} 2 & 3 & -5 \\ 1 & 0 & -1 \end{array}\right)\] The matrix-matrix product yields \[C=A B=\left(\begin{array}{cc} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{array}\right)\left(\begin{array}{ccc} 2 & 3 & -5 \\ 1 & 0 & -1 \end{array}\right)=\left(\begin{array}{ccc} 5 & 3 & -8 \\ 1 & -12 & 11 \\ -3 & 0 & 3 \end{array}\right)\] where each entry is calculated as \[\begin{aligned} C_{11} &=\sum_{k=1}^{2} A_{1 k} B_{k 1}=A_{11} B_{11}+A_{12} B_{21}=1 \cdot 2+3 \cdot 1=5 \\ C_{12} &=\sum_{k=1}^{2} A_{1 k} B_{k 2}=A_{11} B_{12}+A_{12} B_{22}=1 \cdot 3+3 \cdot 0=3 \\ C_{13} &=\sum_{k=1}^{2} A_{1 k} B_{k 3}=A_{11} B_{13}+A_{12} B_{23}=1 \cdot-5+3 \cdot(-1)=-8 \\ C_{21} &=\sum_{k=1}^{2} A_{2 k} B_{k 1}=A_{21} B_{11}+A_{22} B_{21}=-4 \cdot 2+9 \cdot 1=1 \\ & \vdots \\ C_{33} &=\sum_{k=1}^{2} A_{3 k} B_{k 3}=A_{31} B_{13}+A_{32} B_{23}=0 \cdot-5+(-3) \cdot(-1)=3 \end{aligned}\] Note that because \(A \in \mathbb{R}^{3 \times 2}\) and \(B \in \mathbb{R}^{2 \times 3}, C \in \mathbb{R}^{3 \times 3}\).

    This is very different from \[D=B A=\left(\begin{array}{ccc} 2 & 3 & -5 \\ 1 & 0 & -1 \end{array}\right)\left(\begin{array}{cc} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{array}\right)=\left(\begin{array}{cc} -10 & 48 \\ 1 & 6 \end{array}\right)\] where each entry is calculated as \[\begin{aligned} D_{11} &=\sum_{k=1}^{3} B_{1 k} A_{k 1}=B_{11} A_{11}+B_{12} A_{21}+B_{13} A_{31}=2 \cdot 1+3 \cdot(-4)+(-5) \cdot 0=-10 \\ & \vdots \\ D_{22} &=\sum_{k=1}^{3} B_{2 k} A_{k 2}=B_{21} A_{12}+B_{22} A_{22}+B_{23} A_{32}=1 \cdot 3+0 \cdot 9+(-1) \cdot(-3)=6 . \end{aligned}\] Note that because \(B \in \mathbb{R}^{2 \times 3}\) and \(A \in \mathbb{R}^{3 \times 2}, D \in \mathbb{R}^{2 \times 2}\). Clearly, \(C=A B \neq B A=D\); \(C\) and \(D\) in fact have different dimensions. Thus, the matrix-matrix product is not commutative in general, even if both \(A B\) and \(B A\) make sense.
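    A minimal NumPy sketch of the same computation follows; note that the @ operator performs the matrix-matrix product, and that \(AB\) and \(BA\) do not even share dimensions.

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [-4.0, 9.0],
              [0.0, -3.0]])       # A is 3 x 2
B = np.array([[2.0, 3.0, -5.0],
              [1.0, 0.0, -1.0]])  # B is 2 x 3

C = A @ B                  # 3 x 3
D = B @ A                  # 2 x 2
print(C)                   # [[  5.   3.  -8.], [  1. -12.  11.], [ -3.   0.   3.]]
print(D)                   # [[-10.  48.], [  1.   6.]]
print(C.shape, D.shape)    # (3, 3) (2, 2): AB != BA
```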

    Example 16.2.3 inner product as matrix-matrix product

    The inner product of two vectors can be considered as a special case of the matrix-matrix product. Let \[v=\left(\begin{array}{l} 1 \\ 3 \\ 6 \end{array}\right) \quad \text { and } \quad w=\left(\begin{array}{c} -2 \\ 0 \\ 4 \end{array}\right) .\] We have \(v, w \in \mathbb{R}^{3}\left(=\mathbb{R}^{3 \times 1}\right)\). Taking the transpose, we have \(v^{\mathrm{T}} \in \mathbb{R}^{1 \times 3}\). Noting that the second dimension of \(v^{\mathrm{T}}\) and the first dimension of \(w\) match, we can perform the matrix-matrix product, \[\beta=v^{\mathrm{T}} w=\left(\begin{array}{lll} 1 & 3 & 6 \end{array}\right)\left(\begin{array}{c} -2 \\ 0 \\ 4 \end{array}\right)=1 \cdot(-2)+3 \cdot 0+6 \cdot 4=22 .\]
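    As a minimal NumPy sketch, the inner product can be formed explicitly as a 1-by-3 times 3-by-1 product; the column-vector shapes below mirror the discussion above.

```python
import numpy as np

v = np.array([[1.0], [3.0], [6.0]])    # column vector, 3 x 1
w = np.array([[-2.0], [0.0], [4.0]])   # column vector, 3 x 1

beta = v.T @ w    # (1 x 3) times (3 x 1) gives a 1 x 1 matrix
print(beta)       # [[22.]]
```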

    Example 16.2.4 outer product

    The outer product of two vectors is yet another special case of the matrix-matrix product. The outer product \(B\) of two vectors \(v \in \mathbb{R}^{m}\) and \(w \in \mathbb{R}^{m}\) is defined as \[B=v w^{\mathrm{T}} .\] Because \(v \in \mathbb{R}^{m \times 1}\) and \(w^{\mathrm{T}} \in \mathbb{R}^{1 \times m}\), the matrix-matrix product \(v w^{\mathrm{T}}\) is well-defined and yields an \(m \times m\) matrix.

    As in the previous example, let \[v=\left(\begin{array}{l} 1 \\ 3 \\ 6 \end{array}\right) \quad \text { and } \quad w=\left(\begin{array}{c} -2 \\ 0 \\ 4 \end{array}\right) .\] The outer product of \(w\) and \(v\) is given by \[B=w v^{\mathrm{T}}=\left(\begin{array}{c} -2 \\ 0 \\ 4 \end{array}\right)\left(\begin{array}{ccc} 1 & 3 & 6 \end{array}\right)=\left(\begin{array}{ccc} -2 & -6 & -12 \\ 0 & 0 & 0 \\ 4 & 12 & 24 \end{array}\right) .\] Clearly, \(\beta=v^{\mathrm{T}} w \neq w v^{\mathrm{T}}=B\), as they do not even have the same dimensions. In the above example, we saw that \(A B \neq B A\) in general. In fact, \(A B\) might not even be defined when \(B A\) is (consider \(A \in \mathbb{R}^{2 \times 1}\) and \(B \in \mathbb{R}^{3 \times 2}\)). However, although the matrix-matrix product is not commutative in general, it is associative, i.e. \[A B C=A(B C)=(A B) C .\] Moreover, the matrix-matrix product is also distributive, i.e. \[(A+B) C=A C+B C .\]

    Proof

    The associative and distributive properties of the matrix-matrix product are readily proven from its definition. For associativity, we consider the \(i j\)-entry of the \(m_{1} \times n_{3}\) matrix \(A(B C)\), i.e. \[\begin{aligned} (A(B C))_{i j} &=\sum_{k=1}^{n_{1}} A_{i k}(B C)_{k j}=\sum_{k=1}^{n_{1}} A_{i k}\left(\sum_{l=1}^{n_{2}} B_{k l} C_{l j}\right)=\sum_{k=1}^{n_{1}} \sum_{l=1}^{n_{2}} A_{i k} B_{k l} C_{l j}=\sum_{l=1}^{n_{2}} \sum_{k=1}^{n_{1}} A_{i k} B_{k l} C_{l j} \\ &=\sum_{l=1}^{n_{2}}\left(\sum_{k=1}^{n_{1}} A_{i k} B_{k l}\right) C_{l j}=\sum_{l=1}^{n_{2}}(A B)_{i l} C_{l j}=((A B) C)_{i j}, \quad \forall i, j . \end{aligned}\] Since the equality \((A(B C))_{i j}=((A B) C)_{i j}\) holds for all entries, we have \(A(B C)=(A B) C\).

    The distributive property can also be proven directly. The \(i j\)-entry of \((A+B) C\) can be expressed as \[\begin{aligned} ((A+B) C)_{i j} &=\sum_{k=1}^{n_{1}}(A+B)_{i k} C_{k j}=\sum_{k=1}^{n_{1}}\left(A_{i k}+B_{i k}\right) C_{k j}=\sum_{k=1}^{n_{1}}\left(A_{i k} C_{k j}+B_{i k} C_{k j}\right) \\ &=\sum_{k=1}^{n_{1}} A_{i k} C_{k j}+\sum_{k=1}^{n_{1}} B_{i k} C_{k j}=(A C)_{i j}+(B C)_{i j}, \quad \forall i, j . \end{aligned}\] Again, since the equality holds for all entries, we have \((A+B) C=A C+B C\).
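    Both properties are easy to sanity-check numerically; a minimal NumPy sketch with arbitrarily chosen random matrices follows, confirming the identities up to floating-point round-off.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))   # same shape as A so that A + B is defined
C = rng.standard_normal((4, 2))

print(np.allclose(A @ (B @ C), (A @ B) @ C))    # associativity: A(BC) = (AB)C
print(np.allclose((A + B) @ C, A @ C + B @ C))  # distributivity: (A+B)C = AC + BC
```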

    Another useful rule concerning the matrix-matrix product and the transpose operation is \[(A B)^{\mathrm{T}}=B^{\mathrm{T}} A^{\mathrm{T}} .\] This rule is used very often.

    Proof

    The proof follows by checking the components of each side. The left-hand side yields \[\left((A B)^{\mathrm{T}}\right)_{i j}=(A B)_{j i}=\sum_{k=1}^{n_{1}} A_{j k} B_{k i} .\] The right-hand side yields \[\left(B^{\mathrm{T}} A^{\mathrm{T}}\right)_{i j}=\sum_{k=1}^{n_{1}}\left(B^{\mathrm{T}}\right)_{i k}\left(A^{\mathrm{T}}\right)_{k j}=\sum_{k=1}^{n_{1}} B_{k i} A_{j k}=\sum_{k=1}^{n_{1}} A_{j k} B_{k i}\] Thus, we have \[\left((A B)^{\mathrm{T}}\right)_{i j}=\left(B^{\mathrm{T}} A^{\mathrm{T}}\right)_{i j}, \quad i=1, \ldots, n_{2}, j=1, \ldots, m_{1} .\]
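    The transpose rule can likewise be checked numerically, as in the minimal sketch below with arbitrary rectangular matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((2, 4))

print(np.allclose((A @ B).T, B.T @ A.T))   # True: (AB)^T = B^T A^T
```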

    Interpretations of the Matrix-Vector Product

    Let us consider a special case of the matrix-matrix product: the matrix-vector product. The special case arises when the second matrix has only one column. Then, with \(A \in \mathbb{R}^{m \times n}\) and \(w=B \in \mathbb{R}^{n \times 1}=\mathbb{R}^{n}\), we have \[C=A B,\] where \[C_{i j}=\sum_{k=1}^{n} A_{i k} B_{k j}=\sum_{k=1}^{n} A_{i k} w_{k}, \quad i=1, \ldots, m, j=1 .\] Since \(C \in \mathbb{R}^{m \times 1}=\mathbb{R}^{m}\), we can introduce \(v \in \mathbb{R}^{m}\) and concisely write the matrix-vector product as \[v=A w,\] where \[v_{i}=\sum_{k=1}^{n} A_{i k} w_{k}, \quad i=1, \ldots, m .\] Expanding the summation, we can think of the matrix-vector product as \[\begin{aligned} v_{1} &=A_{11} w_{1}+A_{12} w_{2}+\cdots+A_{1 n} w_{n} \\ v_{2} &=A_{21} w_{1}+A_{22} w_{2}+\cdots+A_{2 n} w_{n} \\ & \vdots \\ v_{m} &=A_{m 1} w_{1}+A_{m 2} w_{2}+\cdots+A_{m n} w_{n} . \end{aligned}\] Now, we consider two different interpretations of the matrix-vector product.

    Row Interpretation

    The first interpretation is the "row" interpretation, where we consider the matrix-vector multiplication as a series of inner products. In particular, we consider \(v_{i}\) as the inner product of \(i\)-th row of \(A\) and \(w\). In other words, the vector \(v\) is computed entry by entry in the sense that \[v_{i}=\left(\begin{array}{cccc} A_{i 1} & A_{i 2} & \cdots & A_{i n} \end{array}\right)\left(\begin{array}{c} w_{1} \\ w_{2} \\ \vdots \\ w_{n} \end{array}\right), \quad i=1, \ldots, m .\]

    Column Interpretation

    The second interpretation is the "column" interpretation, where we consider the matrix-vector multiplication as a sum of \(n\) vectors corresponding to the \(n\) columns of the matrix, i.e. \[v=\left(\begin{array}{c} A_{11} \\ A_{21} \\ \vdots \\ A_{m 1} \end{array}\right) w_{1}+\left(\begin{array}{c} A_{12} \\ A_{22} \\ \vdots \\ A_{m 2} \end{array}\right) w_{2}+\cdots+\left(\begin{array}{c} A_{1 n} \\ A_{2 n} \\ \vdots \\ A_{m n} \end{array}\right) w_{n} .\] In this case, we consider \(v\) as a linear combination of columns of \(A\) with coefficients \(w\). Hence \(v=A w\) is simply another way to write a linear combination of vectors: the columns of \(A\) are the vectors, and \(w\) contains the coefficients.

    Example 16.2.6 column interpretation of matrix-vector product

    An example of the column interpretation of matrix-vector product is \[v=\left(\begin{array}{lll} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{array}\right)\left(\begin{array}{l} 3 \\ 2 \\ 1 \end{array}\right)=3 \cdot\left(\begin{array}{l} 0 \\ 1 \\ 0 \\ 0 \end{array}\right)+2 \cdot\left(\begin{array}{l} 1 \\ 0 \\ 0 \\ 0 \end{array}\right)+1 \cdot\left(\begin{array}{l} 0 \\ 0 \\ 0 \\ 1 \end{array}\right)=\left(\begin{array}{l} 2 \\ 3 \\ 0 \\ 1 \end{array}\right) .\] Clearly, the outcome of the matrix-vector product is identical to that computed using the row interpretation.
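    Both interpretations can be spelled out in code. A minimal NumPy sketch, reusing the matrix and vector of Example 16.2.6, computes the matrix-vector product three ways: with the built-in product, as a series of inner products (row interpretation), and as a linear combination of columns (column interpretation).

```python
import numpy as np

A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
w = np.array([3.0, 2.0, 1.0])

v = A @ w                                                    # built-in matrix-vector product
v_row = np.array([A[i, :] @ w for i in range(A.shape[0])])   # row interpretation: one inner product per entry
v_col = sum(w[j] * A[:, j] for j in range(A.shape[1]))       # column interpretation: linear combination of columns
print(v, v_row, v_col)    # all three give [2. 3. 0. 1.]
```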

    Vector-Matrix Product

    We now consider another special case of the matrix-matrix product: the left vector-matrix product. This special case arises when the first matrix only has one row. Then, we have \(A \in \mathbb{R}^{1 \times m}\) and \(B \in \mathbb{R}^{m \times n}\). Let us denote the matrix \(A\), which is a row vector, by \(w^{T}\). Clearly, \(w \in \mathbb{R}^{m}\), because \(w^{\mathrm{T}} \in \mathbb{R}^{1 \times m}\). The left vector-matrix product yields \[v=w^{\mathrm{T}} B,\] where \[v_{j}=\sum_{k=1}^{m} w_{k} B_{k j}, \quad j=1, \ldots, n .\] The resultant vector \(v\) is a row vector in \(\mathbb{R}^{1 \times n}\). The left vector-matrix product can also be interpreted in two different manners. The first interpretation considers the product as a series of dot products, where each entry \(v_{j}\) is computed as a dot product of \(w\) with the \(j\)-th column of \(B\), i.e. \[v_{j}=\left(\begin{array}{llll} w_{1} & w_{2} & \cdots & w_{m} \end{array}\right)\left(\begin{array}{c} B_{1 j} \\ B_{2 j} \\ \vdots \\ B_{m j} \end{array}\right), \quad j=1, \ldots, n .\] The second interpretation considers the left vector-matrix product as a linear combination of rows of \(B\), i.e. \[\begin{aligned} v=& w_{1}\left(\begin{array}{llll} B_{11} & B_{12} & \cdots & B_{1 n} \end{array}\right)+w_{2}\left(\begin{array}{llll} B_{21} & B_{22} & \cdots & B_{2 n} \end{array}\right) \\ &+\cdots+w_{m}\left(\begin{array}{lllll} B_{m 1} & B_{m 2} & \cdots & B_{m n} \end{array}\right) . \end{aligned}\]
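    The two interpretations of the left vector-matrix product can be illustrated the same way; the weights and matrix in the sketch below are illustrative values only.

```python
import numpy as np

w = np.array([2.0, -1.0])                  # w in R^2
B = np.array([[1.0, 0.0, 3.0],
              [4.0, 5.0, -2.0]])           # B is 2 x 3

v = w @ B                                  # left vector-matrix product, a row vector with 3 entries
v_rows = w[0] * B[0, :] + w[1] * B[1, :]   # linear combination of the rows of B
print(v, v_rows)                           # both give [-2. -5.  8.]
```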

    Interpretations of the Matrix-Matrix Product

    Similar to the matrix-vector product, the matrix-matrix product can be interpreted in a few different ways. Throughout the discussion, we assume \(A \in \mathbb{R}^{m_{1} \times n_{1}}\) and \(B \in \mathbb{R}^{n_{1} \times n_{2}}\) and hence \(C=A B \in\) \(\mathbb{R}^{m_{1} \times n_{2}}\).

    Matrix-Matrix Product as a Series of Matrix-Vector Products

    One interpretation of the matrix-matrix product is to consider it as computing \(C\) one column at a time, where the \(j\)-th column of \(C\) results from the matrix-vector product of the matrix \(A\) with the \(j\)-th column of \(B\), i.e. \[C_{\cdot j}=A B_{\cdot j}, \quad j=1, \ldots, n_{2},\] where \(C_{\cdot j}\) refers to the \(j\)-th column of \(C\). In other words, \[\left(\begin{array}{c} C_{1 j} \\ C_{2 j} \\ \vdots \\ C_{m_{1} j} \end{array}\right)=\left(\begin{array}{cccc} A_{11} & A_{12} & \cdots & A_{1 n_{1}} \\ A_{21} & A_{22} & \cdots & A_{2 n_{1}} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m_{1} 1} & A_{m_{1} 2} & \cdots & A_{m_{1} n_{1}} \end{array}\right)\left(\begin{array}{c} B_{1 j} \\ B_{2 j} \\ \vdots \\ B_{n_{1} j} \end{array}\right), \quad j=1, \ldots, n_{2} .\]

    Example 16.2.7 matrix-matrix product as a series of matrix-vector products

    Consider again the matrices \(A \in \mathbb{R}^{3 \times 2}\) and \(B \in \mathbb{R}^{2 \times 3}\) of Example 16.2.2, \[A=\left(\begin{array}{cc} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{array}\right) \quad \text { and } \quad B=\left(\begin{array}{ccc} 2 & 3 & -5 \\ 1 & 0 & -1 \end{array}\right) .\] The first column of \(C=A B \in \mathbb{R}^{3 \times 3}\) is given by \[C_{\cdot 1}=A B_{\cdot 1}=\left(\begin{array}{cc} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{array}\right)\left(\begin{array}{l} 2 \\ 1 \end{array}\right)=\left(\begin{array}{c} 5 \\ 1 \\ -3 \end{array}\right) .\] Similarly, the second and third columns are given by \[C_{\cdot 2}=A B_{\cdot 2}=\left(\begin{array}{cc} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{array}\right)\left(\begin{array}{l} 3 \\ 0 \end{array}\right)=\left(\begin{array}{c} 3 \\ -12 \\ 0 \end{array}\right)\] and \[C_{\cdot 3}=A B_{\cdot 3}=\left(\begin{array}{cc} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{array}\right)\left(\begin{array}{c} -5 \\ -1 \end{array}\right)=\left(\begin{array}{c} -8 \\ 11 \\ 3 \end{array}\right) .\] Putting the columns of \(C\) together, we obtain \[C=\left(\begin{array}{ccc} C_{\cdot 1} & C_{\cdot 2} & C_{\cdot 3} \end{array}\right)=\left(\begin{array}{ccc} 5 & 3 & -8 \\ 1 & -12 & 11 \\ -3 & 0 & 3 \end{array}\right) .\]

    Matrix-Matrix Product as a Series of Left Vector-Matrix Products

    In the previous interpretation, we performed the matrix-matrix product by constructing the resultant matrix one column at a time. We can also use a series of left vector-matrix products to construct the resultant matrix one row at a time. Namely, in \(C=A B\), the \(i\)-th row of \(C\) results from the left vector-matrix product of the \(i\)-th row of \(A\) with the matrix \(B\), i.e. \[C_{i \cdot}=A_{i \cdot} B, \quad i=1, \ldots, m_{1},\] where \(C_{i \cdot}\) refers to the \(i\)-th row of \(C\). In other words, \[\left(\begin{array}{ccc} C_{i 1} & \cdots & C_{i n_{2}} \end{array}\right)=\left(\begin{array}{ccc} A_{i 1} & \cdots & A_{i n_{1}} \end{array}\right)\left(\begin{array}{ccc} B_{11} & \cdots & B_{1 n_{2}} \\ \vdots & \ddots & \vdots \\ B_{n_{1} 1} & \cdots & B_{n_{1} n_{2}} \end{array}\right), \quad i=1, \ldots, m_{1} .\]
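    Both the column-at-a-time and the row-at-a-time constructions translate directly into loops. A minimal NumPy sketch, using the matrices of Example 16.2.7, builds \(C=AB\) one column at a time and then one row at a time, and checks both against the direct product.

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [-4.0, 9.0],
              [0.0, -3.0]])
B = np.array([[2.0, 3.0, -5.0],
              [1.0, 0.0, -1.0]])

C_cols = np.zeros((A.shape[0], B.shape[1]))
for j in range(B.shape[1]):
    C_cols[:, j] = A @ B[:, j]         # j-th column of C from a matrix-vector product

C_rows = np.zeros_like(C_cols)
for i in range(A.shape[0]):
    C_rows[i, :] = A[i, :] @ B         # i-th row of C from a left vector-matrix product

print(np.allclose(C_cols, A @ B), np.allclose(C_rows, A @ B))   # True True
```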

    Operation Count of Matrix-Matrix Product

    Matrix-matrix product is ubiquitous in scientific computing, and significant effort has been put into efficient performance of the operation on modern computers. Let us now count the number of additions and multiplications required to compute this product. Consider multiplication of \(A \in \mathbb{R}^{m_{1} \times n_{1}}\) and \(B \in \mathbb{R}^{n_{1} \times n_{2}}\). To compute \(C=A B\), we perform \[C_{i j}=\sum_{k=1}^{n_{1}} A_{i k} B_{k j}, \quad i=1, \ldots, m_{1}, j=1, \ldots, n_{2} .\] Computing each \(C_{i j}\) requires \(n_{1}\) multiplications and \(n_{1}\) additions, yielding a total of \(2 n_{1}\) operations. We must perform this for \(m_{1} n_{2}\) entries in \(C\). Thus, the total operation count for computing \(C\) is \(2 m_{1} n_{1} n_{2}\). Considering the matrix-vector product and the inner product as special cases of the matrix-matrix product, we can summarize how the operation count scales.

    Operation       Sizes                                    Operation count
    Matrix-matrix   \(m_{1}=n_{1}=m_{2}=n_{2}=n\)            \(2 n^{3}\)
    Matrix-vector   \(m_{1}=n_{1}=m_{2}=n,\ n_{2}=1\)        \(2 n^{2}\)
    Inner product   \(n_{1}=m_{2}=n,\ m_{1}=n_{2}=1\)        \(2 n\)

    The operation count is measured in FLoating Point Operations, or FLOPs. (Note FLOPS is different from FLOPs: FLOPS refers to FLoating Point Operations per Second, which is a "speed" associated with a particular computer/hardware and a particular implementation of an algorithm.)
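    The \(2 m_{1} n_{1} n_{2}\) count can be made concrete with a naive triple-loop implementation that tallies the multiplications and additions it performs; the sketch below is purely illustrative and is not how optimized libraries compute the product.

```python
import numpy as np

def matmul_count(A, B):
    """Naive triple-loop matrix-matrix product that also counts operations."""
    m1, n1 = A.shape
    n1b, n2 = B.shape
    assert n1 == n1b, "inner dimensions must match"
    C = np.zeros((m1, n2))
    mults = adds = 0
    for i in range(m1):
        for j in range(n2):
            for k in range(n1):
                C[i, j] += A[i, k] * B[k, j]   # one multiplication and one addition
                mults += 1
                adds += 1
    return C, mults + adds

A = np.random.rand(4, 3)
B = np.random.rand(3, 5)
C, flops = matmul_count(A, B)
print(flops, 2 * 4 * 3 * 5)   # 120 120, i.e. 2 * m1 * n1 * n2
```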

    Inverse of a Matrix (Briefly)

    We have now studied the matrix-vector product, in which, given a vector \(x \in \mathbb{R}^{n}\), we calculate a new vector \(b=A x\), where \(A \in \mathbb{R}^{n \times n}\) and hence \(b \in \mathbb{R}^{n}\). We may think of this as a "forward" problem, in which given \(x\) we calculate \(b=A x\). We can now also ask about the corresponding "inverse" problem: given \(b\), can we find \(x\) such that \(A x=b\)? Note in this section, and for reasons which shall become clear shortly, we shall exclusively consider square matrices, and hence we set \(m=n\).

    To begin, let us revert to the scalar case. If \(b\) is a scalar and \(a\) is a non-zero scalar, we know that the (very simple linear) equation \(a x=b\) has the solution \(x=b / a\). We may write this more suggestively as \(x=a^{-1} b\) since of course \(a^{-1}=1 / a\). It is important to note that the equation \(a x=b\) has a solution only if \(a\) is non-zero; if \(a\) is zero, then of course there is no \(x\) such that \(a x=b\). (This is not quite true: in fact, if \(b=0\) and \(a=0\) then \(a x=b\) has an infinity of solutions - any value of \(x\). We discuss this "singular but solvable" case in more detail in Unit V.)

    We can now proceed to the matrix case "by analogy." The matrix equation \(A x=b\) can of course be viewed as a system of \(n\) linear equations in \(n\) unknowns. The first equation states that the inner product of the first row of \(A\) with \(x\) must equal \(b_{1}\); in general, the \(i^{\text {th }}\) equation states that the inner product of the \(i^{\text {th }}\) row of \(A\) with \(x\) must equal \(b_{i}\). Then if \(A\) is non-zero we could plausibly expect that \(x=A^{-1} b\). This statement is clearly deficient in two related ways: what do we mean when we say a matrix is non-zero? And what do we in fact mean by \(A^{-1}\)?

    As regards the first question, \(A x=b\) will have a solution when \(A\) is non-singular: non-singular is the proper extension of the scalar concept of "non-zero" in this linear systems context. Conversely, if \(A\) is singular then (except for special \(b\) ) \(A x=b\) will have no solution: singular is the proper extension of the scalar concept of "zero" in this linear systems context. How can we determine if a matrix \(A\) is singular? Unfortunately, it is not nearly as simple as verifying, say, that the matrix consists of at least one non-zero entry, or contains all non-zero entries.

    There are a variety of ways to determine whether a matrix is non-singular, many of which may only make good sense in later chapters (in particular, in Unit V): a non-singular \(n \times n\) matrix \(A\) has \(n\) independent columns (or, equivalently, \(n\) independent rows); a non-singular \(n \times n\) matrix \(A\) has all non-zero eigenvalues; a non-singular matrix \(A\) has a non-zero determinant (perhaps this condition is closest to the scalar case, but it is also perhaps the least useful); a non-singular matrix \(A\) has all non-zero pivots in a (partially pivoted) "LU" decomposition process (described in Unit V). For now, we shall simply assume that \(A\) is non-singular. (We should also emphasize that in the numerical context we must be concerned not only with matrices which might be singular but also with matrices which are "almost" singular in some appropriate sense.) As regards the second question, we must first introduce the identity matrix, \(I\).
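    As a purely illustrative check of the criteria just listed, the NumPy sketch below examines a small matrix whose second row is a multiple of its first; the determinant, the eigenvalues, and the rank all signal that it is singular.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [4.0, 2.0]])   # second row is twice the first, so A is singular

print(np.linalg.det(A))          # 0.0: zero determinant
print(np.linalg.eigvals(A))      # one eigenvalue is 0
print(np.linalg.matrix_rank(A))  # 1 < 2: the columns are not independent
```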

    Let us now define the identity matrix. The identity matrix is an \(m \times m\) square matrix with ones on the diagonal and zeros elsewhere, i.e. \[I_{i j}=\left\{\begin{array}{ll} 1, & i=j \\ 0, & i \neq j \end{array} .\right.\] Identity matrices in \(\mathbb{R}^{1 \times 1}\), \(\mathbb{R}^{2 \times 2}\), and \(\mathbb{R}^{3 \times 3}\) are \[I=(1), \quad I=\left(\begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array}\right), \quad \text { and } \quad I=\left(\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right) .\] The identity matrix is conventionally denoted by \(I\). If \(v \in \mathbb{R}^{m}\), the \(i\)-th entry of \(I v\) is \[\begin{aligned} (I v)_{i} &=\sum_{k=1}^{m} I_{i k} v_{k} \\ &=I_{i 1} v_{1}+\cdots+I_{i, i-1} v_{i-1}+I_{i i} v_{i}+I_{i, i+1} v_{i+1}+\cdots+I_{i m} v_{m} \\ &=v_{i}, \quad i=1, \ldots, m , \end{aligned}\] since every term except \(I_{i i} v_{i}\) vanishes (\(I_{i k}=0\) for \(k \neq i\)). So, we have \(I v=v\). Following the same argument, we also have \(v^{\mathrm{T}} I=v^{\mathrm{T}}\). In essence, \(I\) is the \(m\)-dimensional version of "one."

    We may then define \(A^{-1}\) as that (unique) matrix such that \(A^{-1} A=I\). (Of course in the scalar case, this defines \(a^{-1}\) as the unique scalar such that \(a^{-1} a=1\) and hence \(a^{-1}=1 / a\).) In fact, \(A^{-1} A=I\) and also \(A A^{-1}=I\), and thus this is a case in which matrix multiplication does indeed commute. We can now "derive" the result \(x=A^{-1} b\): we begin with \(A x=b\) and multiply both sides by \(A^{-1}\) to obtain \(A^{-1} A x=A^{-1} b\) or, since the matrix product is associative, \(x=A^{-1} b\). Of course this definition of \(A^{-1}\) does not yet tell us how to find \(A^{-1}\): we shall shortly consider this question from a pragmatic MATLAB perspective and then in Unit V from a more fundamental numerical linear algebra perspective. We should note here, however, that the matrix inverse is very rarely computed or used in practice, for reasons we will understand in Unit V. Nevertheless, the inverse can be quite useful for very small systems (\(n\) small) and of course more generally as a central concept in the consideration of linear systems of equations.
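    In the spirit of the remark that the inverse is rarely formed explicitly, the minimal NumPy sketch below, with an arbitrarily chosen non-singular matrix, solves \(A x=b\) directly and also via \(x=A^{-1} b\); both give the same answer, but the direct solve is what one would use in practice.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])           # a non-singular 2 x 2 matrix (illustrative values)
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)            # solves A x = b without forming A^{-1}
x_via_inv = np.linalg.inv(A) @ b     # conceptually x = A^{-1} b
print(x, x_via_inv)                  # both give [2. 3.]
print(np.allclose(A @ x, b))         # True
```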

    Example 16.2.8 The inverse of a \(2 \times 2\) matrix

    We consider here the case of a \(2 \times 2\) matrix \(A\) which we write as \[A=\left(\begin{array}{ll} a & b \\ c & d \end{array}\right) \text {. }\] If the columns are to be independent we must have \(a / b \neq c / d\) or \((a d) /(b c) \neq 1\) or \(a d-b c \neq 0\) which in fact is the condition that the determinant of \(A\) is nonzero. The inverse of \(A\) is then given by \[A^{-1}=\frac{1}{a d-b c}\left(\begin{array}{rr} d & -b \\ -c & a \end{array}\right) .\] Note that this inverse is only defined if \(a d-b c \neq 0\), and we thus see the necessity that \(A\) is nonsingular. It is a simple matter to show by explicit matrix multiplication that \(A^{-1} A=A A^{-1}=I\), as desired.
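    The \(2 \times 2\) formula is easy to verify numerically; the sketch below, with arbitrary entries satisfying \(a d-b c \neq 0\), compares the formula with the library inverse and checks that \(A^{-1} A=A A^{-1}=I\).

```python
import numpy as np

a, b, c, d = 3.0, 1.0, 1.0, 2.0      # entries with ad - bc = 5, i.e. nonzero determinant
A = np.array([[a, b],
              [c, d]])

A_inv = (1.0 / (a * d - b * c)) * np.array([[d, -b],
                                            [-c, a]])
print(np.allclose(A_inv, np.linalg.inv(A)))   # True: matches the 2 x 2 formula
print(np.allclose(A_inv @ A, np.eye(2)))      # True: A^{-1} A = I
print(np.allclose(A @ A_inv, np.eye(2)))      # True: A A^{-1} = I
```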


    This page titled 16.2: Matrix Operations is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Masayuki Yano, James Douglass Penn, George Konidaris, & Anthony T Patera (MIT OpenCourseWare) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.