3.2: Constructing all Solutions

Last updated
Save as PDF

Page ID: 24242

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

When the \(a_{i}'s\) are drawn from ordinary (real or complex) Euclidean n-space, with the usual (unweighted) inner product, A is an \(n \times m\) matrix of full column rank, and the equation (1a) is simply

\[y=A^{\prime} x \ \tag{1b}\]

where \(A^{\prime}\) has full row rank. Since the \(m\) rows of \(A^{\prime}\) in (1b) are independent, this matrix has \(m\) independent columns as well. It follows that the system (1b), which can be read as expressing \(y\) in terms of a linear combination of the columns of \(A^{\prime}\) (with weights given by the components of \(x\)) has solutions \(x\) for any \(y\).

If \(A^{\prime}\) were square and therefore (under our rank assumption) invertible, (1b) would have a unique solution, obtained simply by premultiplying (1b) by the inverse of \(A^{\prime}\). The closest we come to having an invertible matrix in the non-square case is by invoking the Gram matrix lemma, which tells us that \(A^{\prime}A\) is invertible under our rank assumption. This fact, and inspection of (1b), allow us to explicitly write down one particular solution of (1b), which we denote by \(\check{x}\):

\[\check{x}=A\left(A^{\prime} A\right)^{-1} y \ \tag{2a}\]

Simple substitution of this expression in (1b) verifies that it is indeed a solution. We shall shortly see that this solution actually has minimum length (norm) among all solutions of (1b).

For the more general equation in (1a), we can establish the existence of a solution by demonstrating that the appropriate generalization of the expression in (2a) does indeed satisfy (1a). For this, pick

\[\check{x}=A \prec A, A \succ^{-1} y\ \tag{2b}\]

It is easy to see that this satisfies (1a), if we use the fact that \(\prec A, A \alpha \succ=\prec A, A \succ \alpha\) for any array \(\alpha\) of scalars; in our case \(\alpha\) is the \(m \times 1\) array \(\prec A, A \succ^{-1} y\).

Any other \(x\) is a solution of (1a) iff it differs from the particular solution above (or any other particular solution) by a solution of the homogeneous equation \(\prec A, x \succ =0\) the same statement can be made for solutions of (1b). The proof is easy, and presented below for (1b), with \(x\) denoting any solution, \(x_{p}\) denoting a particular solution, and \(x_{h}\) denoting a solution of the homogeneous equation:

\[y=A^{\prime} x_{p}=A^{\prime} x \Rightarrow A^{\prime} \underbrace{\left(x-x_{p}\right)}_{x_{h}} \Rightarrow 0 \Rightarrow x_{p}+x_{h}\nonumber\]

Conversely,

\[y=A^{\prime} x_{p}, \quad A^{\prime} x_{h}=0 \Rightarrow y=A^{\prime} \underbrace{\left(x_{p}+x_{h}\right)}_{x} \Rightarrow x=x_{p}+x_{h}\nonumber\]

Equations of the form (1a), (1b) commonly arise in situations where \(x\) represents a vector of control inputs and \(y\) represents a vector of objectives or targets. The problem is then to use some appropriate criterion and/or constraints to narrow down the set of controls.

Example 3.2

Let \(m = 1\), so that \(A^{\prime}\) is a single nonzero row, which we shall denote by \(a^{\prime}\). If \(y = 0\), the set of solutions corresponds to vectors \(x\) that are orthogonal to the vector \(a\), i.e. to vectors in the orthogonal complement of \(a\), namely in the subspace \(\mathcal{R} a^{\perp}(a)\). Use this to costruct all solutions to Example 3.1.

There are several different criteria and constraints that may reasonably be used to select among the different possible solutions. For example, in some problems it may be natural to restrict the components \(x_{i}\) of \(x\) to be nonnegative, and to ask for the control that minimizes \(\sum s_{i} x_{i}\), where \(s_{i}\) represents the cost of control component \(x_{i}\). This is the prototypical form of what is termed the linear programming problem. (You should geometrically characterize the solution to this problem for the case given in the above example.) The general linear programming problem arises in a host of applications.

We shall focus on the problem of determining the solution \(x\) of (1a) for which \({\|x\|}^2=<x, x>\) is minimized; in the case of (1b), we are looking to minimize \(x^{\prime}x\). For the situation depicted in the above example, the optimum \(x\) is immediately seen to be the solution vector that is aligned with \(a\). It can be found by projecting any particular solution of (1b) onto the space spanned by the vector \(a\). (This fact is related to the Cauchy-Schwartz inequality: For \(x\) of a specified length, the inner product \(<a,x>\) is maximized by aligning \(x\) with \(a\), and for specified \(<a,x>\) the length of \(x\) is minimized by again aligning \(x\) with \(a\).) The generalization to \(m > 1\) and to the broader setting of (1a) is direct, and is presented next. You should note the similarity to the proof of the orthogonality principle.