
2.4: The Least Squares Estimation Problem


    The problem of interest is to find the least square error (LSE) estimate of the parameter vector \(x\) that arises in the linear model \(y \approx A x\), where \(A\) is an array of \(n\) vectors, \(A= [a_{1}, \dots, a_{n}]\). Defining the error \(e\) by

    \[e = y - Ax\nonumber\]

    what we want to determine is

    \[\widehat{x}=\arg \min _{x}\|e\|=\arg \min _{x}\|y-A x\|, \quad \text{given } y, A\nonumber\]

    (where "\(\arg \min _{x}\)" should be read as "the value of the argument x that minimizes"). To state this yet another way, note that as \(x\) is varied, \(Ax\) ranges over the subspace \(\mathcal{R}(A)\), so we are looking for the point

    \[\hat{y}=A\hat{x}\nonumber\]

    that comes closest to \(y\), as measured by whatever norm we are using.
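
    For concreteness, here is a minimal numerical sketch of this minimization in the Euclidean 2-norm case (the norm used in Lecture 1), written in NumPy; the matrix \(A\) and data \(y\) below are illustrative placeholders, not taken from the text.

```python
import numpy as np

# Columns of A play the role of a_1, ..., a_n; y is the vector to be approximated.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([0.1, 1.1, 1.9, 3.2])

# np.linalg.lstsq returns the x that minimizes the Euclidean norm ||y - A x||_2.
x_hat, residual, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)

y_hat = A @ x_hat          # the point in R(A) closest to y
e_hat = y - y_hat          # the minimized error
print(x_hat, np.linalg.norm(e_hat))
```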

    Rather than restricting the norm in the above expression to be the Euclidean 2-norm used in Lecture 1, we shall now actually permit it to be any norm induced by an inner product, so \(\|e\|=\sqrt{<e, e>}\). This will allow us to solve the so-called weighted least squares problem in a finite dimensional space with no additional work, because error criteria of the form \(e^{\prime} Se\) for positive definite Hermitian \(S\) are thereby included. Also, our problem formulation then applies to infinite dimensional spaces that have an inner product defined on them, with the restriction that our model \(Ax\) be confined to a finite dimensional subspace. This actually covers the cases of most interest to us; treatment of the more general case involves introducing further topological notions (closed subspaces, etc.), and we avoid doing this.
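
    One standard way to handle the weighted criterion numerically (a reduction not spelled out in the text) is to factor \(S = LL^{\prime}\) by a Cholesky factorization, so that \(e^{\prime}Se = \|L^{\prime}e\|_{2}^{2}\) and the weighted problem becomes an ordinary least squares problem in the transformed data \(L^{\prime}A\), \(L^{\prime}y\). A small sketch of this reduction, with illustrative \(A\), \(y\), and \(S\):

```python
import numpy as np

# Illustrative data: weight the second measurement more heavily via S.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([0.0, 1.2, 1.8])
S = np.diag([1.0, 4.0, 1.0])   # Hermitian positive definite weighting matrix

# S = L L' (Cholesky), so e' S e = ||L' e||_2^2 and the weighted problem
# reduces to ordinary least squares on (L' A, L' y).
L = np.linalg.cholesky(S)
x_hat, *_ = np.linalg.lstsq(L.T @ A, L.T @ y, rcond=None)
print(x_hat)
```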

    We shall also assume that the vectors \(a_{i}, i=1, \ldots, n\) in \(A\) are independent. This assumption is satisfied by any reasonably parametrized model, for otherwise there would be an infinite number of choices of \(x\) that attained any achievable value of the error \(y - Ax\). If the vectors in \(A\) are discovered to be dependent, then a re-parametrization of the model is needed to yield a well-parametrized model with independent vectors in the new \(A\). (A subtler problem - and one that we shall say something more about in the context of ill-conditioning and the singular value decomposition - is that the vectors in \(A\) can be nearly dependent, causing practical difficulties in numerical estimation of the parameters.)
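
    As a preview of the kind of diagnostic the singular value decomposition provides, the sketch below (with an illustrative, nearly dependent \(A\)) shows how near-dependence appears as a very small singular value, or equivalently a very large ratio of largest to smallest singular value:

```python
import numpy as np

# Illustrative A whose second column nearly equals its first.
eps = 1e-8
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + eps],
              [1.0, 1.0 - eps]])

# Near-dependence shows up as a tiny smallest singular value,
# i.e. a huge ratio of largest to smallest singular value.
sigma = np.linalg.svd(A, compute_uv=False)
print(sigma)
print(sigma[0] / sigma[-1])
```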

    Gram Matrix Lemma

    An important route to verifying the independence of the vectors that make up the columns of \(A\) is a lemma that we shall refer to as the Gram Matrix Lemma. This states that the vectors in \(A\) are independent iff the associated Gram matrix (or Gramian) \(\prec A , A \succ= [<a_{i}, a_{j}>]\) is invertible; the result does not depend on which inner product is used to form the Gramian - one can pick any inner product (and its induced norm). As noted above, for the case of the usual Euclidean inner product, \(\prec A , A \succ = A^{\prime}A\). For an inner product of the form \(<a_{i}, a_{j}>=a_{i}^{\prime} S a_{j}\), where \(S\) is Hermitian and positive definite, we have \(\prec A , A \succ = A^{\prime}SA\). The lemma applies to the infinite dimensional setting as well (e.g. \(\mathcal{L}^{2}\)), provided we are only considering the independence of a finite subset of vectors.

    Proof:

    If the vectors in \(A\) are dependent, there is some nonzero vector \(\eta\) such that \(A\eta = \sum_{j} a_{j} \eta_{j}=0\). But then \(\sum_{j}<a_{i}, a_{j}>\eta_{j}=0\), by the linearity of the inner product; in matrix form, we can write \(\prec A, A \succ \eta=0\), so \(\prec A, A \succ\) is not invertible.

    Conversely, if \(\prec A, A \succ\) is not invertible, then \(\prec A, A \succ \eta=0\) for some nonzero \(\eta\). But then \({\eta}^{\prime} \prec A, A \succ \eta=0\), so by the linearity of inner products \(<\sum_{i} \eta_{i} a_{i}, \sum_{j} a_{j} \eta_{j}>=0\); i.e., the norm of the vector \(\sum_{j} a_{j} \eta_{j}=A \eta\) is zero, so the vectors in \(A\) are dependent.
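
    A small numerical illustration of the lemma, using the Euclidean inner product so that the Gramian is \(A^{\prime}A\) (the matrices below are illustrative):

```python
import numpy as np

# Independent columns versus dependent columns (second column = 2 * first).
A_indep = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])
A_dep = np.array([[1.0, 2.0],
                  [1.0, 2.0],
                  [1.0, 2.0]])

for A in (A_indep, A_dep):
    gram = A.T @ A                       # Gramian [<a_i, a_j>] for the Euclidean inner product
    full_rank = np.linalg.matrix_rank(gram) == A.shape[1]
    print(full_rank)                     # True iff the Gramian is invertible, iff columns independent
```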


    This page titled 2.4: The Least Squares Estimation Problem is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Mohammed Dahleh, Munther A. Dahleh, and George Verghese (MIT OpenCourseWare) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.