2.1: Introduction to Least Squares Estimation


If the criterion used to measure the error \(e = y - Ax\) for an inconsistent system of equations is the sum of squared magnitudes of the error components, i.e. \(e^{\prime}e\), or equivalently its square root, which is the usual Euclidean norm or 2-norm \({\|e\|}_2\), then the problem is called a least squares problem. Formally, it can be written as

\[\min _{x}\|y-A x\|_{2}\]

The \(x\) that minimizes this criterion is called the least square error estimate, or more simply, the least squares estimate. The choice of this criterion and the solution of the problem go back to Legendre (1805) and Gauss (around the same time).
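To make the criterion concrete, the following is a minimal sketch in Python, assuming NumPy is available. The matrix `A`, the vector `y`, and the helper `squared_error` are hypothetical illustrations, not part of the original text. The system has three equations in two unknowns and no exact solution, so the sketch compares \(e^{\prime}e\) at an arbitrary candidate \(x\) with its value at the estimate returned by `np.linalg.lstsq`.

```python
import numpy as np

# Hypothetical inconsistent system: x1 = 1, x2 = 1, x1 + x2 = 3.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 1.0, 3.0])

def squared_error(x):
    """Return e'e for the error e = y - A x."""
    e = y - A @ x
    return float(e @ e)

# Least squares estimate: the x that minimizes ||y - A x||_2.
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# A candidate that satisfies the first two equations exactly.
x_guess = np.array([1.0, 1.0])

print("estimate:", x_hat)
print("e'e at estimate:", squared_error(x_hat))
print("e'e at guess:   ", squared_error(x_guess))
```

Because the square root is monotonic, the \(x\) that minimizes \(e^{\prime}e\) also minimizes \({\|e\|}_2\), so either form of the criterion gives the same estimate.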

Example 2.1

Suppose we make some measurements \(y_{i}\) of an unknown function \(f(t)\) at discrete points \({t}_{i},\ i= 1, \ldots, N\):

\[{y}_{i}=f({t}_{i}),\ i= 1, \ldots, N\]

We want to find the function \(g(t)\) in the space \(\mathcal{X}\) of polynomials of order \(m-1< N-1\) that best approximates \(f(t)\) at the measured points \({t}_{i}\), where

\[\mathcal{X}=\left\{g(t)=\sum_{i=0}^{m-1} \alpha_{i} t^{i},\ \alpha_{i} \text { real }\right\}\]

For any \(g(t) \in \mathcal{X}\), we will have \({y}_{i}=g({t}_{i})+{e}_{i}\) for \(i= 1, \ldots, N\). Writing this in matrix form for the available data, we have

\[\underbrace{\left[\begin{array}{l}
y_{1} \\
\vdots \\
y_{N}
\end{array}\right]}_{y}=\underbrace{\left[\begin{array}{ccccc}
1 & t_{1} & t_{1}^{2} & \cdots & t_{1}^{m-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & t_{N} & t_{N}^{2} & \cdots & t_{N}^{m-1}
\end{array}\right]}_{A} \underbrace{\left[\begin{array}{c}
\alpha_{0} \\
\vdots \\
\alpha_{m-1}
\end{array}\right]}_{x}+\underbrace{\left[\begin{array}{c}
e_{1} \\
\vdots \\
e_{N}
\end{array}\right]}_{e}\]

The problem is to find \(\alpha_{0}, \dots, \alpha_{m-1}\) such that \(e^{\prime} e=\sum_{i=1}^{N} e_{i}^{2}\) is minimized.
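The polynomial fit of Example 2.1 can be sketched numerically as follows, assuming NumPy is available. The function name `fit_polynomial` and the sample data are hypothetical, chosen only for illustration. The matrix `A` is built with columns \(1, t, t^{2}, \ldots, t^{m-1}\) to match the matrix form of the example, and `np.linalg.lstsq` is one possible solver for the resulting least squares problem.

```python
import numpy as np

def fit_polynomial(t, y, m):
    """Return (alpha, e'e) for the least squares polynomial with m coefficients.

    alpha[i] multiplies t**i, for i = 0, ..., m-1.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # N x m matrix with rows [1, t_i, t_i^2, ..., t_i^(m-1)].
    A = np.vander(t, N=m, increasing=True)
    alpha, *_ = np.linalg.lstsq(A, y, rcond=None)
    e = y - A @ alpha  # error at the measured points
    return alpha, float(e @ e)

# Hypothetical measurements: N = 5 points, fitted with m = 2 coefficients
# (a straight line), so that m - 1 < N - 1 as the example requires.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.9, 5.1, 7.0, 9.1]

alpha, sse = fit_polynomial(t, y, m=2)
print("coefficients alpha_0, alpha_1:", alpha)
print("sum of squared errors:", sse)
```

If the measurements happen to lie exactly on a polynomial in \(\mathcal{X}\), the same call recovers its coefficients and the sum of squared errors is zero up to rounding.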