2.6: Recursive Least Squares (optional)

Last updated
Save as PDF

Page ID: 24239

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

What if the data is coming in sequentially? Do we have to recompute everything each time a new data point comes in, or can we write our new, updated estimate in terms of our old estimate?

Consider the model

\[y_{i}=A_{i} x+e_{i}, \quad i=0,1, \ldots\nonumber\]

where \(y_{i} \in \mathbf{C}^{m \times 1}, A_{i} \in \mathbf{C}^{m \times n}, x \in \mathbf{C}^{n \times 1}, \text { and } e_{i} \in \mathbf{C}^{m \times 1}\). The vector \(e_{k}\) represents the mismatch between the measurement \(y_{k}\) and the model for it, \(A_{k}x\), where \(A_{k}\) is known and \(x\) is the vector of parameters to be estimated. At each time \(k\), we wish to find

\[\widehat{x}_{k}=\arg \min _{x}\left(\sum_{i=1}^{k}\left(y_{i}-A_{i} x\right)_{i}^{\prime} S_{i}\left(y_{i}-A_{i} x\right)\right)=\arg \min _{x}\left(\sum_{i=1}^{k} e_{i}^{\prime} S_{i} e_{i}\right)\nonumber\]

where \(S_{i} \in \mathbf{C}^{m \times 1}\) is a positive definite Hermitian matrix of weights, so that we can vary the importance of the \(e_{i}\)'s and components of the \(e_{i}\)'s in determining \(\widehat{x}_{k}\).

To compute \(\widehat{x}_{k+1}\), let:

\[\bar{y}_{k+1}=\left[\begin{array}{c}
y_{0} \\
y_{1} \\
\cdot \\
\cdot \\
y_{k+1}
\end{array}\right] ; \quad \bar{A}_{k+1}=\left[\begin{array}{c}
A_{0} \\
A_{1} \\
\cdot \\
\cdot \\
A_{k+1}
\end{array}\right] ; \quad \bar{e}_{k+1}=\left[\begin{array}{c}
e_{0} \\
e_{1} \\
\cdot \\
\cdot \\
e_{k+1}
\end{array}\right]\nonumber\]

and

\[\bar{S}_{k+1}=\operatorname{diag}\left(S_{0}, S_{1}, \ldots, S_{k+1}\right)\nonumber\]

where \(S_{i}\) is the weighting matrix for \(e_{i}\).

Our problem is then equivalent to

\[\min \left(\bar{e}_{k+1}^{\prime} \bar{S}_{k+1} \bar{e}_{k+1}\right)\nonumber\]

subject to: \(\bar{y}_{k+1}=\bar{A}_{k+1} x_{k+1}+\bar{e}_{k+1}\)

The solution can thus be written as

\[\left(\bar{A}_{k+1}^{\prime} \bar{S}_{k+1} \bar{A}_{k+1}\right) \widehat{x}_{k+1}=\bar{A}_{k+1}^{\prime} \bar{S}_{k+1} \bar{y}_{k+1}\nonumber\]

or in summation form as

\[\left(\sum_{i=0}^{k+1} A_{i}^{\prime} S_{i} A_{i}\right) \widehat{x}_{k+1}=\sum_{i=0}^{k+1} A_{i}^{\prime} S_{i} y_{i}\nonumber\]

Defining

\[Q_{k+1}=\sum_{i=0}^{k+1} A_{i}^{\prime} S_{i} A_{i}\nonumber\]

we can write a recursion for \(Q_{k+1}\) as follows:

\[Q_{k+1}=Q_{k}+A_{k+1}^{\prime} S_{k+1} A_{k+1}\nonumber\]

Rearranging the summation form equation for \(\widehat{x}_{k}+1\), we get

\[\begin{aligned}
\widehat{x}_{k+1} &=Q_{k+1}^{-1}\left[\left(\sum_{i=0}^{k} A_{i}^{\prime} S_{i} A_{i}\right) \widehat{x}_{k}+A_{k+1}^{\prime} S_{k+1} y_{k+1}\right] \\
&=Q_{k+1}^{-1}\left[Q_{k} \widehat{x}_{k}+A_{k+1}^{\prime} S_{k+1} y_{k+1}\right]
\end{aligned}\nonumber\]

This clearly displays the new estimate as a weighted combination of the old estimate and the new data, so we have the desired recursion. Another useful form of this result is obtained by substituting from the recursion for \(Q_{k+1}\) above to get

\[\widehat{x}_{k+1}=\widehat{x}_{k}-Q_{k+1}^{-1}\left(A_{k+1}^{\prime} S_{k+1} A_{k+1} \widehat{x}_{k}-A_{k+1}^{\prime} S_{k+1} y_{k+1}\right)\nonumber\]

which finally reduces to

\[\widehat{x}_{k+1}=\widehat{x}_{k}+\underbrace{Q_{k+1}^{-1} A_{k+1}^{\prime} S_{k+1}}_{\text {Kalman Filter Gain }} \underbrace{\left(y_{k+1}-A_{k+1} \widehat{x}_{k}\right)}_{\text {innovations }}\nonumber\]

The quantity \(Q_{k+1}^{-1} A_{k+1}^{\prime} S_{k+1}\) is called the Kalman gain, and \(y_{k+1}-A_{k+1} \widehat{x}_{k}\) is called the innovations, since it compares the difference between a data update and the prediction given the last estimate.

Unfortunately, as one acquires more and more data, i.e. as \(k\) grows large, the Kalman gain goes to zero. One data point cannot make much headway against the mass of previous data which has `hardened' the estimate. If we leave this estimator as is - without modification - the estimator `goes to sleep' after a while, and thus doesn't adapt well to parameter changes. The homework investigates the concept of a `fading memory' so that the estimator doesn't go to sleep.

An Implementation Issue

Another concept which is important in the implementation of the RLS algorithm is the computation of \(Q_{k+1}^{-1}\). If the dimension of \(Q_{k}\) is very large, computation of its inverse can be computationally expensive, so one would like to have a recursion for \(Q_{k+1}^{-1}\).

This recursion is easy to obtain. Applying the handy matrix identity

\[(A+B C D)^{-1}=A^{-1}-A^{-1} B\left(D A^{-1} B+C^{-1}\right)^{-1} D A^{-1}\nonumber\]

to the recursion for \(Q_{k+1}\) yields

\[Q_{k+1}^{-1}=Q_{k}^{-1}-Q_{k}^{-1} A_{k+1}^{\prime}\left(A_{k+1} Q_{k}^{-1} A_{k+1}^{\prime}+S_{k+1}^{-1}\right)^{-1} A_{k+1} Q_{k}^{-1}\nonumber\]

Upon defining

\[P_{k+1}=Q_{k+1}^{-1}\nonumber\]

this becomes

\[P_{k+1}=P_{k}-P_{k} A_{k+1}^{\prime}\left(S_{k+1}^{-1}+A_{k+1} P_{k} A_{k+1}^{\prime}\right)^{-1} A_{k+1} P_{k}\nonumber\]

which is called the (discrete-time) Riccati equation.

Interpretation

We have \(\widehat{x}_{k}\) and \({y}_{k+1}\) available for computing our updated estimate. Interpreting \(\widehat{x}_{k}\) as a measurement, we see our model becomes

\[\left[\begin{array}{c}
\widehat{x}_{k} \\
y_{k+1}
\end{array}\right]=\left[\begin{array}{c}
I \\
A_{k+1}
\end{array}\right] x+\left[\begin{array}{c}
e_{k} \\
e_{k+1}
\end{array}\right]\nonumber\]

The criterion, then, by which we choose \(\widehat{x}_{k+1}\) is thus

\[\widehat{x}_{k+1}=\operatorname{argmin}\left(e_{k}^{\prime} Q_{k} e_{k}+e_{k+1}^{\prime} S_{k+1} e_{k+1}\right)\nonumber\]

In this context, one interprets \({Q}_{k}\) as the weighting factor for the previous estimate.