
5.4: Total Least Squares


    We have previously examined solving least squares problems of the form \(y = Ax + e\). An interpretation of the problem we solved there is that we perturbed \(y\) as little as possible - in the least squares sense - to make the resulting equation \(y - e = Ax\) consistent. It is natural to ask what happens if we allow \(A\) to be perturbed as well. This makes sense in situations where the uncertainty in our model and the noise in our measurements cannot or should not be attributed entirely to \(y\), but must in part be ascribed to \(A\). The simplest least squares problem of this type is one that allows a perturbed model of the form

    \[y=(A+\Delta) x+e \ \tag{5.16}\]

    The so-called total least squares estimation problem can now be stated as

    \[\min _{\Delta, e}\left(\sum_{i, j}\left|\Delta_{i j}\right|^{2}+\sum_{i}\left|e_{i}\right|^{2}\right)^{\frac{1}{2}}=\min _{\Delta, e}\|[\Delta \vdots e]\|_{F} \ \tag{5.17}\]

    \[=\min _{\Delta, e}\|\hat{\Delta}\|_{F} \ \tag{5.18}\]

    where

    \[\hat{\Delta}=[\Delta \vdots e] \ \tag{5.19}\]

    Weighted versions of this problem can also be posed, but we omit these generalizations.
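
    As a quick numerical check (a NumPy sketch; the matrix sizes and random perturbations are purely illustrative), the identity in Equations (5.17)-(5.19) simply says that adjoining \(e\) as an extra column of \(\Delta\) turns the two sums of squares into a single Frobenius norm:

        import numpy as np

        rng = np.random.default_rng(0)
        Delta = rng.standard_normal((5, 3))   # illustrative perturbation of A
        e = rng.standard_normal(5)            # illustrative perturbation of y

        # Left side of (5.17): square root of the summed squared entries.
        lhs = np.sqrt(np.sum(Delta**2) + np.sum(e**2))

        # Right side of (5.17)-(5.18): Frobenius norm of [Delta  e].
        rhs = np.linalg.norm(np.hstack([Delta, e[:, None]]), ord="fro")

        assert np.isclose(lhs, rhs)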

    Note that no constraints have been imposed on \(\Delta\) in the above problem statement, and this can often limit the direct usefulness of the total least squares formulation in practical problems. In practice, the expected or allowed perturbations of \(A\) are often quite structured; however, solving the total least squares problem under such structural constraints is much harder than solving the unconstrained problem, whose solution we present next. Nevertheless, the total least squares formulation can provide a useful benchmark. (The same sorts of comments can of course be made about the conventional least squares formulation: it is often not the criterion that we would want to use, but its tractability compared to other criteria makes it a useful point of departure.)

    If we make the definitions

    \[\hat{A}=\left[\begin{array}{lll}
    A & \vdots & -y
    \end{array}\right], \quad \hat{x}=\left[\begin{array}{l}
    x \\
    1
    \end{array}\right] \ \tag{5.20}\]

    then the perturbed model in Equation (5.16) can be rewritten as

    \[(\hat{A}+\hat{\Delta}) \hat{x}=0 \ \tag{5.21}\]

    This equation makes evident that what we seek is the \(\hat{\Delta}\) of minimal Frobenius norm that satisfies Equation (5.21) - the smallest \(\hat{\Delta}\) that makes \(\hat{A} + \hat{\Delta}\) rank deficient, so that \(\hat{x}\) can lie in its null space.
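
    The following NumPy sketch (again with illustrative random data) builds \(\hat{A}\) as in Equation (5.20); by the Eckart-Young theorem, its smallest singular value is exactly the Frobenius norm of the minimal \(\hat{\Delta}\) sought here:

        import numpy as np

        rng = np.random.default_rng(1)
        m, n = 20, 3
        A = rng.standard_normal((m, n))
        y = A @ rng.standard_normal(n) + 0.05 * rng.standard_normal(m)

        # Augmented matrix A_hat = [A  -y] from Equation (5.20).
        A_hat = np.hstack([A, -y[:, None]])

        # The smallest singular value of A_hat equals the Frobenius-norm
        # distance to the nearest matrix of rank n, i.e. ||Delta_hat||_F.
        sigma = np.linalg.svd(A_hat, compute_uv=False)
        print("||Delta_hat||_F =", sigma[-1])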

    Let us suppose that \(A\) has full column rank (\(n\)), and that it has more rows than columns (which is normally the case, since in least squares estimation we typically have many more measurements than parameters to estimate). In addition, let us assume that \(\hat{A}\) has rank \(n + 1\), which is also generally true. From what we've learned about additive perturbations, we now see that a minimal (in the Frobenius norm sense) \(\hat{\Delta}\) that satisfies Equation (5.21) is

    \[\hat{\Delta}=-\sigma_{n+1} u_{n+1} v_{n+1}^{\prime} \ \tag{5.22}\]

    where \(\sigma_{n+1}\), \(u_{n+1}\) and \(v_{n+1}\) come from the SVD of \(\hat{A}\): \(\sigma_{n+1}\) is its smallest singular value, and \(u_{n+1}\), \(v_{n+1}\) are the associated left and right singular vectors. Choosing \(\hat{x}=v_{n+1}\) then gives \((\hat{A}+\hat{\Delta}) v_{n+1}=\sigma_{n+1} u_{n+1}-\sigma_{n+1} u_{n+1}=0\). Rescaling \(\hat{x}\) so that its last component equals 1 (which requires this component to be nonzero), we have

    \[(\hat{A}+\hat{\Delta})\left[\begin{array}{l}
    x \\
    1
    \end{array}\right]=0\nonumber\]

    which gives us \(x\), the total least squares solution. This solution is due to Golub and Van Loan (see their classic text, Matrix Computations, Second Edition, Johns Hopkins University Press, 1989).
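
    The entire construction fits in a few lines of NumPy. The sketch below (the function name and test data are ours, for illustration) computes the total least squares solution from the SVD of \(\hat{A}\) and compares it with the ordinary least squares solution; as noted above, it assumes the last component of \(v_{n+1}\) is nonzero, so that the rescaling is possible:

        import numpy as np

        def total_least_squares(A, y):
            """Total least squares solution of y = (A + Delta) x + e, via the
            SVD of the augmented matrix [A  -y] (a sketch of the Golub-Van
            Loan construction described above)."""
            m, n = A.shape
            A_hat = np.hstack([A, -y[:, None]])   # Equation (5.20)
            U, s, Vt = np.linalg.svd(A_hat)
            v = Vt[-1, :]                         # v_{n+1}: smallest singular value
            if np.isclose(v[-1], 0.0):
                raise ValueError("last component of v_{n+1} is zero; "
                                 "TLS rescaling is not possible")
            x_hat = v / v[-1]                     # rescale so last component is 1
            return x_hat[:n]                      # x_hat = [x; 1], so x is the top block

        # Illustrative data: perturb both A and y, the situation TLS addresses.
        rng = np.random.default_rng(2)
        m, n = 50, 2
        A = rng.standard_normal((m, n))
        x_true = np.array([1.0, -2.0])
        y = (A + 0.05 * rng.standard_normal((m, n))) @ x_true + 0.05 * rng.standard_normal(m)

        x_tls = total_least_squares(A, y)
        x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
        print("TLS:", x_tls, " LS:", x_ls)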

