# 2.7: Exercises

- Page ID
- 27998

Exercise 2.1 Least Squares Fit of an Ellipse

Suppose a particular object is modeled as moving in an elliptical orbit centered at the origin. Its nominal trajectory is described in rectangular coordinates \((r, s)\) by the constraint equation \(x_{1} r^{2}+ x_{2} s^{2}+ x_{3} rs=1\), where \(x_{1}\), \(x_{2}\), and \(x_{3}\) are unknown parameters that specify the orbit. We have available the following noisy measurements of the object's coordinates \((r, s)\) at ten different points on its orbit:

\[\begin{array}{l}

(0.6728,0.0589)(0.3380,0.4093)(0.2510,0.3559)(-0.0684,0.5449) \\

(-0.4329,0.3657)(-0.6921,0.0252)(-0.3681,-0.2020)(0.0019,-0.3769) \\

(0.0825,-0.3508)(0.5294,-0.2918)

\end{array} \nonumber\]

**Answer**-
The ten measurements are believed to be equally reliable. For your convenience, these ten pairs of measured \((r, s)\) values have been stored in column vectors named \(r\) and \(s\) that you can access through the 6.241 locker on Athena.* After

*add 6.241*, and once in the directory in which you are running Matlab, you can copy the data using*cp /mit/6.241/Public/fall95/hw1rs.mat hw1rs.mat*. Then, in Matlab, type*load hw1rs*to load the desired data; type*who*to confirm that the vectors \(r\) and \(s\) are indeed available.Using the assumed constraint equation, we can arrange the given information in the form of the linear system of (approximate) equations \(A x \approx b\), where \(A\) is a known \(10 \times 3\) matrix, \(b\) is a known \(10 \times 1\) vector, and \(x=\left(x_{1}, x_{2}, x_{3}\right)^{T}\). This system of 10 equations in 3 unknowns is inconsistent. We wish to find the solution \(x\) that minimizes the Euclidean norm (or length) of the error \(Ax - b\). Compare the solutions obtained by using the following four Matlab invocations, each of which in principle gives the desired least-square-error solution:

(a) \(x=A\backslash b \)

(b) \(x=\operatorname{pinv}(A) * b \)

(c) \(x=\operatorname{inv}\left(A^{\prime} * A\right) * A^{\prime} * b\)

(d) \([q, r]=q r(A)\), followed by implementation of the approach described in Exercise 3.1For more information on these commands, try

*help slash, help qr, help pinv, help inv*, etc. [Incidentally, the prime, \(^{\prime}\), in Matlab takes the transpose of the*complex conjugate*of a matrix; if you want the ordinary transpose of a complex matrix \(C\), you have to write \(C^{\prime}\) or \(transp(C)\).]You should include in your solutions a plot the ellipse that corresponds to your estimate of \(x\). If you create the following function file in your Matlab directory, with the name

*ellipse.m*, you can obtain the polar coordinates*theta, rho*of \(n\) points on the ellipse specified by the parameter vector \(x\). To do this, enter*[theta,rho]=ellipse(x,n);*at the Matlab prompt. You can then plot the ellipse by using the*polar(theta,rho)*command.\[\begin{array}{l}

\text {function [theta, rho]=ellipse(x,n)} \\

\%\ \text {[theta, rho]= ellipse(x,n)} \\

\% \\

\% \text { The vector} \ x= [x(1), x(2), x(3)] ^ {\prime}\, \text {,defines an ellipse centered at the origin} \\

\% \text{ via the equation x(1)*} \mathrm{r}^{\wedge}2 + x(2)*\mathrm{s}^{\wedge}2+ x(3)*r*s=1 \text{.} \\

\%\ \text{This routine generates the polar coordinates of points on the eclipse,} \\

\% \text{ to send to a plot command. It does this by solving for the radial} \\

\% \text{ distance in n equally spaced angular directions.} \\

\% \text{ Use polar(theta, rho) to actually plot the ellipse.} \\

\end{array}\nonumber\]

\[\begin{array}{l}

\text {theta}=0:\left(2^{*} \mathrm{pi} / \mathrm{n}\right):\left(2^{*} \mathrm{pi}\right); \\

\mathrm{a}=\mathrm{x}(1)^{*} \cos (\text {theta}) \cdot^{\wedge} 2+\mathrm{x}(2)^{*} \sin (\text {theta}) \cdot^{\wedge} 2+\mathrm{x}(3)^{*}\left(\cos (\text {theta}) \cdot^{*} \sin (\text {theta} )\right); \\

\text {rho}=\operatorname{ones}(\operatorname{size}(\mathrm{a})) \cdot / \mathrm{sqrt}(\mathrm{a});

\end{array}\nonumber\]

Exercise 2.2 Approximation by a Polynomial

Let \(f(t)=0.5 e^{0.8 t}, t \in[0,2]\).

(a) Suppose 16 exact measurements of \(f(t)\) are available to you, taken at the times \(t_{i}\) listed in the array *T* below:

\[\left.\begin{array}{llllllll}

T= & {\left[2 \cdot 10^{-3},\right.} & 0.136, & 0.268, & 0.402, & 0.536, & 0.668, & 0.802, & 0.936 \\

& 1.068, & 1.202, & 1.336, & 1.468, & 1.602, & 1.736, & 1.868, & 2.000

\end{array}\right]\nonumber\]

Use Matlab to generate these measurements:

\[y_{i}=f\left(t_{i}\right) \quad i=1, \ldots, 16 \quad t_{i} \in T\nonumber\]

Now determine the coefficients of the least square error polynomial approximation of the measurements, for

- a polynomial of degree 15, \(p_{15}(t)\);
- a polynomial of degree 2, \(p_{2}(t)\).

Compare the quality of the two approximations by plotting \(y(t_{i})\), \(p_{15}(t_{i})\) and \(p_{2}(t_{i})\) for all \(t_{i}\) in *T* . To see how well we are approximating the function on the whole interval, also plot \(f(t)\), \(p_{15}(t)\) and \(p_{2}(t)\) on the interval [0, 2]. (Pick a very fine grid for the interval, e.g. t=[0:1000]'/500.) Report your observations and comments.

(b) Now suppose that your measurements are affected by some noise. Generate the measurements using

\[y_{i}=f\left(t_{i}\right) + e(t_{i})\quad i=1, \ldots, 16 \quad t_{i} \in T\nonumber\]

where the vector of noise values can be generated in the following way:

\[\begin{array}{l}

\text {randn}\left(^{\prime} \text {seed}^{\prime}, 0\right); \\

e=\operatorname{randn}(\operatorname{siz} e(T));

\end{array}\nonumber\]

Again determine the coefficients of the least square error polynomial approximation of the measurements for

- a polynomial of degree 15, \(p_{15}(t)\);
- a polynomial of degree 2, \(p_{2}(t)\).

Compare the two approximations as in part (a). Report your observations and comments. Explain any surprising results.

(c) So far we have obtained polynomial approximations of \(f(t), t \in [0, 2]\), by approximating the measurements at \(t_{i} \in {T}\). We are now interested in minimizing the square error of the polynomial approximation over the whole interval [0, 2]:

\[\min \left\|f(t)-p_{n}(t)\right\|_{2}^{2}=\min \int_{0}^{2}\left|f(t)-p_{n}(t)\right|^{2} d t\nonumber\]

where \({p}_{n}(t)\) is some polynomial of degree \(n\). Find the polynomial \({p}_{2}(t)\) of degree 2 that solves the above problem. Are the optimal \({p}_{2}(t)\) in this case and the optimal \({p}_{2}(t)\) of parts (a) and (b) very different from each other? Elaborate.

Exercise 2.3 Combining EStimates

Suppose \(y_{1}=C_{1} x+e_{1}\) and \(y_{1}=C_{1} x+e_{1}\), where x is an *n-vector*, and \(C_{1}\), \(C_{2}\) have full column rank. Let \(\widehat{x}_{1}\) denote the value of \(x\) that minimizes \(e_{1}^{T} S_{1} e_{1}\), and \(\widehat{x}_{2}\) denote the value that minimizes \(e_{2}^{T} S_{2} e_{2}\), where \(S_{1}\) and \(S_{2}\) are positive definite matrices. Show that the value \(\widehat{x}\) of \(x\) that minimizes \(e_{1}^{T} S_{1} e_{1}+ e_{2}^{T} S_{2} e_{2}\) can be written entirely in terms of \(\widehat{x}_{1}\), \(\widehat{x}_{2}\), and the \(n \times n\) matrices \(Q_{1}=C_{1}^{T} S_{1} C_{1}\) and \(Q_{2}=C_{2}^{T} S_{2} C_{2}\). What is the significance of this result?

Exercise 2.4 Exponentially Windowed Estimates

Suppose we observe the *scalar *measurements

\[y_{i}=c_{i} x+e_{i}, \quad i=1,2, \ldots\nonumber\]

where \(c_{i}\) and \(x\) are possibly vectors (row- and column-vectors respectively).

(a) Show (by reducing this to a problem that we already know how to solve - don't start from scratch!) that the value \(\widehat{x}_{k}\) of \(x\) that minimizes the criterion

\[\sum_{i=1}^{k} f^{k-i} e_{i}^{2}, \quad \text { some fixed } f, \quad 0<f \leq 1\nonumber\]

is given by

\[\hat{x}_{k}=\left(\sum_{i=1}^{k} f^{k-i} c_{i}^{T} c_{i}\right)^{-1}\left(\sum_{i=1}^{k} f^{k-i} c_{i}^{T} y_{i}\right)\nonumber\]

The so-called *fade* or *forgetting factor* *f* allows us to preferentially weight the more recent measurements by picking \(0 < f < 1\), so that old data is discounted at an exponential rate. We then say that the data has been subjected to exponential fading or forgetting or weighting or windowing or tapering or ... . This is usually desirable, in order to keep the filter adaptive to changes that may occur in \(x\). Otherwise the filter becomes progressively less attentive to new data and falls asleep, with its gain approaching 0.

(b) Now show that

\[\hat{x}_{k}=\hat{x}_{k-1}+Q_{k}^{-1} c_{k}^{T}\left(y_{k}-c_{k} \hat{x}_{k-1}\right)\nonumber\]

where

\[Q_{k}=f Q_{k-1}+c_{k}^{T} c_{k}, \quad Q_{0}=0\nonumber\]

The vector \(g_{k} = Q_{k}^{-1} c_{k}^{T}\) is termed the *gain* of the estimator.

(c) If \(x\) and \(c_{i}\) are scalars, and \(c_{i}\) is a constant \(c\), determine \(g_{k}\) as a function of \(k\). What is the *steady-state gain* \(g_\infty\)? Does \(g_\infty\) increase or decrease as \(f\) increases - and why do you expect this?

Exercise 2.5

Suppose our model for some waveform \(y(t)\) is \(y(t)=\alpha \sin (\omega t)\), where \(\alpha\) is a scalar, and suppose we have measurements \(y\left(t_{1}\right), \ldots, y\left(t_{p}\right)\). Because of modeling errors and the presence of measurement noise, we will generally not find any choice of model parameters that allows us to precisely account for all *p* measurements.

(a) If \(\omega\) is known, find the value of \(\alpha\) that minimizes

\[\sum_{i=1}^{p}\left[y\left(t_{i}\right)-\alpha \sin \left(\omega t_{i}\right)\right]^{2}\nonumber\]

(b) Determine this value of \(\alpha\) if \(\omega=2\) and if the measured values of \(y(t)\) are:

\[\begin{array}{llll}

y(1)=+2.31 & y(2)=-2.01 & y(3)=-1.33 & y(4)=+3.23 \\

y(5)=-1.28 & y(6)=-1.66 & y(7)=+3.28 & y(8)=-0.88

\end{array}\nonumber\]

(I generated this data using the equation \(y(t)=3 \sin (2 t)+ e(t)\) evaluated at the integer values \(t=1, \ldots, 8\), and with \(e(t)\) for each \(t\) being a random number uniformly distributed in the interval - 0.5 to +0.5.)

(c) Suppose that \(\alpha\) and \(\omega\) are unknown, and that we wish to determine the values of these two variables that minimize the above criterion. Assume you are given initial estimates \(\alpha_{0}\) and \(\omega_{0}\) for the minimizing values of these variables. Using the Gauss-Newton algorithm for this nonlinear least squares problem, i.e. applying LLSE to the problem obtained by linearizing about the initial estimates, determine explicitly the estimates \(\alpha_{1}\) and \(\omega_{1}\) obtained after one iteration of this algorithm. Use the following notation to help you write out the solution in a condensed form:

\[a=\sum \sin ^{2}\left(\omega_{0} t_{i}\right), \quad b=\sum t_{i}^{2} \cos ^{2}\left(\omega_{0} t_{i}\right), \quad c=\sum t_{i}\left[\sin \left(w_{0} t_{i}\right)\right]\left[\cos \left(w_{0} t_{i}\right)\right]\nonumber\]

(d) What values do you get for \(\alpha_{1}\) and \(\omega_{1}\) with the data given in (b) above if the initial guesses are \(\alpha_{0}=3.2\) and \(\omega_{0}=1.8\)? Continue the iterative estimation a few more steps. Repeat the procedure when the initial guesses are \(\alpha_{0}=3.5\) and \(\omega_{0}=2.5\), verifying that the algorithm does not converge.

(e) Since only \(\omega\) enters the model nonlinearly, we might think of a decomposed algorithm, in which \(\alpha\) is estimated using linear least squares and \(\omega\) is estimated via nonlinear least squares. Suppose, for example, that our initial estimate of \(\omega\) is \(\omega_{0}=1.8\). Now obtain an estimate \(\alpha_{1}\) of \(\alpha\) using the linear least squares method that you used in (b). Then obtain an (improved?) estimate \(\omega_{1}\) of \(\omega\), using one iteration of a Gauss-Newton algorithm (similar to what is needed in (c), except that now you are only trying to estimate \(\omega\)). Next obtain the estimate \(\alpha_{2}\) via linear least squares, and so on. Compare your results with what you obtain via this decomposed procedure when your initial estimate is \(\omega_{0}=2.5\) instead of 1.8.

Exercise 2.6 Comparing Different Estimators

This problem asks you to compare the behavior of different parameter estimation algorithms by fitting a model of the type \(y(t)=a \sin (2 \pi t)+b \cos (4 \pi t)\) to noisy data taken at values of \(t\) that are .02 apart in the interval (0,2].

**Answer**-
First synthesize the data on which you will test the algorithms. Even though your estimation algorithms will assume that \(a\) and \(b\) are constant, we are interested in seeing how they track parameter changes as well. Accordingly, let \(a = 2\), \(b = 2\) for the first 50 points, and \(a = 1\), \(b = 3\) for the next 50 points. To get (approximately) normally distributed random variables, we use the function

*randn*to produce variables with mean 0 and variance 1.An elegant way to generate the data in Matlab, exploiting Matlab's facility with vectors, is to define the vectors

*t*1 = 0:02 : 0:02 : 1.0 and*t*2 = 1:02 : 0:02 : 2.0, then set\[y 1=2 * \sin (2 * \mathrm{pi} * t 1)+2 * \cos (4 * \mathrm{pi} * t 1)+s * \operatorname{randn}(\operatorname{siz} e(t 1))\nonumber\]

and

\[y 2=\sin (2 * p i * t 2)+3 * \cos (4 * p i * t 2)+s * \operatorname{randn}(\operatorname{siz} e(t 2))\nonumber\]

where

*s*determines the standard deviation of the noise. Pick \(s = 1\) for this problem. Finally, set \(y = [y1, y2]\). No loops, no counters, no fuss!!Now estimate

*a*and*b*from*y*using the following algorithms. Assume prior estimates \(\widehat{a}_{0}= 3\) and \(\widehat{b}_{0}= 1\), weighted equally with the measurements (so all weights can be taken as 1 without loss of generality). Plot your results to aid comparison.(i) Recursive least squares

(ii) Recursive least squares with exponentially fading memory, as in Problem 3. Use \(f = .96\)

(iii) The algorithm in (ii), but with \(Q_{k}\) of Problem 3 replaced by \(q_{k} = (1/n) \times trace(Q_{k})\), where \(n\) is the number of parameters, so \(n = 2\) in this case. (Recall that the trace of a matrix is the sum of its diagonal elements. Note that \(q_{k}\) itself satisfies a recursion, which you should write down.)

(iv) An algorithm of the form

\[\hat{x}_{k}=\hat{x}_{k-1}+\frac{.04}{c_{k} c_{k}^{T}} c_{k}^{T}\left(y_{k}-c_{k} \hat{x}_{k-1}\right)\nonumber\]

where \(c_{k}=[\sin (2 \pi t), \cos (4 \pi t)]\) evaluated at the

*k*th sampling instant, so \(t = .02k\).

Exercise 2.7 Recursive Estimation of a State Vector

This course will soon begin to consider *state-space models* of the form

\[x_{l}=A x_{l-1}\ \tag{2.4}\]

where \(x_{l}\) is an *n*-vector denoting the state at time \(l\) of our model of some system, and *A* is a known \(n \times n\) matrix. For example, suppose the system of interest is a rotating machine, with angular position \(d_{l}\) and angular velocity \(\omega_{l}\) at time \(t = l T\), where \(T\) is some fixed sampling interval. If we believed the machine to be rotating at constant speed, we would be led to the model

\[\left(\begin{array}{l}

d_{l} \\

\omega_{l}

\end{array}\right)=\left(\begin{array}{ll}

1 & T \\

0 & 1

\end{array}\right)\left(\begin{array}{l}

d_{l-1} \\

\omega_{l-1}

\end{array}\right)\nonumber\]

Assume *A* to be nonsingular throughout this problem.

For the rotating machine example above, it is often of interest to obtain least-square-error estimates of the position and (constant) velocity, using noisy measurements of the angular position \(d_{j}\) at the sampling instants. More generally, it is of interest to obtain a least-square-error estimate of the state vector \(x_{i}\) in the model (2.4) from noisy *p-*component measurements \(y_{j}\) that are related to \(x_{j}\) by a linear equation of the form

\[y_{j}=C x_{j}+e_{j}, \quad j=1, \ldots, i\nonumber\]

where C is a \(p \times n\) matrix. We shall also assume that a prior estimate \(\widehat{x}_{0}\) of \(x_{0}\) is available:

\[\widehat{x}_{0}= x_{0}+ e_{0}\nonumber\]

Let \(\widehat{x}_{i|i}\) denote the value of \(x_{i}\) that minimizes

\[\sum_{j=0}^{i}\left\|e_{j}\right\|^{2}\nonumber\]

This is the estimate of \(x_{i}\) given the prior estimate and measurements up to time \(i\), or the "filtered estimate" of \(x_{i}\). Similarly, let \(\widehat{x}_{i|i-1}\) denote the value of \(x_{i}\) that minimizes

\[\sum_{j=0}^{i-1}\left\|e_{j}\right\|^{2}\nonumber\]

This is the least-square-error estimate of \(x_{i}\) given the prior estimate and measurements up to time \(i - 1\), and is termed the "one-step prediction" of \(x_{i}\).

a) Set up the linear system of equations whose least square error solution would be \(\widehat{x}_{i|i}\). Similarly, set up the linear system of equations whose least square error solution would be \(\widehat{x}_{i|i-1}\).

b) Show that \(\widehat{x}_{i|i-1}=A\widehat{x}_{i-1|i-1}\)

c) Determine a recursion that expresses \(\widehat{x}_{i|i}\) in terms of \(\widehat{x}_{i-1|i-1}\) and \(y_{i}\). This is the prototype of what is known as the *Kalman filter*. A more elaborate version of the *Kalman filter* would include additive noise driving the state-space model, and other embellishments, all in a stochastic context (rather than the deterministic one given here).

Exercise 2.8

Let \(\widehat{x}\) denote the value of \(x\) that minimizes \(\|y-A x\|^{2}\), where \(A\) has full column rank. Let \(\bar{x}\) denote the value of \(x\) that minimizes this same criterion, but now subject to the constraint that \(z = Dx\), where D has full row rank. Show that

\[\bar{x}=\hat{x}+\left(A^{T} A\right)^{-1} D^{T}\left(D\left(A^{T} A\right)^{-1} D^{T}\right)^{-1}(z-D \hat{x})\nonumber\]

(Hint: One approach to solving this is to use our recursive least squares formulation, but modified for the limiting case where one of the measurement sets - namely \(z = Dx\) in this case - is known to have no error. You may have to use some of the matrix identities from the previous chapter).