5.1: Fitting with Weighted Least Squares
- Least squares regression
- From discrete data, one can obtain a best estimate of the functional relationship between the variables
- Assume \(\hat{y} = f(x)\) with polynomial form:\[\hat{y}(x) = a_{0} + a_{1} x + a_{2} x^{2} + \ldots + a_{m}x^{m} = \sum_{i=0}^{m} a_{i} x^{i} \] where each subscript identifies the coefficient multiplying \(x\) raised to that same power
- \(n\) different \((x, y)\) data pairs can determine the \(m+1\) coefficients provided \(m\leq n-1\)
- Deviation of the data from the fit function over all data points \[D = \sum_{i=1}^{N} \left[y_{i} - \hat{y}\left(x_{i}\right)\right]^{2} \] \[D = \sum_{i=1}^{N} \left[y_{i} - \left(a_{0} + a_{1} x_{i} + a_{2} x_{i}^{2} + \ldots + a_{m}x_{i}^{m}\right)\right]^{2} \] How to minimize the value \(D\)? Through the derivative, of course: \[dD = \frac{\partial D}{\partial a_{0}}da_{0} + \ldots + \frac{\partial D}{\partial a_{m}}da_{m} \]where the total differential is applied.
- In order for \(dD \rightarrow 0\), each partial derivative must vanish \[ \frac{\partial D}{\partial a_{0}} = 0 = \frac{\partial}{\partial a_{0}} \left\{\sum_{i=1}^{N} \left[y_{i} - \left(a_{0} + a_{1} x_{i} + a_{2} x_{i}^{2} + \ldots + a_{m}x_{i}^{m}\right)\right]^{2} \right\} \] \[ \frac{\partial D}{\partial a_{m}} = 0 = \frac{\partial}{\partial a_{m}} \left\{\sum_{i=1}^{N} \left[y_{i} - \left(a_{0} + a_{1} x_{i} + a_{2} x_{i}^{2} + \ldots + a_{m}x_{i}^{m}\right)\right]^{2} \right\} \] which yields \(m+1\) simultaneous equations to solve for \(a_{0}\) through \(a_{m}\)
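- For a general coefficient \(a_{k}\), the derivative evaluates to \[ \frac{\partial D}{\partial a_{k}} = -2\sum_{i=1}^{N} x_{i}^{k}\left[y_{i} - \left(a_{0} + a_{1} x_{i} + \ldots + a_{m}x_{i}^{m}\right)\right] = 0 \] which rearranges into the \(k^{\mathrm{th}}\) simultaneous equation: \[ a_{0}\sum_{i=1}^{N} x_{i}^{k} + a_{1}\sum_{i=1}^{N} x_{i}^{k+1} + \ldots + a_{m}\sum_{i=1}^{N} x_{i}^{k+m} = \sum_{i=1}^{N} x_{i}^{k}\,y_{i} \]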
- How to set it up?
- Recall the polynomial form, but in terms of each \(i^{\mathrm{th}}\) data point:\[y_{i} = a_{0}x_{i}^{0} + a_{1}x_{i}^{1} + a_{2}x_{i}^{2} + \ldots + a_{m}x_{i}^{m} \] where \(x_{i}\) is the independent variable of the \(i^{\mathrm{th}}\) trial and \(y_{i}\) is the dependent measurand
- Need to determine coefficients, so set up the linear algebra equation:\[\left[ \begin{array}{ccccc} 1 & x_{1} & x_{1}^{2} & \ldots& x_{1}^{m} \\ 1 & x_{2} & x_{2}^{2} & \ldots& x_{2}^{m} \\ 1 & x_{3} & x_{3}^{2} & \ldots& x_{3}^{m} \\ & & \vdots & & \\ 1 & x_{n} & x_{n}^{2} & \ldots& x_{n}^{m} \end{array}\right] \left[\begin{array}{c} a_{0} \\ a_{1} \\ a_{2} \\ \vdots \\ a_{m} \end{array} \right ] = \left[\begin{array}{c} y_{1} \\ y_{2} \\ y_{3} \\ \vdots \\ y_{n} \end{array} \right ] \] which can be written more succinctly as: \[ M\vec{a} = \vec{y} \] where:
- \(M\) is a matrix of \(n\) rows (one per trial) and \(m+1\) columns (one per coefficient of the \(m^{\mathrm{th}}\)-order polynomial)
- \(\vec{a}\) is the vector of \(m+1\) unknown coefficients
- \(\vec{y}\) is the vector of \(n\) dependent measurands
- Rewriting with dimensional sizes indicated: \[ M_{n\times (m+1)} \ \vec{a}_{(m+1) \times 1} = \vec{y}_{n\times 1} \]
- How to isolate \(\vec{a}\)? Use the inverse of course ...
- But it is impossible to invert a non-square matrix
- Time to use the pseudoinverse
- Use the transpose to force a square form on left hand side \[ M^{T}\ M\ \vec{a} = M^{T}\ \vec{y} \] where transpose is flip of rows and columns across diagonal:\[ N = \left[\begin{array}{cccc} 1 & 2 & 3 & 4 \\ 1 & 4 & 9 & 16 \\ 1 & -1 & -2 & 1 \end{array}\right]\] \[ N^{T} = \left[\begin{array}{ccc} 1 & 1 & 1 \\ 2 & 4 & -1 \\ 3 & 9 & -2 \\ 4 & 16 & -1 \end{array}\right] \]
- Resulting left side is now square, with the same action applied to the right side: \[ (M^{T}M)_{(m+1)\times (m+1)} \ \vec{a}_{(m+1)\times 1} = M^{T}_{(m+1) \times n}\ \vec{y}_{n\times 1} \]
- \(M^{T}M\) is also a more efficient way of composing the numerous sums of the input data that appear in the simultaneous equations
- Inverse is applied recalling matrix multiplication is NOT commutative: \[ \vec{a} = (M^{T}M)^{-1} M^{T} \ \vec{y}\]
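In code form, a minimal sketch of this solution using Python with numpy (the data values here are only illustrative placeholders):

```python
import numpy as np

# Illustrative data; substitute measured values here
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
m = 1  # order of the polynomial

# Build M with n rows and m+1 columns: [1, x, x^2, ..., x^m]
M = np.vander(x, m + 1, increasing=True)

# a = (M^T M)^{-1} M^T y, the pseudoinverse applied to y
a = np.linalg.inv(M.T @ M) @ M.T @ y
print(a)  # coefficients a_0 ... a_m
```

In practice, `np.linalg.lstsq` or `np.linalg.pinv` solves the same system with better numerical behavior than forming the inverse explicitly.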
Prove that the same results are achieved when Add Trendline or a polyfit function is used, compared to the pseudoinverse
- independent data has the form \(x = \{0.4,1.1,1.9,3, 5,6\}\)
- dependent data results in \(y = \{2.7,3.6,4.4,5.2,9.2,12.1\}\)
- How many clicks or code updates are needed to change the order of the polynomial?
Solution
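One possible check, sketched in Python with numpy (`np.polyfit` standing in for a spreadsheet's Add Trendline; the choice \(m = 2\) is arbitrary here):

```python
import numpy as np

# Data given in the example
x = np.array([0.4, 1.1, 1.9, 3.0, 5.0, 6.0])
y = np.array([2.7, 3.6, 4.4, 5.2, 9.2, 12.1])
m = 2  # changing the order is a one-value edit

# Pseudoinverse solution: a = (M^T M)^{-1} M^T y
M = np.vander(x, m + 1, increasing=True)
a_pseudo = np.linalg.inv(M.T @ M) @ M.T @ y

# polyfit returns coefficients highest power first, so reverse to compare
a_polyfit = np.polyfit(x, y, m)[::-1]

print(a_pseudo)
print(a_polyfit)  # should agree to machine precision
```

In code, changing the polynomial order is a single edit to `m`; with Add Trendline, it typically means reopening the trendline dialog and re-selecting the order by hand.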
- Weighted least squares
- Knowledge of the pseudoinverse allows emphasis of particular data
- A diagonal matrix of values can increase or decrease the impact of data points \[ W^{1/2}M\ \vec{a} = W^{1/2} \ \vec{y} \] where \(W\) has something like the form: \[ W = \left[\begin{array}{cccc} 10 & 0 & 0 & 0 \\ 0 & 10^{3} & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right] \] where \(x_{3}\) is twice as important as \(x_{4}\), \(x_{1}\) is 5 times more important than \(x_{3}\), and \(x_{2}\) is 2 orders of magnitude more important than \(x_{1}\)
- How to choose values to populate \(W\)?
- always use positive values (hence the \(^{1/2}\) power)
- larger values for those with greater confidence/importance
- lower values to reduce influence of spurious points
- Reasons for choosing:
- Size of uncertainty relative to measurement
- Spread of standard deviation statistics
- Trustworthiness of collaborative teammate
- Solution follows a similar pseudoinverse format, as sketched below \[ \vec{a} = (M^{T}W M)^{-1} M^{T}W\ \vec{y}\]
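A minimal sketch of the weighted solution in Python with numpy (the helper name `weighted_polyfit` and the weight vector `w` are assumptions for illustration, not a library API):

```python
import numpy as np

def weighted_polyfit(x, y, m, w):
    """Solve a = (M^T W M)^{-1} M^T W y for an m-th order polynomial."""
    M = np.vander(x, m + 1, increasing=True)  # n x (m+1) polynomial matrix
    W = np.diag(w)                            # diagonal weight matrix
    return np.linalg.inv(M.T @ W @ M) @ M.T @ W @ y

# Uniform weights reproduce the ordinary least squares fit
x = np.array([0.4, 1.1, 1.9, 3.0, 5.0, 6.0])
y = np.array([2.7, 3.6, 4.4, 5.2, 9.2, 12.1])
print(weighted_polyfit(x, y, 1, np.ones(len(x))))
```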
How does changing \(W\) affect the polynomial and the proximity of fit in the example above? What happens to the curve when one or more data points are emphasized?
Solution
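A sketch of one way to explore this in Python with numpy; the weight values below are arbitrary illustrative choices:

```python
import numpy as np

x = np.array([0.4, 1.1, 1.9, 3.0, 5.0, 6.0])
y = np.array([2.7, 3.6, 4.4, 5.2, 9.2, 12.1])
m = 2
M = np.vander(x, m + 1, increasing=True)

def wls(w):
    """Weighted fit: a = (M^T W M)^{-1} M^T W y."""
    W = np.diag(w)
    return np.linalg.inv(M.T @ W @ M) @ M.T @ W @ y

a_equal = wls(np.ones(len(x)))                    # uniform weights: ordinary fit
a_heavy = wls(np.array([1, 1, 1, 1, 1, 100.0]))   # strongly emphasize last point

print(a_equal)
print(a_heavy)  # curve pulled toward (6, 12.1), at the cost of the other points
```

Heavily weighting a point drags the fitted curve toward it, which visibly loosens the fit near the de-emphasized points.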
- Standard error of the fit \(S_{yx}\)
- Quantify deviation between \(y_{i}\) and \(\hat{y}(x)\) \[ S_{yx} = \sqrt{\frac{\sum_{i=1}^{N}\left(y_{i}-\hat{y}(x_{i})\right)^{2}}{\nu}} \] where \(\nu = N-(m+1)\) is the degrees of freedom (number of data points minus the number of polynomial coefficients); see the sketch after this list
- Gives a "goodness of fit" measure, like the \(R\) value
- Higher \(m\) will often make \(S_{yx}\) decrease
- The underlying physical principle should define the \(m^{\mathrm{th}}\) order of the fit; otherwise the extra terms add unnecessary inflection points in \(\hat{y}(x)\)
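A minimal sketch of computing \(S_{yx}\) in Python with numpy, reusing the example data and an assumed second-order fit:

```python
import numpy as np

x = np.array([0.4, 1.1, 1.9, 3.0, 5.0, 6.0])
y = np.array([2.7, 3.6, 4.4, 5.2, 9.2, 12.1])
m = 2

# Fit by the pseudoinverse, then evaluate the fit at each x_i
M = np.vander(x, m + 1, increasing=True)
a = np.linalg.inv(M.T @ M) @ M.T @ y
y_hat = M @ a

nu = len(x) - (m + 1)  # degrees of freedom: N - (m+1)
S_yx = np.sqrt(np.sum((y - y_hat) ** 2) / nu)
print(S_yx)
```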

