
29.3: Multivariate Newton


    Model Problem

    Now we will apply the Newton method to solve multivariate systems of nonlinear equations. For example, we can consider the simple bivariate system of nonlinear equations \[\begin{aligned} &f_{1}\left(z_{1}, z_{2}\right)=z_{1}^{2}+2 z_{2}^{2}-22=0, \\ &f_{2}\left(z_{1}, z_{2}\right)=2 z_{1}^{2}+z_{2}^{2}-17=0 . \end{aligned}\] Note that \(f_{1}\) and \(f_{2}\) are two elliptic paraboloids, each with principal axes aligned with the coordinate directions; we plot \(f_{1}\) and \(f_{2}\) in Figure 29.5. We wish to find a zero \(\boldsymbol{Z}=\left(\begin{array}{ll}Z_{1} & Z_{2}\end{array}\right)^{\mathrm{T}}\) of \(\boldsymbol{f}(\boldsymbol{z})=\left(\begin{array}{ll}f_{1}\left(z_{1}, z_{2}\right) & f_{2}\left(z_{1}, z_{2}\right)\end{array}\right)^{\mathrm{T}}\) such that \(\boldsymbol{f}(\boldsymbol{Z})=\mathbf{0}\). We can visualize \(\boldsymbol{Z}\) as the intersections of the ellipses \(f_{1}=0\) (the intersection of paraboloid \(f_{1}\) with the zero plane) and \(f_{2}=0\) (the intersection of paraboloid \(f_{2}\) with the zero plane). The four solutions \(\left(Z_{1}, Z_{2}\right)=(\pm 2, \pm 3)\) are shown as the red circles in the contour plot of Figure 29.6.

    Method

    The Newton method for the multivariate case follows the same approach as for the univariate case:

    • Start with an initial approximation \(\hat{\boldsymbol{z}}^{0}\) of a root to the nonlinear system \(\boldsymbol{f}(\boldsymbol{z})=\mathbf{0}\).
    • Linearize the system at \(\hat{\boldsymbol{z}}^{0}\) and solve the resulting linear system to derive a better approximation \(\hat{\boldsymbol{z}}^{1}\).
    • Continue linearizing and solving until the norm of \(\boldsymbol{f}\left(\hat{\boldsymbol{z}}^{N}\right)\) is within some desired tolerance of zero.

    However, the multivariate case can be much more challenging computationally.

    Figure 29.5: Elliptic paraboloids \(f_{1}\) and \(f_{2}\).

    Figure 29.6: Contour plots of \(f_{1}\) and \(f_{2}\) with the intersections of their zero contours (the solutions to \(\boldsymbol{f}(\boldsymbol{z})=\mathbf{0}\)) shown as red circles.

    We present the method for the general case in which \(\boldsymbol{z}\) is an \(n\)-vector, \(\left(\begin{array}{llll}z_{1} & z_{2} & \cdots & z_{n}\end{array}\right)^{\mathrm{T}}\), and \(\boldsymbol{f}(\boldsymbol{z})\) is also an \(n\)-vector, \(\left(\begin{array}{llll}f_{1}(\boldsymbol{z}) & f_{2}(\boldsymbol{z}) & \cdots & f_{n}(\boldsymbol{z})\end{array}\right)^{\mathrm{T}}\). This represents \(n\) nonlinear equations in \(n\) unknowns. (Of course, even more generally, we could consider more equations than unknowns or fewer equations than unknowns.) To linearize the multivariate system, we again use a first-order Taylor series expansion, which, for a single multivariate function linearized at the point \(\hat{\boldsymbol{z}}^{k}\), is given by \[\left.f_{\text {linear }}^{k}(\boldsymbol{z}) \equiv \frac{\partial f}{\partial z_{1}}\right|_{\hat{\boldsymbol{z}}^{k}}\left(z_{1}-\hat{z}_{1}^{k}\right)+\left.\frac{\partial f}{\partial z_{2}}\right|_{\hat{\boldsymbol{z}}^{k}}\left(z_{2}-\hat{z}_{2}^{k}\right)+\cdots+\left.\frac{\partial f}{\partial z_{n}}\right|_{\hat{\boldsymbol{z}}^{k}}\left(z_{n}-\hat{z}_{n}^{k}\right)+f\left(\hat{\boldsymbol{z}}^{k}\right)\] (cf. equation (29.4) in the univariate case). Because we have \(n\) equations in \(n\) variables, we must linearize each of the equations \(f_{1}(\boldsymbol{z}), f_{2}(\boldsymbol{z}), \ldots, f_{n}(\boldsymbol{z})\) and then, per the Newton recipe, set \(f_{1, \text {linear}}^{k}\left(\hat{\boldsymbol{z}}^{k+1}\right)=0, \ldots, f_{n, \text {linear}}^{k}\left(\hat{\boldsymbol{z}}^{k+1}\right)=0\).
Our full linearized system looks like \[\begin{aligned} &\left.f_{1, \text { linear }}^{k}\left(\hat{\boldsymbol{z}}^{k+1}\right) \equiv \frac{\partial f_{1}}{\partial z_{1}}\right|_{\hat{\boldsymbol{z}}^{k}}\left(\hat{z}_{1}^{k+1}-\hat{z}_{1}^{k}\right)+\left.\frac{\partial f_{1}}{\partial z_{2}}\right|_{\hat{\boldsymbol{z}}^{k}}\left(\hat{z}_{2}^{k+1}-\hat{z}_{2}^{k}\right)+\cdots+\left.\frac{\partial f_{1}}{\partial z_{n}}\right|_{\hat{\boldsymbol{z}}^{k}}\left(\hat{z}_{n}^{k+1}-\hat{z}_{n}^{k}\right)+f_{1}\left(\hat{\boldsymbol{z}}^{k}\right)=0, \\ &\left.f_{2, \text { linear }}^{k}\left(\hat{z}^{k+1}\right) \equiv \frac{\partial f_{2}}{\partial z_{1}}\right|_{\hat{\boldsymbol{z}}^{k}}\left(\hat{z}_{1}^{k+1}-\hat{z}_{1}^{k}\right)+\left.\frac{\partial f_{2}}{\partial z_{2}}\right|_{\hat{\boldsymbol{z}}^{k}}\left(\hat{z}_{2}^{k+1}-\hat{z}_{2}^{k}\right)+\cdots+\left.\frac{\partial f_{2}}{\partial z_{n}}\right|_{\hat{\boldsymbol{z}}^{k}}\left(\hat{z}_{n}^{k+1}-\hat{z}_{n}^{k}\right)+f_{2}\left(\hat{z}^{k}\right)=0, \end{aligned}\] up through

    \(\left.f_{n, \text { linear }}^{k}\left(\hat{z}^{k+1}\right) \equiv \frac{\partial f_{n}}{\partial z_{1}}\right|_{\hat{z}^{k}}\left(\hat{z}_{1}^{k+1}-\hat{z}_{1}^{k}\right)+\left.\frac{\partial f_{n}}{\partial z_{2}}\right|_{\hat{z}^{k}}\left(\hat{z}_{2}^{k+1}-\hat{z}_{2}^{k}\right)+\cdots+\left.\frac{\partial f_{n}}{\partial z_{n}}\right|_{\hat{z}^{k}}\left(\hat{z}_{n}^{k+1}-\hat{z}_{n}^{k}\right)+f_{n}\left(\hat{z}^{k}\right)=0\)

    (cf. equation (29.5) in the univariate case).

    We can write the linear system of equations (29.36)–(29.38) in matrix form as \[\left.\boldsymbol{f}_{\text {linear }}^{k}\left(\hat{\boldsymbol{z}}^{k+1}\right) \equiv\left[\begin{array}{cccc} \frac{\partial f_{1}}{\partial z_{1}} & \frac{\partial f_{1}}{\partial z_{2}} & \cdots & \frac{\partial f_{1}}{\partial z_{n}} \\ \frac{\partial f_{2}}{\partial z_{1}} & \frac{\partial f_{2}}{\partial z_{2}} & \cdots & \frac{\partial f_{2}}{\partial z_{n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_{n}}{\partial z_{1}} & \frac{\partial f_{n}}{\partial z_{2}} & \cdots & \frac{\partial f_{n}}{\partial z_{n}} \end{array}\right]\right|_{\hat{\boldsymbol{z}}^{k}}\left[\begin{array}{c} \left(\hat{z}_{1}^{k+1}-\hat{z}_{1}^{k}\right) \\ \left(\hat{z}_{2}^{k+1}-\hat{z}_{2}^{k}\right) \\ \vdots \\ \left(\hat{z}_{n}^{k+1}-\hat{z}_{n}^{k}\right) \end{array}\right]+\left[\begin{array}{c} f_{1}\left(\hat{\boldsymbol{z}}^{k}\right) \\ f_{2}\left(\hat{\boldsymbol{z}}^{k}\right) \\ \vdots \\ f_{n}\left(\hat{\boldsymbol{z}}^{k}\right) \end{array}\right]=\left[\begin{array}{c} 0 \\ 0 \\ \vdots \\ 0 \end{array}\right]\] or, equivalently, \[\boldsymbol{J}\left(\hat{\boldsymbol{z}}^{k}\right) \delta \hat{\boldsymbol{z}}^{k}=-\boldsymbol{f}\left(\hat{\boldsymbol{z}}^{k}\right) .\] Here the \(n \times n\) Jacobian matrix \(\boldsymbol{J}\) is defined as the matrix of all first-order partial derivatives of the function vector \(\left(\begin{array}{lll}f_{1} & \cdots & f_{n}\end{array}\right)^{\mathrm{T}}\) with respect to the state vector \(\left(\begin{array}{lll}z_{1} & \cdots & z_{n}\end{array}\right)^{\mathrm{T}}\): \[\boldsymbol{J}(\boldsymbol{z})=\left.\left[\begin{array}{cccc} \frac{\partial f_{1}}{\partial z_{1}} & \frac{\partial f_{1}}{\partial z_{2}} & \cdots & \frac{\partial f_{1}}{\partial z_{n}} \\ \frac{\partial f_{2}}{\partial z_{1}} & \frac{\partial f_{2}}{\partial z_{2}} & \cdots & \frac{\partial f_{2}}{\partial z_{n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_{n}}{\partial z_{1}} & \frac{\partial f_{n}}{\partial z_{2}} & \cdots & \frac{\partial f_{n}}{\partial z_{n}} \end{array}\right]\right|_{\boldsymbol{z}}\] such that the \(i, j^{\text {th }}\) component of the Jacobian corresponds to the partial derivative of the \(i^{\text {th }}\) function with respect to the \(j^{\text {th }}\) variable, \[J_{i j}(\boldsymbol{z})=\frac{\partial f_{i}}{\partial z_{j}}(\boldsymbol{z}) .\] Thus \(\boldsymbol{J}\left(\hat{\boldsymbol{z}}^{k}\right)\) denotes the Jacobian matrix for the system evaluated at the point \(\hat{\boldsymbol{z}}^{k}\). Note also that \(\delta \hat{\boldsymbol{z}}^{k}\) is the displacement vector pointing from the current approximation \(\hat{\boldsymbol{z}}^{k}\) to the next approximation \(\hat{\boldsymbol{z}}^{k+1}\): \[\delta \hat{\boldsymbol{z}}^{k}=\left[\begin{array}{c} \left(\hat{z}_{1}^{k+1}-\hat{z}_{1}^{k}\right) \\ \left(\hat{z}_{2}^{k+1}-\hat{z}_{2}^{k}\right) \\ \vdots \\ \left(\hat{z}_{n}^{k+1}-\hat{z}_{n}^{k}\right) \end{array}\right] .\] Hence \(\delta \hat{\boldsymbol{z}}^{k}\) is the Newton update to our current iterate.
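    The update equation \(\boldsymbol{J}(\hat{\boldsymbol{z}}^{k})\,\delta\hat{\boldsymbol{z}}^{k}=-\boldsymbol{f}(\hat{\boldsymbol{z}}^{k})\) amounts to one linear solve per iteration. As a minimal sketch (assuming NumPy; the function name `newton_step` is illustrative, not from the text), a single Newton step applied to the model problem of this section:

```python
import numpy as np

def newton_step(f, J, z):
    """One multivariate Newton update: solve J(z) dz = -f(z), return z + dz."""
    dz = np.linalg.solve(J(z), -f(z))
    return z + dz

# Model problem from this section:
f = lambda z: np.array([z[0]**2 + 2*z[1]**2 - 22,
                        2*z[0]**2 + z[1]**2 - 17])
J = lambda z: np.array([[2*z[0], 4*z[1]],
                        [4*z[0], 2*z[1]]])

z1 = newton_step(f, J, np.array([10.0, 10.0]))
# matches the hand computation in the Example below: z1 = (5.2, 5.45)
```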

    Example

    We can now apply Newton to solve the \(n=2\) model problem described in Section 29.3.1: \[\begin{aligned} &f_{1}\left(z_{1}, z_{2}\right)=z_{1}^{2}+2 z_{2}^{2}-22=0, \\ &f_{2}\left(z_{1}, z_{2}\right)=2 z_{1}^{2}+z_{2}^{2}-17=0 . \end{aligned}\] We first compute the elements of the Jacobian matrix as \[J_{11}=\frac{\partial f_{1}}{\partial z_{1}}=2 z_{1}, \quad J_{12}=\frac{\partial f_{1}}{\partial z_{2}}=4 z_{2}, \quad J_{21}=\frac{\partial f_{2}}{\partial z_{1}}=4 z_{1}, \quad J_{22}=\frac{\partial f_{2}}{\partial z_{2}}=2 z_{2},\] and write the full matrix as \[\boldsymbol{J}(\boldsymbol{z})=\left[\begin{array}{ll} J_{11} & J_{12} \\ J_{21} & J_{22} \end{array}\right]=\left[\begin{array}{cc} 2 z_{1} & 4 z_{2} \\ 4 z_{1} & 2 z_{2} \end{array}\right]\] We can now perform the iteration.

    We start with initial guess \(\hat{\boldsymbol{z}}^{0}=\left(\begin{array}{ll}10 & 10\end{array}\right)^{\mathrm{T}}\). We next linearize around \(\hat{\boldsymbol{z}}^{0}\) to get \[\boldsymbol{f}_{\text {linear }}^{0}(\boldsymbol{z}) \equiv \boldsymbol{J}\left(\hat{\boldsymbol{z}}^{0}\right)\left(\boldsymbol{z}-\hat{\boldsymbol{z}}^{0}\right)+\boldsymbol{f}\left(\hat{\boldsymbol{z}}^{0}\right)=\left[\begin{array}{cc} 20 & 40 \\ 40 & 20 \end{array}\right]\left(\boldsymbol{z}-\hat{\boldsymbol{z}}^{0}\right)+\left[\begin{array}{l} 278 \\ 283 \end{array}\right] .\] Note that we can visualize this system as two planes tangent to \(f_{1}\) and \(f_{2}\) at \(\hat{\boldsymbol{z}}^{0}\). We now solve the linearized system \[\boldsymbol{f}_{\text {linear }}^{0}\left(\hat{\boldsymbol{z}}^{1}\right)=\left[\begin{array}{cc} 20 & 40 \\ 40 & 20 \end{array}\right] \delta \hat{\boldsymbol{z}}^{0}+\left[\begin{array}{c} 278 \\ 283 \end{array}\right]=\left[\begin{array}{l} 0 \\ 0 \end{array}\right],\] or \[\left[\begin{array}{cc} 20 & 40 \\ 40 & 20 \end{array}\right] \delta \hat{\boldsymbol{z}}^{0}=\left[\begin{array}{c} -278 \\ -283 \end{array}\right],\] to find \[\delta \hat{\boldsymbol{z}}^{0}=\left[\begin{array}{c} -4.8 \\ -4.55 \end{array}\right];\] thus the next approximation for the root of \(\boldsymbol{f}(\boldsymbol{z})\) is given by \[\hat{\boldsymbol{z}}^{1}=\hat{\boldsymbol{z}}^{0}+\delta \hat{\boldsymbol{z}}^{0}=\left[\begin{array}{c} 5.2 \\ 5.45 \end{array}\right] .\] We repeat the procedure to find \[\begin{aligned} & \boldsymbol{f}_{\text {linear }}^{1}(\boldsymbol{z}) \equiv \boldsymbol{J}\left(\hat{\boldsymbol{z}}^{1}\right)\left(\boldsymbol{z}-\hat{\boldsymbol{z}}^{1}\right)+\boldsymbol{f}\left(\hat{\boldsymbol{z}}^{1}\right)=\left[\begin{array}{cc}10.4 & 21.8 \\ 20.8 & 10.9\end{array}\right]\left(\boldsymbol{z}-\hat{\boldsymbol{z}}^{1}\right)+\left[\begin{array}{c}64.445 \\ 66.7825\end{array}\right], \\ & \boldsymbol{f}_{\text {linear }}^{1}\left(\hat{\boldsymbol{z}}^{2}\right)=\mathbf{0}, \\ & \hat{\boldsymbol{z}}^{2}=\left[\begin{array}{c}2.9846 \\ 3.5507\end{array}\right] ; \\ & \boldsymbol{f}_{\text {linear }}^{2}(\boldsymbol{z}) \equiv \boldsymbol{J}\left(\hat{\boldsymbol{z}}^{2}\right)\left(\boldsymbol{z}-\hat{\boldsymbol{z}}^{2}\right)+\boldsymbol{f}\left(\hat{\boldsymbol{z}}^{2}\right)=\left[\begin{array}{cc}5.9692 & 14.2028 \\ 11.9385 & 7.1014\end{array}\right]\left(\boldsymbol{z}-\hat{\boldsymbol{z}}^{2}\right)+\left[\begin{array}{l}12.1227 \\ 13.4232\end{array}\right], \\ & \boldsymbol{f}_{\text {linear }}^{2}\left(\hat{\boldsymbol{z}}^{3}\right)=\mathbf{0}, \\ & \hat{\boldsymbol{z}}^{3}=\left[\begin{array}{l}2.1624 \\ 3.0427\end{array}\right] . \end{aligned}\]

    We see that the solution rapidly approaches the (nearest) exact solution \(\boldsymbol{Z}=\left(\begin{array}{ll}2 & 3\end{array}\right)^{\mathrm{T}}\).

    Algorithm

    The multivariate Newton algorithm is identical to the univariate algorithm, except that each pass through the while loop must now solve a linearized system of equations involving the Jacobian matrix.

    We can see that the computational cost for the multivariate Newton iteration is essentially the total number of iterations multiplied by the cost to solve an \(n \times n\) linear system, which, if dense, could be as much as \(\mathcal{O}\left(n^{3}\right)\) operations and, if sparse, as little as \(\mathcal{O}(n)\) operations. Additionally, for "pure" Newton, the Jacobian needs to be recomputed (\(\mathcal{O}\left(n^{2}\right)\) operations) at each iteration. (In "impure" Newton the Jacobian is sometimes held fixed for several iterations or updated selectively.)
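    The while loop just described can be sketched as follows. This is an illustrative implementation (NumPy assumed; the name `newton_solve` and the default tolerances are hypothetical choices, not the book's listing), applied to the model problem of this section:

```python
import numpy as np

def newton_solve(f, J, z0, tol=1e-10, max_it=50):
    """Multivariate Newton: repeat z <- z + solve(J(z), -f(z)) until ||f(z)|| <= tol."""
    z = np.asarray(z0, dtype=float)
    for _ in range(max_it):
        if np.linalg.norm(f(z)) <= tol:
            break
        # One linearize-and-solve pass: J(z) dz = -f(z), the dominant cost per iteration.
        z = z + np.linalg.solve(J(z), -f(z))
    return z

f = lambda z: np.array([z[0]**2 + 2*z[1]**2 - 22,
                        2*z[0]**2 + z[1]**2 - 17])
J = lambda z: np.array([[2*z[0], 4*z[1]],
                        [4*z[0], 2*z[1]]])

Z = newton_solve(f, J, [10.0, 10.0])
# converges rapidly to the nearest root (2, 3)
```

Starting from \((10, 10)^{\mathrm{T}}\), the first iterates match the hand computation of the Example above: \((5.2, 5.45)\), \((2.9846, 3.5507)\), \((2.1624, 3.0427)\), and so on.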

    Note that, as before in the univariate case, we can substitute a finite difference approximation for the Jacobian if we cannot (or choose not to) use the analytical expressions. Here we first introduce a small scalar \(\Delta z\); note that \(\Delta z\) is not related to \(\delta \hat{\boldsymbol{z}}^{k}\), i.e., this is not a secant approach. Then we approximate, for \(1 \leq i \leq n, 1 \leq j \leq n\), \[J_{i j}(\boldsymbol{z}) \equiv \frac{\partial f_{i}}{\partial z_{j}}(\boldsymbol{z}) \approx \frac{f_{i}\left(\boldsymbol{z}+\Delta z \, \boldsymbol{e}^{j}\right)-f_{i}(\boldsymbol{z})}{\Delta z} \equiv \widetilde{J}_{i j}(\boldsymbol{z}),\] where \(\boldsymbol{e}^{j}\) is the unit vector in the \(j\)-direction, such that \(\boldsymbol{z}+\Delta z \, \boldsymbol{e}^{j}\) differs from \(\boldsymbol{z}\) in only the \(j^{\text {th }}\) component: a partial difference approximation to the partial derivative \(\frac{\partial f_{i}}{\partial z_{j}}\). Note that to compute the full \(n \times n\) finite difference approximation matrix \(\widetilde{\boldsymbol{J}}(\boldsymbol{z})\) we need \(\mathcal{O}\left(n^{2}\right)\) function evaluations.
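    The partial difference formula above translates directly into code. A minimal sketch (NumPy assumed; the name `fd_jacobian` and the step size are illustrative choices), checked against the analytical Jacobian of the model problem:

```python
import numpy as np

def fd_jacobian(f, z, dz=1e-7):
    """Forward-difference approximation J~_ij = (f_i(z + dz e_j) - f_i(z)) / dz.

    Uses n + 1 evaluations of the n-vector f, i.e. O(n^2) scalar function
    evaluations, as noted in the text."""
    z = np.asarray(z, dtype=float)
    f0 = f(z)
    n = z.size
    Jt = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0  # unit vector in the j-direction
        Jt[:, j] = (f(z + dz * e) - f0) / dz
    return Jt

f = lambda z: np.array([z[0]**2 + 2*z[1]**2 - 22,
                        2*z[0]**2 + z[1]**2 - 17])
Jt = fd_jacobian(f, [10.0, 10.0])
# close to the analytical Jacobian [[20, 40], [40, 20]] at (10, 10)
```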

    Comments on Multivariate Newton

    For multivariate Newton, both the convergence rate and the pathologies are similar to the univariate case, although the pathologies are somewhat more likely in the multivariate case given the greater number of degrees of freedom.

    The main difference for the multivariate case, then, is the relative cost of each Newton iteration (worst case \(\mathcal{O}\left(n^{3}\right)\) operations) owing to the size of the linear system which must be solved. For this reason, Newton’s rapid convergence becomes more crucial with growing dimensionality. Thanks to the rapid convergence, Newton often outperforms "simpler" approaches which are less computationally expensive per iteration but require many more iterations to converge to a solution.


    This page titled 29.3: Multivariate Newton is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Masayuki Yano, James Douglass Penn, George Konidaris, & Anthony T Patera (MIT OpenCourseWare) via source content that was edited to the style and standards of the LibreTexts platform.