International Tables for Crystallography
Volume C: Mathematical, physical and chemical tables
Edited by E. Prince

International Tables for Crystallography (2006). Vol. C. ch. 8.1, p. 683

Section 8.1.4.1. The Gauss–Newton algorithm

E. Prince(a) and P. T. Boggs(b)

(a) NIST Center for Neutron Research, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA, and (b) Scientific Computing Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA

8.1.4.1. The Gauss–Newton algorithm


Let ${\bf x}_c$ be the current approximation to $\widehat{{\bf x}}$, the solution to (8.1.4.1). We construct a linear approximation to $M_i({\bf x})$ in the vicinity of ${\bf x}={\bf x}_c$ by expanding it in a Taylor series through the linear terms, obtaining \[M_i({\bf x})=M_i({\bf x}_c)+\sum_{j=1}^p J_{ij}({\bf x}-{\bf x}_c)_j, \eqno(8.1.4.2)\] where ${\bi J}$ is the Jacobian matrix, defined by $J_{ij}=\partial M_i({\bf x})/\partial x_j$. A straightforward procedure, known as the Gauss–Newton algorithm, may be formally stated as follows (a code sketch is given after the list):

  • (1) compute ${\bf d}$ as the solution of the linear system \[{\bi J}^{T}{\bi W}{\bi J}\,{\bf d}={\bi J}^{T}{\bi W}[{\bf y}-{\bi M}({\bf x}_{c})];\]

  • (2) set ${\bf x}_c={\bf x}_c+{\bf d}$;

  • (3) if not converged (see Subsection 8.1.4.4), go to (1), else stop.
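Step (1) is the normal-equations solution of the linearized problem: substituting (8.1.4.2) into the weighted sum of squares $S({\bf x})=[{\bf y}-{\bi M}({\bf x})]^T{\bi W}[{\bf y}-{\bi M}({\bf x})]$ and minimizing over the shift ${\bf d}$ reproduces the linear system above. The following minimal sketch, in Python with NumPy, illustrates the iteration; the callables M and J, the weight matrix W and the data vector y are assumptions supplied by the caller, and the convergence test is a simple placeholder for the tests discussed in Subsection 8.1.4.4.

```python
import numpy as np

def gauss_newton(M, J, y, W, x0, tol=1e-8, max_iter=50):
    """Minimize S(x) = [y - M(x)]^T W [y - M(x)] by Gauss-Newton.

    M(x) returns a length-n model vector, J(x) an n-by-p Jacobian
    with J_ij = dM_i/dx_j, W is an n-by-n weight matrix, and x0 a
    length-p starting vector (all hypothetical user-supplied inputs).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = y - M(x)                       # residuals y - M(x_c)
        Jx = J(x)                          # Jacobian evaluated at x_c
        # Step (1): solve J^T W J d = J^T W [y - M(x_c)] for the step d.
        d = np.linalg.solve(Jx.T @ W @ Jx, Jx.T @ W @ r)
        # Step (2): update the current approximation.
        x = x + d
        # Step (3): stop when the step is negligible (a stand-in for
        # the convergence tests of Subsection 8.1.4.4).
        if np.linalg.norm(d) <= tol * (1.0 + np.linalg.norm(x)):
            break
    return x
```

For a problem with $n$ observations and $p$ parameters, the sketch expects M(x) to return a length-$n$ array and J(x) an $n\times p$ array; forming and solving the $p\times p$ normal equations directly, as here, is adequate for illustration but can be numerically inferior to an orthogonal factorization when ${\bi J}$ is ill conditioned (see Subsection 8.1.3.3).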

The convergence rate of the Gauss–Newton algorithm depends on the size of the residual, that is, on $S(\widehat{{\bf x}})$. If $S(\widehat{{\bf x}})=0$, the convergence rate is quadratic; if it is small, the rate is linear; but if $S(\widehat{{\bf x}})$ is large, the procedure is not locally convergent at all.

Fortunately, the procedure can be altered so that it is always locally convergent, and even globally convergent, that is, convergent to a relative minimum from any starting point. There are two possibilities. First, the procedure can be modified to include a line search: the direction ${\bf d}$ is computed as above, and $\alpha$ is then chosen such that $S({\bf x}_c+\alpha{\bf d})$ is sufficiently smaller than $S({\bf x}_c)$. In order to guarantee convergence, one uses the test \[S({\bf x}_c+\alpha{\bf d}) < S({\bf x}_c)-\alpha\gamma\,{\bf d}^T{\bi J}({\bf x}_c)^T[{\bf y}-{\bi M}({\bf x}_c)], \eqno(8.1.4.3)\] where, as actually implemented in modern codes, $\gamma$ has values of the order of $10^{-4}$. [In theory, a slightly stronger condition is necessary, but this is usually avoided by other means; see Dennis & Schnabel (1983) for details.] A sketch of such a damped step appears after this paragraph. While the line search improves the situation dramatically, the modified procedure still suffers from the deficiency that it is very slow on some problems, and it is undefined if the rank of the Jacobian, ${\bi J}(\widehat{{\bf x}})$, is less than $p$. ${\bi J}({\bf x})$ usually has rank $p$ near the solution, but it may be rank deficient far from it; it may also be 'close' to rank deficient, which causes numerical difficulties (see Subsection 8.1.3.3).
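The damped step can be sketched as a backtracking search: $\alpha$ is halved, starting from $\alpha=1$, until the sufficient-decrease test is satisfied. The illustration below reuses the hypothetical M, J, W and y of the previous sketch. Note one assumption: the predicted-decrease term is computed here as ${\bf d}^T{\bi J}^T{\bi W}[{\bf y}-{\bi M}({\bf x}_c)]$, with the weight matrix included so that it matches the right-hand side of the normal equations in step (1); for ${\bi W}={\bi I}$ it reduces to the quantity printed in (8.1.4.3).

```python
import numpy as np

def S(x, M, y, W):
    """Weighted sum of squares S(x) = [y - M(x)]^T W [y - M(x)]."""
    r = y - M(x)
    return r @ W @ r

def damped_step(M, J, y, W, xc, gamma=1e-4):
    """One Gauss-Newton step with a backtracking line search."""
    r = y - M(xc)
    Jx = J(xc)
    d = np.linalg.solve(Jx.T @ W @ Jx, Jx.T @ W @ r)
    # Predicted decrease d^T J^T W [y - M(x_c)]; with W = I this is the
    # term in test (8.1.4.3).  Because d solves the normal equations,
    # it equals d^T (J^T W J) d and is positive whenever J has rank p,
    # so d is a descent direction.
    slope = d @ Jx.T @ W @ r
    alpha = 1.0
    Sc = S(xc, M, y, W)
    # Halve alpha until S(x_c + alpha d) is sufficiently below S(x_c),
    # using a gamma of the order of 1e-4 as in modern codes.
    while S(xc + alpha * d, M, y, W) >= Sc - alpha * gamma * slope:
        alpha *= 0.5
        if alpha < 1e-12:    # safeguard: give up rather than loop forever
            break
    return xc + alpha * d
```

The small value of $\gamma$ makes the test easy to pass near the solution, so a full Gauss–Newton step ($\alpha=1$) is usually accepted there and the fast local convergence of the undamped method is retained.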

References

Dennis, J. E. & Schnabel, R. B. (1983). Numerical methods for unconstrained optimization and nonlinear equations. Englewood Cliffs, NJ: Prentice Hall.
