International Tables for Crystallography
Volume C: Mathematical, physical and chemical tables
Edited by E. Prince

International Tables for Crystallography (2006). Vol. C. ch. 8.1, p. 683

Section 8.1.4.1. The Gauss–Newton algorithm

E. Prince(a) and P. T. Boggs(b)

(a) NIST Center for Neutron Research, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA, and (b) Scientific Computing Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA

8.1.4.1. The Gauss–Newton algorithm


Let ${\bf x}_c$ be the current approximation to $\widehat{{\bf x}}$, the solution to (8.1.4.1). We construct a linear approximation to $M_i({\bf x})$ in the vicinity of ${\bf x}={\bf x}_c$ by expanding it in a Taylor series through the linear terms, obtaining \[M_i({\bf x})=M_i({\bf x}_c)+\sum_{j=1}^p J_{ij}({\bf x}-{\bf x}_c)_j, \eqno(8.1.4.2)\] where ${\bi J}$ is the Jacobian matrix, defined by $J_{ij}=\partial M_i({\bf x})/\partial x_j$. A straightforward procedure, known as the Gauss–Newton algorithm, may be formally stated as follows (a code sketch is given after the list):

  • (1) compute ${\bf d}$ as the solution of the linear system \[{\bi J}^{T}{\bi W}{\bi J}\,{\bf d}={\bi J}^{T}{\bi W}[{\bf y}-{\bi M}({\bf x}_{c})];\]

  • (2) set ${\bf x}_c={\bf x}_c+{\bf d}$;

  • (3) if not converged (see Subsection 8.1.4.4), go to (1), else stop.
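Step (1) is the normal-equations solution of the linearized problem: substituting (8.1.4.2) into the weighted sum of squares $S({\bf x})=[{\bf y}-{\bi M}({\bf x})]^T{\bi W}[{\bf y}-{\bi M}({\bf x})]$ and minimizing over the shift ${\bf d}$ reproduces the linear system above. The following minimal sketch, in Python with NumPy, illustrates the iteration; the callables M and J, the weight matrix W and the data vector y are assumptions supplied by the caller, and the convergence test is a simple placeholder for the tests discussed in Subsection 8.1.4.4.

```python
import numpy as np

def gauss_newton(M, J, y, W, x0, tol=1e-8, max_iter=50):
    """Minimize S(x) = [y - M(x)]^T W [y - M(x)] by Gauss-Newton.

    M(x) returns a length-n model vector, J(x) an n-by-p Jacobian
    with J_ij = dM_i/dx_j, W is an n-by-n weight matrix, and x0 a
    length-p starting vector (all hypothetical user-supplied inputs).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = y - M(x)                       # residuals y - M(x_c)
        Jx = J(x)                          # Jacobian evaluated at x_c
        # Step (1): solve J^T W J d = J^T W [y - M(x_c)] for the step d.
        d = np.linalg.solve(Jx.T @ W @ Jx, Jx.T @ W @ r)
        # Step (2): update the current approximation.
        x = x + d
        # Step (3): stop when the step is negligible (a stand-in for
        # the convergence tests of Subsection 8.1.4.4).
        if np.linalg.norm(d) <= tol * (1.0 + np.linalg.norm(x)):
            break
    return x
```

For a problem with $n$ observations and $p$ parameters, the sketch expects M(x) to return a length-$n$ array and J(x) an $n\times p$ array; forming and solving the $p\times p$ normal equations directly, as here, is adequate for illustration but can be numerically inferior to an orthogonal factorization when ${\bi J}$ is ill conditioned (see Subsection 8.1.3.3).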

The convergence rate of the Gauss–Newton algorithm depends on the size of the residual, that is, on $S(\widehat{{\bf x}})$. If $S(\widehat{{\bf x}})=0$, the convergence rate is quadratic; if it is small, the rate is linear; but if $S(\widehat{{\bf x}})$ is large, the procedure is not locally convergent at all.

Fortunately, the procedure can be altered so that it is always locally convergent, and even globally convergent, that is, convergent to a relative minimum from any starting point. There are two possibilities. First, the procedure can be modified to include a line search: the direction ${\bf d}$ is computed as above, and $\alpha$ is then chosen such that $S({\bf x}_c+\alpha{\bf d})$ is sufficiently smaller than $S({\bf x}_c)$. In order to guarantee convergence, one uses the test \[S({\bf x}_c+\alpha{\bf d}) < S({\bf x}_c)-\alpha\gamma\,{\bf d}^T{\bi J}({\bf x}_c)^T[{\bf y}-{\bi M}({\bf x}_c)], \eqno(8.1.4.3)\] where, as actually implemented in modern codes, $\gamma$ has values of the order of $10^{-4}$. [In theory, a slightly stronger condition is necessary, but this is usually avoided by other means; see Dennis & Schnabel (1983) for details.] A sketch of such a damped step appears after this paragraph. While the line search improves the situation dramatically, the modified procedure still suffers from the deficiency that it is very slow on some problems, and it is undefined if the rank of the Jacobian, ${\bi J}(\widehat{{\bf x}})$, is less than $p$. ${\bi J}({\bf x})$ usually has rank $p$ near the solution, but it may be rank deficient far from it; it may also be 'close' to rank deficient, which causes numerical difficulties (see Subsection 8.1.3.3).
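The damped step can be sketched as a backtracking search: $\alpha$ is halved, starting from $\alpha=1$, until the sufficient-decrease test is satisfied. The illustration below reuses the hypothetical M, J, W and y of the previous sketch. Note one assumption: the predicted-decrease term is computed here as ${\bf d}^T{\bi J}^T{\bi W}[{\bf y}-{\bi M}({\bf x}_c)]$, with the weight matrix included so that it matches the right-hand side of the normal equations in step (1); for ${\bi W}={\bi I}$ it reduces to the quantity printed in (8.1.4.3).

```python
import numpy as np

def S(x, M, y, W):
    """Weighted sum of squares S(x) = [y - M(x)]^T W [y - M(x)]."""
    r = y - M(x)
    return r @ W @ r

def damped_step(M, J, y, W, xc, gamma=1e-4):
    """One Gauss-Newton step with a backtracking line search."""
    r = y - M(xc)
    Jx = J(xc)
    d = np.linalg.solve(Jx.T @ W @ Jx, Jx.T @ W @ r)
    # Predicted decrease d^T J^T W [y - M(x_c)]; with W = I this is the
    # term in test (8.1.4.3).  Because d solves the normal equations,
    # it equals d^T (J^T W J) d and is positive whenever J has rank p,
    # so d is a descent direction.
    slope = d @ Jx.T @ W @ r
    alpha = 1.0
    Sc = S(xc, M, y, W)
    # Halve alpha until S(x_c + alpha d) is sufficiently below S(x_c),
    # using a gamma of the order of 1e-4 as in modern codes.
    while S(xc + alpha * d, M, y, W) >= Sc - alpha * gamma * slope:
        alpha *= 0.5
        if alpha < 1e-12:    # safeguard: give up rather than loop forever
            break
    return xc + alpha * d
```

The small value of $\gamma$ makes the test easy to pass near the solution, so a full Gauss–Newton step ($\alpha=1$) is usually accepted there and the fast local convergence of the undamped method is retained.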

References

Dennis, J. E. & Schnabel, R. B. (1983). Numerical methods for unconstrained optimization and nonlinear equations. Englewood Cliffs, NJ: Prentice Hall.
