Stopping rules

Prince, E.; Boggs, P. T.

doi:10.1107/97809553602060000609

International
Tables for
Crystallography
Volume C
Mathematical, physical and chemical tables
Edited by E. Prince

pdf | chapter contents | chapter index | related articles

International Tables for Crystallography (2006). Vol. C. ch. 8.1, pp. 684-685

Section 8.1.4.4. Stopping rules

E. Prince^a and P. T. Boggs^b

^a NIST Center for Neutron Research, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA, and ^bScientific Computing Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA

8.1.4.4. Stopping rules

| top | pdf |

An iterative algorithm must contain some criterion for termination of the iteration process. This is a delicate part of all nonlinear optimization codes, and depends strongly on the scaling of the parameters. Although exceptions exist to almost all reasonable scaling rules, a basic principle is that a unit change in any variable should have approximately the same effect on the sum of squares. Thus, as discussed in Subsection 8.1.3.3 (equation 8.1.3.9), the ideal unit for each parameter is the standard uncertainty of its estimate, which can usually be adequately approximated by the reciprocal of the column norm of J. In modern codes, the user has the option of supplying a diagonal scaling matrix whose elements are the reciprocals of some estimate of a typical `significant' shift in the corresponding parameter.

In principle, the following conditions should hold when the convergence of a well scaled, least-squares procedure has reached its useful limit:

(1) $[S({\bf x}_{c})=S(\widehat {{\bf x}})]$ ;
(2) $[{\bi J}({\bf x}_{c})^{T}{\bi W}[{\bf y}-{\bi M}({\bf x}_{c})]]$ = O;
(3) $[{\bf x}_{c}=\widehat {{\bf x}}]$ .

The actual stopping rules must be chosen relative to the algorithm used (other conditions also exist) and the particular application. It is clear, however, that, since $[\widehat {{\bf x}}]$ is not known, the last two conditions cannot be used as stated. Also, these tests are dependent on the scaling of the problem, and the variables are not related to the sizes of the quantities involved. We present tests, therefore, that are relative error tests that take into account the scaling of the variables.

The general, relative error test is stated as follows. Two scalar quantities, a and b, are said to satisfy a relative error test with respect to a tolerance T if $[{|a-b| \over |a|} \leq T. \eqno (8.1.4.15)]$ Roughly speaking, if T is of the form $[10^{-q}]$ then a and b agree to q digits. Obviously, there is a problem with this test if [a=0] and there will be numerical difficulties if a is close to 0. Thus, in practice, (8.1.4.15) is replaced by $[ |a-b| \leq (|a|+1)T, \eqno (8.1.4.16)]$ which reduces to an absolute error test as $[a \rightarrow 0]$ . A careful examination may be required to set this tolerance correctly, but, typically, if one of the fast, stable algorithms is used, only a few more iterations are necessary to get six or eight digits if one or two are already known. Note also that the actual value depends on $[\varepsilon ]$ , the relative machine precision. It is fruitless to seek more digits of accuracy than are expressed in the machine representation.

A test based on condition (1) is often implemented by using the linear approximation to M or the quadratic approximation to S. Thus, using the quadratic approximation to S, we can compute the predicted reduction by $[\Delta_{{\rm pred}}=S({\bf x}_c)-{\bf d}^T{\bi J}^T{\bi W}[{\bf y}-{\bi M}({\bf x}_c)]. \eqno (8.1.4.17)]$ Similarly, the actual reduction is $[\Delta_{{\rm act}}=S({\bf x}_c)-S({\bf x}_{+}). \eqno (8.1.4.18)]$ The test then becomes $[\Delta_{{\rm pred}}\leq [1+S({\bf x}_c)]T]$ , $[\Delta_{{\rm act}}\leq]$ $[[1+S({\bf x}_c)]T]$ , and $[\Delta_{{\rm act}}\leq 2\Delta_{{\rm pred}}]$ . That is, we want both the predicted and actual reductions to be small and the actual reduction to agree reasonably well with the predicted reduction. A typical value for T should be 10⁻⁴, although the value again depends on $[\varepsilon]$ , and the user is cautioned not to make this tolerance too loose.

For a test on condition (2), we compute the cosine of the angle between the vector of residuals and the linear subspace spanned by the columns of J, $[\cos \zeta =\{{\bf d}^T{\bi J}^T{\bi W}[{\bf y}-{\bi M}({\bf x}_{+})]\}/[({\bf d}^T{\bi J}^T{\bi WJ}{\bf d})S({\bf x}_{+})]^{1/2}. \eqno (8.1.4.19)]$ The test is $[\cos \zeta \leq T]$ , where, again, T should be 10⁻⁴ or smaller.

Test 3 above is usually only present to prevent the process from continuing when almost nothing is happening. Clearly, we do not know $[\widehat {{\bf x}}]$ , thus the test is typically that corresponding elements of $[{\bf x}_{+}]$ and $[{\bf x}_c]$ satisfy (8.1.4.16), where T is chosen to be 10^−q. A recommended value of q is half the number of digits carried in the computation, e.g. q = 8 for standard 64-bit (double-precision or 16 digit) calculations. Sometimes, the relative error test is of the form $[|({\bf x}_{+})_j-({\bf x}_c)_j|/\sigma _j \leq T]$ , where σ_j is the standard uncertainty computed from the inverse Hessian in the last iteration. Although this test has some statistical validity, it is quite expensive and usually not worth the work involved to compute.

References

International Tables for Crystallography (2006). Vol. C. ch. 8.1, pp. 684-685