Least-squares solution to the system of nonlinear constraint equations

Zhang, K. Y. J.; Cowtan, K. D.; Main, P.

doi:10.1107/97809553602060000687

International
Tables for
Crystallography
Volume F
Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

pdf | chapter contents | chapter index | related articles

International Tables for Crystallography (2006). Vol. F. ch. 15.1, pp. 321-323 | 1 | 2 |

Section 15.1.5.2. Least-squares solution to the system of nonlinear constraint equations

K. Y. J. Zhang,^a K. D. Cowtan^b ^* and P. Main^c

^a Division of Basic Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N., Seattle, WA 90109, USA,^bDepartment of Chemistry, University of York, York YO1 5DD, England, and ^cDepartment of Physics, University of York, York YO1 5DD, England
Correspondence e-mail: cowtan+email@ysbl.york.ac.uk

15.1.5.2. Least-squares solution to the system of nonlinear constraint equations

| top | pdf |

For a system of nonlinear equations of electron density, $[{\bf F}\left(\rho({\bf x})\right) = {\bf 0}, \eqno(15.1.5.3)]$ where $[\eqalign{{\bf F}\left({\rho({\bf x})}\right) &= \left[F_1 \left(\rho({\bf x})\right) \ F_2 \left(\rho({\bf x})\right) \ \ldots \ F_m \left(\rho({\bf x})\right) \right]^T,\cr \rho({\bf x}) &= \left[\rho_1 \ \rho_2 \ \ldots \ \rho_n\right]^T,}]$ 0 is a null vector, n is the number of grid points and m is the number of equations, the Newton–Raphson method of solution is to find a set of shifts, $[\delta\rho\left({\bf x}\right)]$ to $[\rho\left({\bf x}\right)]$ , through a system of linear equations, $[{\bf J}\delta\rho\left({\bf x}\right) = - \varepsilon, \eqno(15.1.5.4)]$ where J is a matrix of partial derivatives of F with respect to $[\rho\left({\bf x}\right)]$ and is called the Jacobian matrix, $[{\bf J} = \left[ {\openup 6pt\matrix{{\displaystyle{{\partial F_1 } \over {\partial \rho_1 }}} & {\displaystyle{{\partial F_1 } \over {\partial \rho_2 }}} & \cdots & {\displaystyle{{\partial F_1 } \over {\partial \rho_n }}}\cr {\displaystyle{{\partial F_2 } \over {\partial \rho_1 }}} & {\displaystyle{{\partial F_2 } \over {\partial \rho_2 }}} & \cdots & {\displaystyle{{\partial F_2 } \over {\partial \rho_n }}}\cr\noalign{\vskip-15pt}\cr \vdots & \vdots & \ddots & \vdots\cr {\displaystyle{{\partial F_m } \over {\partial \rho_1 }}} & {\displaystyle{{\partial F_m } \over {\partial \rho_2 }}} & \cdots & {\displaystyle{{\partial F_m } \over {\partial \rho_n }}}\cr}} \right], \eqno(15.1.5.5)]$ ɛ is a vector of residuals to equation (15.1.5.3) for a trial solution, $[\rho\left({\bf x}\right)]$ , and $[\delta\rho\left({\bf x}\right)]$ is a vector of shifts to the density. Hence, the solution for $[\rho\left({\bf x}\right)]$ is achieved in an iterative manner, $[\rho^{i + 1} \left({\bf x}\right) = \rho^i \left({\bf x}\right) + \delta\rho\left({\bf x}\right). \eqno(15.1.5.6)]$ Therefore, the problem of solving a system of nonlinear equations (15.1.5.3) is transformed into solving a system of linear equations (15.1.5.4), which forms one cycle of Newton–Raphson iteration.

If there are more equations than unknowns $[(m \gt n)]$ , the unknowns are obtained through a least-squares solution to equations (15.1.5.4), $[{\bf J}^T {\bf J}\delta\rho\left({\bf x}\right) = - {\bf J}^T \varepsilon. \eqno(15.1.5.7)]$ Theoretically, the above system of equations could be solved by matrix multiplication and inversion, i.e. $[\delta\rho\left({\bf x}\right) = - \left({{\bf J}^T {\bf J}}\right)^{ - 1} {\bf J}^T \varepsilon. \eqno(15.1.5.8)]$ However, the amount of calculation involved in setting up the normal matrix of least squares is huge for the problem presented by protein structures. This can be completely avoided by using the conjugate-gradient technique for solving the system of linear equations.

15.1.5.2.1. The conjugate-gradient method

| top | pdf |

The conjugate-gradient method does not require the inversion of the normal matrix, and therefore the solution to a large system of linear equations can be achieved very quickly.

Starting from a trial solution to equations (15.1.5.4), such as a null vector, $[\delta \rho_0 \left({\bf x}\right) = {\bf 0}, \eqno(15.1.5.9)]$ the initial residual is $[{\bf r}_0 = - {\bf J}^T \left(\varepsilon - {\bf J}\delta \rho_0 \left({\bf x}\right)\right) \eqno(15.1.5.10)]$ and the initial search step is $[{\bf p}_0 = {\bf r}_0. \eqno(15.1.5.11)]$

The iterative process is as follows. The new shift to the density is $[\delta \rho_{k + 1} \left({\bf x}\right) = \delta \rho_k \left({\bf x}\right) + \alpha _k {\bf p}_k, \eqno(15.1.5.12)]$ where $[\alpha _k = {\bf r}_k^T {\bf p}_k / {\bf q}_k^T {\bf q}_k \eqno(15.1.5.13)]$ and $[{\bf q}_k = {\bf Jp}_k. \eqno(15.1.5.14)]$ The new residual is $[{\bf r}_{k + 1} = {\bf r}_k - \alpha _k {\bf s}_k, \eqno(15.1.5.15)]$ where $[{\bf s}_k = {\bf J}^T {\bf q}_k. \eqno(15.1.5.16)]$ The next search step which conjugates with the residual is $[{\bf p}_{k + 1} = {\bf r}_{k + 1} + \beta _k {\bf p}_k, \eqno(15.1.5.17)]$ where $[\beta _k = - {\bf r}_{k + 1}^T {\bf s}_k / {\bf q}_k^T {\bf q}_k. \eqno(15.1.5.18)]$

The process is iterated by increasing k until convergence is reached, when $[\left| {{\bf r}_{k + 1} - {\bf r}_k } \right| \Rightarrow 0.]$

The number of iterations required for an exact solution is equal to the number of unknowns, because the search vector at each step is orthogonal with all the previous steps. However, a very satisfactory solution can normally be reached after very few iterations. This makes the conjugate-gradient method a very efficient and fast procedure for solving a system of equations. Note that the normal matrix never appears explicitly, although it is implicit in (15.1.5.10) and (15.1.5.16). The inversion of the normal matrix and matrix multiplication is completely avoided. Most of the calculation comes from the formation of the matrix-vector products in (15.1.5.10), (15.1.5.14), and (15.1.5.16). These can be expressed as convolutions and can be performed using FFTs, thus saving considerably more time.

The solution to $[\delta\rho\left({\bf x}\right)]$ at the end of conjugate-gradient iteration is substituted into equation (15.1.5.6) to get a new solution for $[\rho\left({\bf x}\right)]$ . The solution to the system of nonlinear equations (15.1.5.3) is obtained when the Newton–Raphson iteration has reached convergence.

15.1.5.2.2. The full-matrix solution

| top | pdf |

The equations to be solved for the electron-density shifts, $[\delta\rho\left({\bf x}\right)]$ , are from the Jacobian of equation (15.1.5.2), $[\cases{(2V/N){\textstyle\sum\limits_{\bf y}} \rho\left({\bf y}\right)\psi \left({\bf x} - {\bf y}\right) - \delta\rho\left({\bf x}\right) = \Delta\rho\left({\bf x}\right)\cr \delta\rho\left({\bf x}\right) = \Delta H\left({\bf x}\right)\hfill\cr}, \eqno(15.1.5.19)]$ where $[\Delta\rho\left({\bf x}\right)]$ is the residual to Sayre's equation, $[\Delta\rho({\bf x}) = \rho({\bf x}) - (V/N){\textstyle\sum\limits_{\bf y}} \rho^{2} ({\bf y})\psi ({\bf x} - {\bf y}), \eqno(15.1.5.20)]$ and $[\Delta H\left({\bf x}\right)]$ is the residual to the linear density-modification equations, $[\Delta H \left({\bf x}\right) = H ({\bf x}) - \rho({\bf x}). \eqno(15.1.5.21)]$ Starting from a trial solution of $[\delta \rho_{0} \left({\bf x}\right) = {\bf 0}]$ , the initial residual vector is $[\eqalignno{{\bf r}_{0} \left({\bf x}\right) &= (2/V)\rho\left({\bf x}\right){\textstyle\sum\limits_{\bf h}} {\theta \left(\overline{\bf h}\right)} \Delta F\left({\bf h}\right)\exp \left(- 2\pi i{\bf hx}\right) &\cr &\quad- \Delta\rho\left({\bf x}\right) + \Delta H\left({\bf x}\right),&(15.1.5.22)}]$ where $[\eqalignno{\Delta F\left({\bf h}\right) &= F\left({\bf h}\right) - \theta \left({\bf h}\right)G\left({\bf h}\right), &(15.1.5.23)\cr G\left({\bf h}\right) &= (V/N){\textstyle\sum\limits_{\bf y}} \rho^{2} \left({\bf y}\right) \exp \left(2\pi i{\bf hy}\right) &(15.1.5.24)}%(15.1.5.24)]$ and $[\Delta\rho\left({\bf x}\right) = (1/V){\textstyle\sum\limits_{\bf h}} \Delta F\left({\bf h}\right) \exp \left(-2\pi i{\bf hx}\right). \eqno(15.1.5.25)]$ Thus, only three FFTs are required to calculate the initial residual. The residual of Sayre's equation is given in equation (15.1.5.23).

The calculation of $[{\bf q}_{k}]$ in equation (15.1.5.14) is achieved in a similar manner using FFTs, $[\eqalignno{{\bf q}_{k} = {\bf Jp}_{k} &= \left\{{{(1/V){\textstyle\sum\nolimits_{\bf h}} \left[2a\left({\bf h}\right)\theta \left({\bf h}\right) - b\left({\bf h}\right)\right]\exp \left(-2\pi i{\bf hx}\right)} \over {p_{k} \left({\bf x}\right)}}\right\}\cr &= \left[{{Q_{k} \left({\bf x}\right)} \over {p_{k} \left({\bf x}\right)}}\right], &(15.1.5.26)}]$ where the vector is partitioned as shown above, and $[\eqalignno{a\left({\bf h}\right) &= (V/N){\textstyle\sum\limits_{\bf y}} \rho\left({\bf y}\right) p_{k} \left({\bf y}\right)\exp \left(2\pi i{\bf hy}\right), &(15.1.5.27)\cr b\left({\bf h}\right) &= (V/N){\textstyle\sum\limits_{\bf y}}\; p_{k} \left({\bf y}\right)\exp \left(2\pi i{\bf hy}\right). &(15.1.5.28)}%(15.1.5.28)]$

Similarly, vector $[{\bf s}_{k}]$ in equation (15.1.5.16) is obtained from $[\displaylines{{\bf s}_{k} = {\bf J}^{T} {\bf q}_{k} = (2/V)\rho\left({\bf x}\right){\textstyle\sum\limits_{\bf h}} \theta \left(\overline{\bf h}\right) \left[ 2a\left({\bf h}\right)\theta \left({\bf h}\right) - b\left({\bf h}\right)\right]\exp \left(-2\pi i{\bf hx}\right)\hfill\cr \qquad- Q_{k} \left({\bf x}\right) + p_{k} \left({\bf x}\right),\hfill (15.1.5.29)}]$ where $[Q_{k}\left({\bf x}\right)]$ is defined in equation (15.1.5.26).

The remaining calculations in equations (15.1.5.12), (15.1.5.13), (15.1.5.15), (15.1.5.17) and (15.1.5.18) require either the inner product of a pair of vectors or a linear combination of vectors, both of which are very quick to calculate. Each iteration of the conjugate gradient requires four FFTs, as described in equations (15.1.5.26 –15.1.5.29).

15.1.5.2.3. The diagonal approximation

| top | pdf |

The full-matrix solution to equation (15.1.5.4) requires a significant amount of computing, although it can be achieved using FFTs. The diagonal approximation to the normal matrix has been used as an alternative method of solution to the electron-density shift in equation (15.1.5.4) (Main, 1990b). As with the full-matrix calculation, it can be done entirely by FFTs and a linear combination of vectors.

The diagonal element of the normal matrix, $[{\bf J}^{T}{\bf J}]$ , in equation (15.1.5.7) is $[d_{0} \left({\bf x}\right) = (4/N)\rho\left({\bf x}\right)\left[\rho\left({\bf x}\right){\textstyle\sum\limits_{\bf h}} \left| \theta \left({\bf h}\right) \right|^{2} - {\textstyle\sum\limits_{\bf h}} \theta \left({\bf h}\right)\right] + 2. \eqno(15.1.5.30)]$ The right-hand side of equation (15.1.5.7), $[-{\bf J}^{T} \varepsilon \left({\bf x}\right)]$ , is identical to the residual vector, $[r_{0}\left({\bf x}\right)]$ , which can be calculated from equation (15.1.5.22). Therefore, the solution to the electron-density shift, $[\delta\rho\left({\bf x}\right)]$ , can be calculated from $[\delta\rho\left({\bf x}\right) = r_{0} \left({\bf x}\right)/d_{0} \left({\bf x}\right). \eqno(15.1.5.31)]$

Compared with the full-matrix solution, all the calculations involved in between equations (15.1.5.12) and (15.1.5.18) and the subsequent iterations are spared in the diagonal approximation. This makes calculation by the diagonal approximation much faster than by the full-matrix method.

References

Main, P. (1990b). The use of Sayre's equation with constraints for the direct determination of phases. Acta Cryst. A46, 372–377.Google Scholar

International Tables for Crystallography (2006). Vol. F. ch. 15.1, pp. 321-323