Two examples of full-matrix inversion

Cruickshank, D. W. J.

doi:10.1107/97809553602060000697

International Tables for Crystallography
Volume F: Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F. ch. 18.5, pp. 406-409

D. W. J. Cruickshank^a ^*^‡

^a Chemistry Department, UMIST, Manchester M60 1QD, England
Correspondence e-mail: dwj_cruickshank@email.msn.com

18.5.4. Two examples of full-matrix inversion

18.5.4.1. Unrestrained and restrained inversions for concanavalin A

G. M. Sheldrick extended his SHELXL96 program (Sheldrick & Schneider, 1997) to provide extra information about protein precision through the inversion of least-squares full matrices. His programs have been used by Deacon et al. (1997) for the high-resolution refinement of native concanavalin A with 237 residues, using data at 110 K to 0.94 Å refined anisotropically. After the convergence and completion of full-matrix restrained refinement for the structure, the unrestrained full matrix (coordinates only) was computed and then inverted in a massive calculation. This led to standard uncertainties (s.u.'s) $[\sigma (x)]$, $[\sigma (y)]$, $[\sigma (z)]$ and $[\sigma (r)]$ for all atoms, and to $[\sigma (l)]$ and $[\sigma (\theta)]$ for all bond lengths and angles. $[\sigma (r)]$ is defined as $[[\sigma^{2}(x) + \sigma^{2}(y) + \sigma^{2}(z)]^{1/2}]$. For concanavalin A the restrained full matrix was also inverted, thus allowing the comparison of restrained and unrestrained s.u.'s.
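As a minimal sketch of how such quantities follow from an inverted matrix, the Python below takes a variance-covariance matrix for two atoms and derives $[\sigma (r)]$ for one atom and $[\sigma (l)]$ for the bond by first-order error propagation. The 6 × 6 matrix, the coordinates and the function names are hypothetical illustrations on Cartesian axes in Å. They are not taken from SHELXL, and this is not claimed to be how that program is implemented.

```python
import math

def sigma_r(cov, atom):
    """Positional s.u.: sigma(r) = [sigma^2(x) + sigma^2(y) + sigma^2(z)]^(1/2)."""
    i = 3 * atom
    return math.sqrt(cov[i][i] + cov[i + 1][i + 1] + cov[i + 2][i + 2])

def sigma_l(cov, xyz_a, xyz_b):
    """Bond-length s.u. by first-order propagation: sigma^2(l) = g^T C g.

    cov is the 6 x 6 covariance matrix ordered (xa, ya, za, xb, yb, zb);
    g is the gradient of the length l with respect to those coordinates.
    """
    d = [b - a for a, b in zip(xyz_a, xyz_b)]
    length = math.sqrt(sum(c * c for c in d))
    u = [c / length for c in d]       # unit vector from atom a to atom b
    g = [-c for c in u] + u           # dl/d(coords of a), then dl/d(coords of b)
    var = sum(g[i] * cov[i][j] * g[j] for i in range(6) for j in range(6))
    return math.sqrt(var)

# Hypothetical input: two uncorrelated atoms, each with
# sigma(x) = sigma(y) = sigma(z) = 0.01 A (made-up values).
cov = [[(0.01 ** 2 if i == j else 0.0) for j in range(6)] for i in range(6)]
atom_a = (0.0, 0.0, 0.0)
atom_b = (1.53, 0.0, 0.0)

print(round(sigma_r(cov, 0), 4))               # equals sqrt(3) * 0.01
print(round(sigma_l(cov, atom_a, atom_b), 4))  # equals sqrt(2) * 0.01
```

With zero covariances the bond-length variance is just the sum of the two atomic variances along the bond. Non-zero off-diagonal terms from a real inverted matrix would enter through the same double sum.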

The results for concanavalin A from the inversion of the coordinate matrices of order 6402 (= 2134 × 3) are plotted in Figs. 18.5.4.1 and 18.5.4.2. Fig. 18.5.4.1 shows $[\sigma (r)]$ versus $[B_{\rm eq}]$ for the fully occupied atoms of the protein (a few atoms with B > 60 Å² are off-scale). The points are colour-coded black for carbon, blue for nitrogen and red for oxygen. Fig. 18.5.4.1(a) shows the restrained results, and Fig. 18.5.4.1(b) shows the unrestrained diffraction-data-only results. Superposed on both sets of data points are least-squares quadratic fits determined with weights $[1/B^{2}]$. At high B, the unrestrained $[\sigma_{\rm diff}(r)]$ can be at least double the restrained $[\sigma_{\rm res}(r)]$. For carbon at B = 50 Å², for example, the unrestrained $[\sigma_{\rm diff}(r)]$ is about 0.25 Å, whereas the restrained $[\sigma_{\rm res}(r)]$ is about 0.11 Å. For B < 10 Å², both values of $[\sigma (r)]$ fall below 0.02 Å and are around 0.01 Å at B = 6 Å².

Figure 18.5.4.1

Plots of $[\sigma (r)]$ versus $[B_{\rm eq}]$ for concanavalin A with 0.94 Å data, (a) restrained full-matrix $[\sigma_{\rm res}(r)]$ , (b) unrestrained full-matrix $[\sigma_{\rm diff}(r)]$ . Carbon black, nitrogen blue, oxygen red.

Figure 18.5.4.2

Plots of $[\sigma (l)]$ versus average $[B_{\rm eq}]$ for concanavalin A with 0.94 Å data, (a) restrained full-matrix $[\sigma_{\rm res}(l)]$ , (b) unrestrained full-matrix $[\sigma_{\rm diff}(l)]$ . C—C black, C—N blue, C—O red.

For B < 10 Å², the better precision of oxygen as compared with nitrogen, and of nitrogen as compared with carbon, can be clearly seen. At the lowest B, the unrestrained $[\sigma_{\rm diff}(r)]$ in Fig. 18.5.4.1(b) are almost as small as the restrained $[\sigma_{\rm res}(r)]$ in Fig. 18.5.4.1(a). [The quadratic fits of the restrained results in Fig. 18.5.4.1(a) are evidently slightly imperfect in making $[\sigma_{\rm res}(r)]$ tend almost to 0 as B tends to 0.]

Fig. 18.5.4.2 shows $[\sigma (l)]$ versus $[B_{\rm eq}]$ for the bond lengths in the protein. The points are colour-coded black for C—C, blue for C—N and red for C—O. The restrained and unrestrained distributions are very different for high B. The restrained distribution in Fig. 18.5.4.2(a) tends to about 0.02 Å, which is the standard uncertainty of the applied restraint for 1–2 bond lengths, whereas the unrestrained distribution in Fig. 18.5.4.2(b) goes off the scale of the diagram. But for B < 10 Å², both distributions fall to around 0.01 Å.

The differences between the restrained and unrestrained $[\sigma (r)]$ and $[\sigma (l)]$ can be understood through the two-atom model for restrained refinement described in Section 18.5.3. For that model, the equation

$[1 / \sigma_{\rm res}^{2} (l) = 1 / \sigma_{\rm diff}^{2} (l) + 1 / \sigma_{\rm geom}^{2} (l) \eqno(18.5.3.16)]$

relates the bond-length s.u. in the restrained refinement, $[\sigma_{\rm res}(l)]$, to the $[\sigma_{\rm diff}(l)]$ of the unrestrained refinement and the s.u. $[\sigma_{\rm geom}(l)]$ assigned to the length in the stereochemical dictionary. In the refinements, $[\sigma_{\rm geom}(l)]$ was 0.02 Å for all bond lengths. When this is combined in (18.5.3.16) with the unrestrained $[\sigma_{\rm diff}(l)]$ of any bond, the predicted restrained $[\sigma_{\rm res}(l)]$ is close to that found in the restrained full matrix.

It can be seen from Fig. 18.5.4.2(b) that many bond lengths with average B < 10 Å² have $[\sigma_{\rm diff}(l) < 0.014]$ Å. For these bonds the diffraction data have greater weight than the stereochemical dictionary. Some bonds have $[\sigma_{\rm diff}(l)]$ as low as 0.0080 Å, with $[\sigma_{\rm res}(l)]$ around 0.0074 Å. This situation is one consequence of the availability of diffraction data to the high resolution of 0.94 Å. For large $[\sigma_{\rm diff}(l)]$ (i.e., high B), equation (18.5.3.16) predicts that $[\sigma_{\rm res}(l)]$ tends to $[\sigma_{\rm geom}(l) = 0.02]$ Å, as is found in Fig. 18.5.4.2(a).
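The behaviour of equation (18.5.3.16) in these two regimes can be checked numerically. The short Python sketch below is illustrative only: the function name is an assumption, and the default $[\sigma_{\rm geom}(l)]$ of 0.02 Å is the value quoted for these refinements.

```python
import math

def sigma_res_l(sigma_diff, sigma_geom=0.02):
    """Restrained bond-length s.u. from equation (18.5.3.16); values in angstroms."""
    return 1.0 / math.sqrt(1.0 / sigma_diff ** 2 + 1.0 / sigma_geom ** 2)

# Sample unrestrained s.u.'s, from well determined to very poorly determined bonds.
for s_diff in (0.0080, 0.014, 0.06, 1.0):
    print(f"sigma_diff(l) = {s_diff:.4f} A  ->  sigma_res(l) = {sigma_res_l(s_diff):.4f} A")
```

An input of 0.0080 Å gives about 0.0074 Å, as quoted in the text, and as $[\sigma_{\rm diff}(l)]$ grows the result approaches the 0.02 Å of the restraint.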

In an isotropic approximation, $[\sigma (r) = 3^{1/2}\sigma (x)]$. Equation (18.5.3.12) of the two-atom model can be recast to give

$[\sigma_{\rm res}^{2} (r) = \sigma_{\rm diff}^{2} (r) \left\{\left[\sigma_{\rm diff}^{2} (r) + 3(0.02)^{2}\right]\bigg/\left[2\sigma_{\rm diff}^{2} (r) + 3(0.02)^{2}\right]\right\}, \eqno(18.5.4.1)]$

in which 0.02 Å is the value of $[\sigma_{\rm geom}(l)]$. For low B, say $[B \leq 15\ \hbox{\AA}^{2}]$ in concanavalin, (18.5.4.1) gives quite good predictions of $[\sigma_{\rm res}(r)]$ from $[\sigma_{\rm diff}(r)]$. For instance, for a carbon atom with B = 15 Å², the quadratic curve for carbon in Fig. 18.5.4.1(b) shows $[\sigma_{\rm diff}(r) = 0.034]$ Å, and Fig. 18.5.4.1(a) shows $[\sigma_{\rm res}(r) = 0.029]$ Å. If $[\sigma_{\rm diff}(r) = 0.034]$ Å is used in (18.5.4.1), the resulting prediction for $[\sigma_{\rm res}(r)]$ is 0.028 Å, in close agreement.

However, for high B, say B = 50 Å², the quadratic curve for carbon in Fig. 18.5.4.1(b) shows $[\sigma_{\rm diff}(r) = 0.25]$ Å, and Fig. 18.5.4.1(a) shows $[\sigma_{\rm res}(r) = 0.11]$ Å, whereas (18.5.4.1) leads to the poor estimate $[\sigma_{\rm res}(r) = 0.18]$ Å.
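The two predictions quoted from (18.5.4.1) can be reproduced with a few lines of Python. This is a sketch under the stated two-atom model. The function and argument names are illustrative assumptions, and 0.02 Å is the bond restraint s.u. used in the text.

```python
import math

def sigma_res_r(sigma_diff_r, sigma_geom_l=0.02):
    """Two-atom-model prediction of sigma_res(r) from equation (18.5.4.1), in angstroms."""
    d2 = sigma_diff_r ** 2            # sigma_diff^2(r)
    g2 = 3.0 * sigma_geom_l ** 2      # 3 * sigma_geom^2(l), from sigma(r) = sqrt(3) * sigma(x)
    return math.sqrt(d2 * (d2 + g2) / (2.0 * d2 + g2))

print(round(sigma_res_r(0.034), 3))  # carbon at B = 15 A^2; the text quotes 0.028
print(round(sigma_res_r(0.25), 2))   # carbon at B = 50 A^2; the text quotes 0.18
```

For very large $[\sigma_{\rm diff}(r)]$ the expression in braces tends to 1/2, so the two-atom model can reduce $[\sigma (r)]$ by a factor of at most $[2^{1/2}]$. The observed ratio at B = 50 Å², 0.11/0.25, is a larger reduction than that.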

Thus at high B, equation (18.5.4.1) from the two-atom model does not give a good description of the relationship between the restrained and unrestrained $[\sigma (r)]$ . The reason is obvious. Most atoms are linked by 1–2 bond restraints to two or three other atoms. Even a carbonyl oxygen atom linked to its carbon atom by a 0.02 Å restraint is also subject to 0.04 Å 1–3 restraints to chain $[\hbox{C}_{\alpha}]$ and N atoms. Consequently, for a high-B atom, when the restraints are applied it is coupled to several other atoms in a group, and its $[\sigma_{\rm res}(r)]$ is lower, compared with the diffraction-data-only $[\sigma_{\rm diff}(r)]$ , by a greater amount than would be expected from the two-atom model.

18.5.4.2. Unrestrained inversion for an immunoglobulin

Sheldrick has provided the results of the unrestrained lower-resolution refinement of a single-chain immunoglobulin mutant (T39K) with 218 amino-acid residues, with data to 1.70 Å refined isotropically (Usón et al., 1999). Fig. 18.5.4.3 shows $[\sigma_{\rm diff}(r)]$ versus $[B_{\rm eq}]$ for the fully occupied protein atoms. Superposed on the data points are least-squares quadratic fits. In a first very rough approximation for $[\sigma_{\rm diff}(x_{i})]$ suggested later by equation (18.5.6.3), the dependence on atom type is controlled by $[1/Z_{i}]$, the reciprocal of the atomic number. Sheldrick found that a $[1/Z_{i}]$ dependence produced too little difference between C, N and O. The proportionalities between the quadratics for $[\sigma (r)]$ in Figs. 18.5.4.1 and 18.5.4.3 are based on the reciprocals of the scattering factors at $[\sin \theta /\lambda = 0.3\ \hbox{\AA}^{-1}]$, the scattering factors being symbolized by $[Z_{i}^{\# }]$. For C, N and O, the $[Z_{i}^{\# }]$ are 2.494, 3.219 and 4.089, respectively. For potential use in later work, the least-squares fits to the $[\sigma (r_{i})Z_{i}^{\# }]$ in Å are recorded here as

$[0.11892 + 0.00891B + 0.0001462B^{2}, \eqno(18.5.4.2a)]$

$[0.01826 + 0.001043B + 0.0002230B^{2} \hbox{ and } \eqno(18.5.4.2b)]$

$[0.00115 + 0.004414B + 0.0000214B^{2} \eqno(18.5.4.2c)]$

for the immunoglobulin (unrestrained), concanavalin A (unrestrained) and concanavalin A (restrained), respectively.

Figure 18.5.4.3

Plot of $[\sigma_{\rm diff}(r)]$ versus $[B_{\rm eq}]$ from an unrestrained full matrix for immunoglobulin mutant (T39K) with 1.70 Å data. Carbon black, nitrogen blue, oxygen red.

As might be expected from the lower resolution, the lowest $[\sigma_{\rm diff}(r)]$ 's in the immunoglobulin are about six times the lowest $[\sigma_{\rm diff}(r)]$ 's in concanavalin. But at B = 50 Å², the immunoglobulin curve for carbon gives $[\sigma_{\rm diff}(r) = 0.37]$ Å, which is only 50% larger than the concanavalin value of 0.25 Å.
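The curve values used in this comparison can be recovered from the fits (18.5.4.2a) to (18.5.4.2c) recorded above. The Python sketch below divides each quadratic by $[Z_{i}^{\# }]$ to obtain $[\sigma (r)]$ for a given element and B. The dictionary keys and the function name are illustrative assumptions, and B is taken to be in Å².

```python
# Coefficients (a0, a1, a2) of sigma(r) * Z# = a0 + a1*B + a2*B^2, from (18.5.4.2a)-(18.5.4.2c).
FITS = {
    "immunoglobulin_unrestrained": (0.11892, 0.00891, 0.0001462),
    "concanavalin_unrestrained": (0.01826, 0.001043, 0.0002230),
    "concanavalin_restrained": (0.00115, 0.004414, 0.0000214),
}

# Z#: scattering factors at sin(theta)/lambda = 0.3 A^-1, as quoted in the text.
Z_HASH = {"C": 2.494, "N": 3.219, "O": 4.089}

def sigma_r_fit(fit, element, b):
    """Fitted sigma(r) in angstroms for one element at displacement parameter b (A^2)."""
    a0, a1, a2 = FITS[fit]
    return (a0 + a1 * b + a2 * b * b) / Z_HASH[element]

for fit in FITS:
    print(fit, round(sigma_r_fit(fit, "C", 50.0), 2))
```

For carbon at B = 50 Å² this reproduces the 0.37 Å (immunoglobulin), 0.25 Å (concanavalin A, unrestrained) and 0.11 Å (concanavalin A, restrained) quoted in the text.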

Fig. 18.5.4.4 shows $[\sigma_{\rm diff}(l)]$ versus $[B_{\rm eq}]$ for the immunoglobulin. Note that the lowest immunoglobulin unrestrained $[\sigma_{\rm diff}(l)]$ is about 0.06 Å, which is three times the 0.02 Å $[\sigma_{\rm geom}(l)]$ bond restraint.

Figure 18.5.4.4

Plot of $[\sigma_{\rm diff}(l)]$ versus average $[B_{\rm eq}]$ from an unrestrained full matrix for immunoglobulin mutant (T39K) with 1.70 Å data. C—C black, C—N blue, C—O red.

18.5.4.3. Comments on restrained refinement

Geometric restraint dictionaries typically use bond-length weights based on $[\sigma_{\rm geom}(l)]$ of around 0.02 or 0.03 Å. Tables 18.5.7.1–18.5.7.3 show that even 1.5 Å studies have diffraction-only errors $[\sigma_{\rm diff}(x, B_{\rm avg})]$ of 0.08 Å and upwards. Only for resolutions of 1.0 Å or so are the diffraction-only errors comparable with the dictionary weights. Of course, the dictionary offers no values for many of the configurational parameters of the protein structure, including the centroid and molecular orientation.

18.5.4.4. Full-matrix estimates of precision

The opening contention of this chapter in Section 18.5.1.1 is that the variances and covariances of the structural parameters of proteins can be found from the inverse of the least-squares normal matrix. But there is a caveat, chiefly that explicit account would not be taken of disorder of the solvent or of parts of the protein. Corrections by Babinet's principle of complementarity or by mask bulk solvent models are only first-order approximations. The consequences of such disorder problems, which make the variation of calculated structure factors nonlinear over the range of interest, may in future be better handled by maximum-likelihood methods (e.g. Read, 1990; Bricogne, 1993; Bricogne & Irwin, 1996; Murshudov et al., 1997). Pannu & Read (1996) have shown how the maximum-likelihood method can be cast computationally into a form akin to least-squares calculations. Full-matrix precision estimates along the lines of the present chapter are probably somewhat low.

It should also be noted that full-matrix estimates of coordinate precision are most reliably derived from matrices involving both coordinates and atomic displacement parameters. This is particularly important for lower-resolution analyses, in which atomic images overlap. The work on the high-resolution analysis of concanavalin A described in Section 18.5.4.1 was based on the very large coordinate matrix, of order 6402. The omission, because of computer limitations, of the anisotropic displacement parameters from the full matrix will have caused the coordinate s.u.'s of atoms with high $[B_{\rm eq}]$ to be underestimated.

Much information about the quality of a molecular model can be obtained from the eigenvalues and eigenvectors of the normal matrix (Cowtan & Ten Eyck, 2000).

References

Bricogne, G. (1993). Direct phase determination by entropy maximization and likelihood ranking: status report and perspectives. Acta Cryst. D49, 37–60.

Bricogne, G. & Irwin, J. (1996). Maximum-likelihood structure refinement: theory and implementation within BUSTER + TNT. In Proceedings of the CCP4 study weekend. Macromolecular refinement, edited by E. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 85–92. Warrington: Daresbury Laboratory.

Cowtan, K. & Ten Eyck, L. F. (2000). Eigensystem analysis of the refinement of a small metalloprotein. Acta Cryst. D56, 842–856.

Deacon, A., Gleichmann, T., Kalb (Gilboa), A. J., Price, H., Raftery, J., Bradbrook, G., Yariv, J. & Helliwell, J. R. (1997). The structure of concanavalin A and its bound solvent determined with small-molecule accuracy at 0.94 Å resolution. J. Chem. Soc. Faraday Trans. 93, 4305–4312.

Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Refinement of macromolecular structures by the maximum-likelihood method. Acta Cryst. D53, 240–255.

Pannu, N. S. & Read, R. J. (1996). Improved structure refinement through maximum likelihood. Acta Cryst. A52, 659–668.

Read, R. J. (1990). Structure-factor probabilities for related structures. Acta Cryst. A46, 900–912.

Sheldrick, G. M. & Schneider, T. R. (1997). SHELXL: high resolution refinement. Methods Enzymol. 277, 319–343.

Usón, I., Pohl, E., Schneider, T. R., Dauter, Z., Schmidt, A., Fritz, H.-J. & Sheldrick, G. M. (1999). 1.7 Å structure of the stabilized REI_V mutant T39K. Application of local NCS restraints. Acta Cryst. D55, 1158–1167.
