International
Tables for
Crystallography
Volume F
Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F. ch. 18.4, pp. 397-398   | 1 | 2 |

Section 18.4.4.4. Validation of extra parameters during the refinement process

Z. Dauter,a* G. N. Murshudovb and K. S. Wilsonc

a National Cancer Institute, Brookhaven National Laboratory, Building 725A-X9, Upton, NY 11973, USA,bStructural Biology Laboratory, Department of Chemistry, University of York, York YO10 5DD, England, and CLRC, Daresbury Laboratory, Daresbury, Warrington, WA4 4AD, England, and cStructural Biology Laboratory, Department of Chemistry, University of York, York YO10 5DD, England
Correspondence e-mail:  dauter@bnl.gov

18.4.4.4. Validation of extra parameters during the refinement process

| top | pdf |

The introduction of additional parameters into the model always results in a reduction in the least-squares or maximum-likelihood residual – in crystallographic terms, the R factor. However, the statistical significance of this reduction is not always clear, since this simultaneously reduces the observation-to-parameter ratio. It is therefore important to validate the significance of the introduction of further parameters into the model on a statistical basis. Early attempts to derive such an objective tool were made by Hamilton (1965)[link]. Unfortunately, they proved to be cumbersome in practice for large structures and did not provide the required objectivity.

Direct application of the Hamilton test is especially problematical for macromolecules because of the use of restraints. Attempts have been made to overcome these problems, using a direct extension of the Hamilton test itself (Bacchi et al., 1996[link]) or with a combination of self and cross validation (Tickle et al., 1998[link]).

Brünger (1992a)[link] introduced the concept of statistical cross validation to evaluate the significance of introducing extra features into the atomic model. For this, a small and randomly distributed subset of the experimental observations is excluded from the refinement procedure, and the residual against this subset of reflections is termed [R_{\rm free}]. It is generally sufficient to include about 1000 reflections in the [R_{\rm free}] subset; further increase in this number provides little, if any, statistical advantage but diminishes the power of the minimization procedure. For atomic resolution structures, cross validation is important in establishing whether the introduction of an additional type of feature to the model (with its associated increase in parameters) is justified. There are two limitations to this. Firstly, if [R_{\rm free}] shows zero or minimal decrease compared to that in the R factor, the significance remains unclear. Secondly, the introduction of individual features, for example the partial occupancy of five water molecules, can provide only a very small change in [R_{\rm free}], which will be impossible to substantiate. To recapitulate, at atomic resolution the prime use of cross validation is in establishing protocols with regard to extended sets of parameter types. The sets thus defined will depend on the quality of the data.

In the final analysis, validation of individual features depends on the electron density, and Fourier maps must be judiciously inspected. Nevertheless, this remains a somewhat subjective approach and is in practice intractable for extensive sets of parameters, such as the occupancies and ADPs of all solvent sites. For the latter, automated procedures, which are presently being developed, are an absolute necessity, but they may not be optimal in the final stages of structure analysis, and visual inspection of the model and density is often needed.

The problems of limited data and reparameterization of the model remain. At high resolution, reparameterization means having the same number of atoms, but changing the number of parameters to increase their statistical significance, for example switching from an anisotropic to an isotropic atomic model or vice versa. In contrast, when reparameterization is applied at low resolution, this usually involves reduction in the number of atoms, but this is not an ideal procedure, as real chemical entities of the model are sacrificed. Reducing the number of atoms will inevitably result in disagreement between the experiment and model, which in turn will affect the precision of other parameters. It would be more appropriate to reduce the number of parameters without sacrificing the number of atoms, for example by describing the model in torsion-angle space. Water poses a particular problem, as at low as well as at high resolution not all water molecules cannot be described as discrete atoms. Algorithms are needed to describe them as a continuous model with only a few parameters. In the simplest model, the solvent can be described as a constant electron density.

References

First citation Bacchi, A., Lamzin, V. S. & Wilson, K. S. (1996). A self-validation technique for protein structure refinement: the extended Hamilton test. Acta Cryst. D52, 641–646.Google Scholar
First citation Brünger, A. T. (1992a). Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature (London), 355, 472–475.Google Scholar
First citation Hamilton, W. C. (1965). Significance tests on the crystallographic R factor. Acta Cryst. 18, 502–510.Google Scholar
First citation Tickle, I. J., Laskowski, R. A. & Moss, D. S. (1998). Rfree and the Rfree ratio. Part I: Derivation of expected values of cross-validation residuals used in macromolecular least-squares refinement. Acta Cryst. D54, 547–557.Google Scholar








































to end of page
to top of page