International Tables for Crystallography, Volume F: Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F. ch. 21.1, pp. 504-505
Section 21.1.7.4.1. R values
Department of Cell and Molecular Biology, Uppsala University, Biomedical Centre, Box 596, SE-751 24 Uppsala, Sweden
The traditional statistic used to assess how well a model fits the experimental data is the crystallographic R value. This statistic is closely related to the standard least-squares crystallographic residual, and its value can be reduced essentially arbitrarily by increasing the number of parameters used to describe the model (e.g. by refining anisotropic ADPs and occupancies for all atoms) or, conversely, by reducing the number of experimental observations (e.g. through resolution and σ cutoffs) or the number of restraints imposed on the model. Therefore, the conventional R value is only meaningful if the number of experimental observations and restraints greatly exceeds the number of model parameters.
In 1992, Brünger introduced the free R value (Rfree; Brünger, 1992a, 1993, 1997; Kleywegt & Brünger, 1996), whose definition is identical to that of the conventional R value, except that it is calculated for a small subset of reflections that are not used in the refinement of the model. The free R value therefore measures how well the model predicts experimental observations that were not used to fit it (cross-validation).
Until a few years ago, a conventional R value below 0.25 was generally considered to be a sign that a model was essentially correct (Brändén & Jones, 1990). While this is probably true at high resolution, it was subsequently shown that several intentionally mistraced models could be refined to deceptively low conventional R values (Jones et al., 1991; Kleywegt & Jones, 1995b; Kleywegt & Brünger, 1996). Brünger suggests a threshold of 0.40 for the free R value, i.e. models with free R values greater than 0.40 should be treated with caution (Brünger, 1997). Tickle and coworkers have developed methods to estimate the expected value of Rfree in least-squares refinement (Tickle et al., 1998).
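For reference, the two statistics discussed above are commonly written in the following unweighted form. This is a sketch of the standard definition, not a reproduction of the printed equation: the scale factor k, the omission of per-reflection weights, and the labels W (reflections used in refinement) and T (test reflections set aside from refinement) are notational assumptions made here.

```latex
R = \frac{\sum_{hkl \in W} \bigl|\, |F_{\mathrm{obs}}| - k\,|F_{\mathrm{calc}}| \,\bigr|}
         {\sum_{hkl \in W} |F_{\mathrm{obs}}|},
\qquad
R_{\mathrm{free}} = \frac{\sum_{hkl \in T} \bigl|\, |F_{\mathrm{obs}}| - k\,|F_{\mathrm{calc}}| \,\bigr|}
                         {\sum_{hkl \in T} |F_{\mathrm{obs}}|}
```

The two expressions differ only in the set of reflections summed over, which is what makes the free R value a cross-validation statistic.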
Since the difference between the conventional and free R values is partly a measure of the extent to which the model over-fits the data (i.e. some aspects of the model improve the conventional but not the free R value and are therefore likely to fit noise rather than signal in the data), this difference, Rfree − R, should be small (Kleywegt & Jones, 1995a; Kleywegt & Brünger, 1996). Alternatively, the Rfree ratio (defined as Rfree/R; Tickle et al., 1998) should be close to unity. Various practical aspects of the use of the free R value have been discussed by Kleywegt & Brünger (1996) and by Brünger (1997).
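The quantities in this paragraph can be illustrated with a minimal Python sketch. It assumes the unweighted form R = Σ||Fobs| − k|Fcalc|| / Σ|Fobs| and a per-reflection Boolean flag marking the test set; the function names, the flag convention and the amplitudes are hypothetical and chosen only for illustration, not taken from any refinement program.

```python
from typing import Sequence, Tuple


def r_value(f_obs: Sequence[float], f_calc: Sequence[float],
            scale: float = 1.0) -> float:
    """Unweighted R = sum||Fo| - k|Fc|| / sum|Fo| over the given reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of |Fobs| is zero")
    numerator = sum(abs(abs(fo) - scale * abs(fc))
                    for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


def work_and_free_r(f_obs: Sequence[float], f_calc: Sequence[float],
                    is_free: Sequence[bool],
                    scale: float = 1.0) -> Tuple[float, float]:
    """Return (R, Rfree): R over refined reflections, Rfree over the test set."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_value([fo for fo, _ in work], [fc for _, fc in work], scale)
    r_free = r_value([fo for fo, _ in test], [fc for _, fc in test], scale)
    return r_work, r_free


# Made-up amplitudes, for illustration only (hypothetical data).
f_obs = [100.0, 80.0, 60.0, 40.0, 50.0, 20.0]
f_calc = [90.0, 85.0, 66.0, 38.0, 40.0, 25.0]
is_free = [False, False, False, False, True, True]

r_work, r_free = work_and_free_r(f_obs, f_calc, is_free)
r_gap = r_free - r_work      # the difference Rfree - R, which should be small
r_ratio = r_free / r_work    # the Rfree ratio, which should be close to unity
print(f"R = {r_work:.3f}, Rfree = {r_free:.3f}, "
      f"Rfree - R = {r_gap:.3f}, Rfree/R = {r_ratio:.2f}")
```

In this toy data set the test reflections fit much worse than the refined ones, so both the difference and the ratio are large, which is the pattern the text associates with over-fitting. A real test set would contain far more than two reflections.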
Self-validation is an alternative to cross-validation and, in the case of crystallographic refinement, the Hamilton test (Hamilton, 1965) is a prime example of this. This method enables one to assess whether a reduction in the R value is statistically significant given the increase in the number of degrees of freedom. Application of this test in the case of macromolecules is complicated by the difficulty of estimating the effect of the combined set of restraints on the (effective) number of degrees of freedom, but some information can nevertheless be gained from such an analysis (Bacchi et al., 1996).
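As a sketch of how the Hamilton test is usually stated, consider two refinements of the same n observations: one with m parameters and a more restricted one with m − b parameters. The symbols n, m, b and α, and the use of the generalized weighted R factor, are the notation assumed here; for restrained macromolecular refinement the effective values of n and m are exactly the quantities that are difficult to estimate.

```latex
R_w = \left[ \frac{\sum w \,\bigl( |F_{\mathrm{obs}}| - |F_{\mathrm{calc}}| \bigr)^2}
                  {\sum w \,|F_{\mathrm{obs}}|^2} \right]^{1/2},
\qquad
\mathcal{R} = \frac{R_w(\text{restricted model})}{R_w(\text{unrestricted model})},
\qquad
\mathcal{R}_{b,\,n-m,\,\alpha} = \left[ \frac{b}{n-m}\, F_{b,\,n-m,\,\alpha} + 1 \right]^{1/2}
```

Here F is the upper α point of the F distribution with b and n − m degrees of freedom. If the observed ratio exceeds the critical value, the restricted model can be rejected at significance level α, i.e. the reduction in R obtained with the b additional parameters is judged significant.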
References
Bacchi, A., Lamzin, V. S. & Wilson, K. S. (1996). A self-validation technique for protein structure refinement: the extended Hamilton test. Acta Cryst. D52, 641–646.
Brändén, C.-I. & Jones, T. A. (1990). Between objectivity and subjectivity. Nature (London), 343, 687–689.
Brünger, A. T. (1992a). Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature (London), 355, 472–475.
Brünger, A. T. (1993). Assessment of phase accuracy by cross validation: the free R value. Methods and applications. Acta Cryst. D49, 24–36.
Brünger, A. T. (1997). The free R value: a more objective statistic for crystallography. Methods Enzymol. 277, 366–396.
Hamilton, W. C. (1965). Significance tests on the crystallographic R factor. Acta Cryst. 18, 502–510.
Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Cryst. A47, 110–119.
Kleywegt, G. J. & Brünger, A. T. (1996). Checking your imagination: applications of the free R value. Structure, 4, 897–904.
Kleywegt, G. J. & Jones, T. A. (1995a). Braille for pugilists. In Proceedings of the CCP4 study weekend. Making the most of your model, edited by W. N. Hunter, J. M. Thornton & S. Bailey, pp. 11–24. Warrington: Daresbury Laboratory.
Kleywegt, G. J. & Jones, T. A. (1995b). Where freedom is given, liberties are taken. Structure, 3, 535–540.
Tickle, I. J., Laskowski, R. A. & Moss, D. S. (1998). Rfree and the Rfree ratio. I. Derivation of expected values of cross-validation residuals used in macromolecular least-squares refinement. Acta Cryst. D54, 547–557.