International Tables for Crystallography Volume F Crystallography of biological macromolecules Edited by M. G. Rossmann and E. Arnold © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F. ch. 21.1, pp. 498-499
Section 21.1.3. Detecting outliers
Department of Cell and Molecular Biology, Uppsala University, Biomedical Centre, Box 596, SE-751 24 Uppsala, Sweden
Many statistics, methods and programs were developed in the 1990s to help identify errors in protein models. These methods generally fall into two classes: one in which only coordinates and B factors are considered (such methods often entail comparison of a model to information derived from structural databases) and another in which both the model and the crystallographic data are taken into account. Alternatively, one can distinguish between methods that essentially measure how well the refinement program has succeeded in imposing restraints (e.g. deviations from ideal geometry, conventional R value) and those that assess aspects of the model that are 'orthogonal' to the information used in refinement (e.g. free R value, patterns of non-bonded interactions, conformational torsion-angle distributions). An additional distinction can be made between methods that provide overall (global) statistics for a model (such methods are suitable for monitoring the progress of the refinement and rebuilding process) and those that provide information at the level of residues or atoms (such methods are more useful for detecting local problems in a model). It is important to realise that almost all coordinate-based validation methods detect outliers (i.e. atoms or residues with unusual properties). To assess whether an outlier arises from an error in the model or is a genuine, but unusual, feature of the structure, one must inspect the (preferably unbiased) electron-density maps (Jones et al., 1996).
This section discusses some quality indicators that have proved particularly useful in daily protein crystallographic practice for detecting problems in intermediate models. Section 21.1.7 provides a more extensive discussion of many of the quality criteria that are or have been used by macromolecular crystallographers.
From a practical point of view, local statistics (those reported at the level of residues or atoms) are the most useful for the crystallographer who is about to rebuild a model. Examples of useful quality indicators are:
In addition to these criteria, residues with other unusual features should be examined in the electron-density maps so that the crystallographer can decide whether they are in error. Such features include unusual temperature factors, unusual occupancies, unusual bond lengths or angles, unusual torsion angles or deviations from planarity (e.g. for the peptide plane), unusual chirality (e.g. for the Cα atom of every residue type except glycine), unusual differences in the temperature factors of chemically bonded atoms, unusual packing environments (Vriend & Sander, 1993), very short distances between non-bonded atoms (including symmetry mates), large positional shifts during refinement and unusual deviations from noncrystallographic symmetry (Kleywegt & Jones, 1995b; Kleywegt, 1996).
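As an illustration of one of the checks listed above, the following minimal sketch flags chemically bonded atom pairs whose temperature factors differ by more than a cut-off. The function name, the dictionary-based data layout and the default cut-off of 10 Å² are hypothetical choices made for this example only; they are not taken from any validation program, and a sensible cut-off will depend on resolution and refinement protocol. Flagged pairs are candidates for inspection in the density, not proven errors.

```python
def flag_bonded_b_jumps(b_factors, bonds, max_delta=10.0):
    """Return bonded atom pairs whose B factors differ by more than max_delta.

    b_factors: dict mapping an atom label to its B factor (in Å^2).
    bonds:     iterable of (atom_label, atom_label) pairs that are bonded.
    max_delta: illustrative cut-off in Å^2 (an assumption, not a standard).
    """
    flagged = []
    for atom_i, atom_j in bonds:
        delta = abs(b_factors[atom_i] - b_factors[atom_j])
        if delta > max_delta:
            flagged.append((atom_i, atom_j, delta))
    return flagged


# Toy residue fragment: the CA-CB bond shows a large jump in B factor.
b_factors = {"N": 20.0, "CA": 22.0, "CB": 45.0}
bonds = [("N", "CA"), ("CA", "CB")]
for atom_i, atom_j, delta in flag_bonded_b_jumps(b_factors, bonds):
    print(f"inspect {atom_i}-{atom_j}: |delta B| = {delta:.1f} A^2")
```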
The crystallographic R value used to be the major global quality indicator until it was realised that it can easily be fooled, especially at low resolution (Brändén & Jones, 1990; Jones et al., 1991; Brünger, 1992a; Kleywegt & Jones, 1995b). The free R value, introduced by Brünger (1992a, 1993), has been shown to be much more reliable and harder to manipulate (Kleywegt & Brünger, 1996; Brünger, 1997). It is excellently suited for monitoring the progress of refinement, for detecting major problems with model or data and for helping reduce over-fitting of the data (which occurs if many more parameters are refined in a model than is warranted by the information content of the crystallographic data). Moreover, the free R value can be used to estimate the coordinate error of the final model (Kleywegt et al., 1994; Kleywegt & Brünger, 1996; Brünger, 1997; Cruickshank, 1999).
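The relationship between the two statistics can be made concrete with a small sketch. The conventional R value is the sum of the absolute differences between observed and (scaled) calculated structure-factor amplitudes divided by the sum of the observed amplitudes; the free R value applies the same formula to a test set of reflections that was excluded from refinement. The function names, the list-based inputs and the single overall scale factor below are assumptions made for illustration; refinement programs apply more elaborate scaling.

```python
def r_value(f_obs, f_calc, scale=1.0):
    """R = sum(| |Fobs| - k*|Fcalc| |) / sum(|Fobs|) over the given reflections."""
    if not f_obs or len(f_obs) != len(f_calc):
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def work_and_free_r(f_obs, f_calc, is_free, scale=1.0):
    """Split reflections by the test-set flag and return (R_work, R_free)."""
    work_obs, work_calc, free_obs, free_calc = [], [], [], []
    for fo, fc, free in zip(f_obs, f_calc, is_free):
        if free:
            free_obs.append(fo)
            free_calc.append(fc)
        else:
            work_obs.append(fo)
            work_calc.append(fc)
    return (r_value(work_obs, work_calc, scale),
            r_value(free_obs, free_calc, scale))


# Toy data: three working reflections and one test-set reflection.
f_obs = [100.0, 50.0, 80.0, 40.0]
f_calc = [90.0, 55.0, 80.0, 50.0]
is_free = [False, False, False, True]
r_work, r_free = work_and_free_r(f_obs, f_calc, is_free)
print(f"R = {r_work:.3f}, free R = {r_free:.3f}")
```

Because the test-set reflections do not enter the refinement target, adding parameters that merely fit noise tends to lower the conventional R value without lowering the free R value, which is why a widening gap between the two is a warning sign of over-fitting.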
In addition, the average or r.m.s. values for many of the local statistics, their minimum or maximum values or the percentage of outliers can be quoted and used to obtain an impression of the overall quality of the model and the overall fit of the model to the data.
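A minimal sketch of such a summary, assuming the local statistic is available as a list with one value per residue and that outliers are defined by a user-supplied predicate (both the function name and the example cut-off are hypothetical):

```python
import math


def summarise_local_statistic(values, is_outlier):
    """Reduce per-residue values to global descriptors.

    values:     non-empty list of numbers, one per residue.
    is_outlier: function returning True for a value regarded as an outlier.
    """
    if not values:
        raise ValueError("need at least one value")
    n = len(values)
    n_outliers = sum(1 for v in values if is_outlier(v))
    return {
        "mean": sum(values) / n,
        "rms": math.sqrt(sum(v * v for v in values) / n),
        "min": min(values),
        "max": max(values),
        "percent_outliers": 100.0 * n_outliers / n,
    }


# Toy per-residue scores; values above 2.0 are treated as outliers here.
scores = [0.5, 1.0, 0.8, 2.5]
summary = summarise_local_statistic(scores, lambda v: v > 2.0)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```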
References
Bhat, T. N. & Cohen, G. H. (1984). OMITMAP: an electron density map suitable for the examination of errors in a macromolecular model. J. Appl. Cryst. 17, 244–248.
Brändén, C.-I. & Jones, T. A. (1990). Between objectivity and subjectivity. Nature (London), 343, 687–689.
Brünger, A. T. (1992a). Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature (London), 355, 472–475.
Brünger, A. T. (1993). Assessment of phase accuracy by cross validation: the free R value. Methods and applications. Acta Cryst. D49, 24–36.
Brünger, A. T. (1997). The free R value: a more objective statistic for crystallography. Methods Enzymol. 277, 366–396.
Chapman, M. S. (1995). Restrained real-space macromolecular atomic refinement using a new resolution-dependent electron-density function. Acta Cryst. A51, 69–80.
Cruickshank, D. W. J. (1999). Remarks about protein structure precision. Acta Cryst. D55, 583–601.
Hodel, A., Kim, S.-H. & Brünger, A. T. (1992). Model bias in macromolecular crystal structures. Acta Cryst. A48, 851–858.
Hooft, R. W. W., Sander, C. & Vriend, G. (1996b). Positioning hydrogen atoms by optimizing hydrogen-bond networks in protein structures. Proteins Struct. Funct. Genet. 26, 363–376.
Jones, T. A. & Kjeldgaard, M. (1997). Electron density map interpretation. Methods Enzymol. 277, 173–208.
Jones, T. A., Kleywegt, G. J. & Brünger, A. T. (1996). Storing diffraction data. Nature (London), 381, 18–19.
Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Cryst. A47, 110–119.
Kleywegt, G. J. (1996). Use of non-crystallographic symmetry in protein structure refinement. Acta Cryst. D52, 842–857.
Kleywegt, G. J., Bergfors, T., Senn, H., Le Motte, P., Gsell, B., Shudo, K. & Jones, T. A. (1994). Crystal structures of cellular retinoic acid binding proteins I and II in complex with all-trans-retinoic acid and a synthetic retinoid. Structure, 2, 1241–1258.
Kleywegt, G. J. & Brünger, A. T. (1996). Checking your imagination: applications of the free R value. Structure, 4, 897–904.
Kleywegt, G. J. & Jones, T. A. (1995b). Where freedom is given, liberties are taken. Structure, 3, 535–540.
Kleywegt, G. J. & Jones, T. A. (1996b). Phi/Psi-chology: Ramachandran revisited. Structure, 4, 1395–1400.
Kleywegt, G. J. & Jones, T. A. (1997). Model-building and refinement practice. Methods Enzymol. 277, 208–230.
Kleywegt, G. J. & Jones, T. A. (1998). Databases in protein crystallography. Acta Cryst. D54, 1119–1131.
Kleywegt, G. J. & Read, R. J. (1997). Not your average density. Structure, 5, 1557–1569.
Ramakrishnan, C. & Ramachandran, G. N. (1965). Stereochemical criteria for polypeptide and protein chain conformations. II. Allowed conformations for a pair of peptide units. Biophys. J. 5, 909–933.
Read, R. J. (1986). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Cryst. A42, 140–149.
Vaguine, A. A., Richelle, J. & Wodak, S. J. (1999). SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model. Acta Cryst. D55, 191–205.
Vriend, G. & Sander, C. (1993). Quality control of protein models: directional atomic contact analysis. J. Appl. Cryst. 26, 47–60.
Zou, J. Y. & Mowbray, S. L. (1994). An evaluation of the use of databases in protein structure refinement. Acta Cryst. D50, 237–249.