International Tables for Crystallography
Volume F: Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F, ch. 21.1, pp. 498–499

Section 21.1.3. Detecting outliers

G. J. Kleywegtᵃ*

ᵃDepartment of Cell and Molecular Biology, Uppsala University, Biomedical Centre, Box 596, SE-751 24 Uppsala, Sweden
Correspondence e-mail: gerard@xray.bmc.uu.se


21.1.3.1. Classes of quality indicators


Many statistics, methods and programs were developed in the 1990s to help identify errors in protein models. These methods generally fall into two classes: one in which only coordinates and B factors are considered (such methods often entail comparison of a model to information derived from structural databases) and another in which both the model and the crystallographic data are taken into account. Alternatively, one can distinguish between methods that essentially measure how well the refinement program has succeeded in imposing restraints (e.g. deviations from ideal geometry, conventional R value) and those that assess aspects of the model that are 'orthogonal' to the information used in refinement (e.g. free R value, patterns of non-bonded interactions, conformational torsion-angle distributions). An additional distinction can be made between methods that provide overall (global) statistics for a model (such methods are suitable for monitoring the progress of the refinement and rebuilding process) and those that provide information at the level of residues or atoms (such methods are more useful for detecting local problems in a model). It is important to realise that almost all coordinate-based validation methods detect outliers (i.e. atoms or residues with unusual properties): to assess whether an outlier arises from an error in the model or whether it is a genuine, but unusual, feature of the structure, one must inspect the (preferably unbiased) electron-density maps (Jones et al., 1996)!

In this section, some quality indicators will be discussed that have been found to be particularly useful in daily protein crystallographic practice for the purpose of detecting problems in intermediate models. Section 21.1.7 provides a more extensive discussion of many of the quality criteria that are or have been used by macromolecular crystallographers.

21.1.3.2. Local statistics


From a practical point of view, local statistics are the most useful quality indicators for the crystallographer who is about to rebuild a model. Examples are:

  • (1) The real-space fit (Jones et al., 1991; Chapman, 1995; Jones & Kjeldgaard, 1997; Vaguine et al., 1999), expressed as an R value or as a correlation coefficient between 'observed' and calculated density. This property can be calculated for any subset of atoms, e.g. for an entire residue, for main-chain atoms or for side-chain atoms. It is best to use a map that is biased by the model as little as possible [e.g. a σA-weighted map (Read, 1986), an NCS-averaged map (Kleywegt & Read, 1997) or an omit map (Bhat & Cohen, 1984; Hodel et al., 1992)]. In practice, the real-space fit is strongly correlated with the atomic temperature factors, even though these are not used in the calculations.

  • (2) The Ramachandran plot (Ramakrishnan & Ramachandran, 1965; Kleywegt & Jones, 1996b). Residues with unusual main-chain φ, ψ torsion-angle combinations that do not have unequivocally clear electron density are almost always in error. However, one should keep in mind that the error may have its origin in (one of) the neighbouring residues. For instance, if the peptide O atom of a residue is pointing in the wrong direction, the φ value for the next residue may be off by 150–180° (Kleywegt, 1996; Kleywegt & Jones, 1998).

  • (3) The pep-flip value (Jones et al., 1991; Kleywegt & Jones, 1998). This statistic measures the r.m.s. distance between the peptide O atom of a residue and its counterparts found in a database of well refined high-resolution structures that occur in parts of those structures with a similar local Cα backbone conformation. If the pep-flip value is large (e.g. >2.5 Å), the residue is termed an outlier, but whether it is an error can only be determined by inspecting the local density.

  • (4) The rotamer side-chain fit value (Jones et al., 1991; Kleywegt & Jones, 1998). This statistic measures the r.m.s. distance between the side-chain atoms of a residue and those in the most similar rotamer conformation for that residue type. A value greater than ∼1.0–1.5 Å signals an outlier. In many cases (particularly, but not exclusively, at low resolution), a non-rotamer side chain can easily be replaced by a rotamer conformation, perhaps in conjunction with a slight rigid-body movement of the entire residue or with some adjustment of the side-chain torsion angles (Zou & Mowbray, 1994; Kleywegt & Jones, 1997).

  • (5) Hydrogen-bonding analysis. The correct orientation of histidine, asparagine and glutamine side chains cannot usually be inferred from electron density alone. Inexperienced crystallographers can benefit from suggestions based on the analysis of hydrogen-bonding networks (Hooft et al., 1996b), although every case should be examined critically (e.g. the program does not know about solvent molecules that have not yet been added to the model or that cannot be placed because of the limitations of the data; in addition, sometimes an amino group may be interacting with an aromatic side chain).
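The real-space fit of item (1) can be illustrated with a minimal sketch. It assumes that the 'observed' and calculated densities have already been placed on a common scale and sampled at the same grid points around the atoms of interest (a residue, its main chain or its side chain). The function names are hypothetical, and the R value is written in one commonly used form (sum of absolute differences divided by sum of absolute sums); individual programs may differ in detail.

```python
import math
from typing import Sequence


def real_space_r(rho_obs: Sequence[float], rho_calc: Sequence[float]) -> float:
    """Real-space R value: sum|obs - calc| / sum|obs + calc| over grid points.

    Assumes both densities are on the same scale and the same grid.
    """
    if len(rho_obs) != len(rho_calc) or len(rho_obs) == 0:
        raise ValueError("need two non-empty density samples of equal length")
    numerator = sum(abs(o - c) for o, c in zip(rho_obs, rho_calc))
    denominator = sum(abs(o + c) for o, c in zip(rho_obs, rho_calc))
    if denominator == 0.0:
        raise ValueError("densities sum to zero; R value is undefined")
    return numerator / denominator


def real_space_cc(rho_obs: Sequence[float], rho_calc: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and calculated density."""
    if len(rho_obs) != len(rho_calc) or len(rho_obs) == 0:
        raise ValueError("need two non-empty density samples of equal length")
    n = len(rho_obs)
    mean_obs = sum(rho_obs) / n
    mean_calc = sum(rho_calc) / n
    cov = sum((o - mean_obs) * (c - mean_calc) for o, c in zip(rho_obs, rho_calc))
    var_obs = sum((o - mean_obs) ** 2 for o in rho_obs)
    var_calc = sum((c - mean_calc) ** 2 for c in rho_calc)
    if var_obs == 0.0 or var_calc == 0.0:
        raise ValueError("flat density; correlation coefficient is undefined")
    return cov / math.sqrt(var_obs * var_calc)
```

A low R value or a high correlation coefficient indicates good agreement; note that the correlation coefficient, unlike the R value, is insensitive to the relative scale of the two maps.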

In addition to these criteria, residues with other unusual features should be examined in the electron-density maps for the crystallographer to be able to decide whether they are in error. Such features may pertain to unusual temperature factors, unusual occupancies, unusual bond lengths or angles, unusual torsion angles or deviations from planarity (e.g. for the peptide plane), unusual chirality (e.g. for the Cα atom of every residue type except glycine), unusual differences in the temperature factors of chemically bonded atoms, unusual packing environments (Vriend & Sander, 1993), very short distances between non-bonded atoms (including symmetry mates), large positional shifts during refinement, unusual deviations from noncrystallographic symmetry (Kleywegt & Jones, 1995b; Kleywegt, 1996) etc.
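Torsion angles underlie several of these checks (the main-chain φ and ψ angles, and the ω angle that describes peptide planarity). The following is a minimal sketch of how a torsion angle can be computed from four atomic positions, together with a simple measure of the deviation of ω from planarity. The function names are hypothetical, and any cutoff applied to the deviation would be a user choice rather than a value taken from this section.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Torsion angle (degrees, range -180..180) about the bond p1-p2.

    For phi use C(i-1), N(i), CA(i), C(i); for psi use N(i), CA(i), C(i),
    N(i+1); for omega use CA(i), C(i), N(i+1), CA(i+1).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def peptide_planarity_deviation(omega_deg: float) -> float:
    """Smallest deviation (degrees) of omega from a planar value (0 or 180)."""
    a = abs(omega_deg)
    return min(a, 180.0 - a)
```

For example, an ω value of -175° corresponds to a deviation of 5° from a planar trans peptide.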

21.1.3.3. Global statistics


The crystallographic R value used to be the major global quality indicator until it was realised that it can easily be fooled, especially at low resolution (Brändén & Jones, 1990; Jones et al., 1991; Brünger, 1992a; Kleywegt & Jones, 1995b). The free R value, introduced by Brünger (1992a, 1993), has been shown to be much more reliable and harder to manipulate (Kleywegt & Brünger, 1996; Brünger, 1997). It is excellently suited for monitoring the progress of refinement, for detecting major problems with model or data and for helping reduce over-fitting of the data (which occurs if many more parameters are refined in a model than is warranted by the information content of the crystallographic data). Moreover, the free R value can be used to estimate the coordinate error of the final model (Kleywegt et al., 1994; Kleywegt & Brünger, 1996; Brünger, 1997; Cruickshank, 1999).
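The relationship between the conventional and free R values can be made concrete with a small sketch. It assumes that the calculated structure-factor amplitudes have already been scaled to the observed ones and that each reflection carries a flag marking whether it belongs to the test set excluded from refinement; the function names and the flag representation are hypothetical.

```python
from typing import Sequence, Tuple


def r_value(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum| |Fobs| - |Fcalc| | / sum|Fobs| (amplitudes assumed pre-scaled)."""
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need two non-empty amplitude lists of equal length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0.0:
        raise ValueError("observed amplitudes sum to zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


def work_and_free_r(f_obs: Sequence[float],
                    f_calc: Sequence[float],
                    is_free: Sequence[bool]) -> Tuple[float, float]:
    """Return (R_work, R_free) given a per-reflection test-set flag."""
    if not (len(f_obs) == len(f_calc) == len(is_free)):
        raise ValueError("all three inputs must have the same length")
    # Working set: reflections used in refinement.
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    # Test set: reflections kept aside and never refined against.
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_value([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_value([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free
```

Because the test-set reflections are never used in refinement, a free R value that is much higher than the working R value is one sign that the data may have been over-fitted.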

In addition, the average or r.m.s. values for many of the local statistics, their minimum or maximum values or the percentage of outliers can be quoted and used to obtain an impression of the overall quality of the model and the overall fit of the model to the data.
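As an illustration of such a summary, the sketch below condenses a list of per-residue values of one local statistic into global figures. The function name, the dictionary keys and the outlier threshold passed by the caller are assumptions made for this example; the convention that larger values are worse (as for the pep-flip and rotamer side-chain fit values) is also assumed.

```python
import math
from typing import Dict, Sequence


def summarise_local_statistic(values: Sequence[float],
                              outlier_threshold: float) -> Dict[str, float]:
    """Summarise per-residue values of one local statistic.

    Values strictly above `outlier_threshold` are counted as outliers.
    """
    if len(values) == 0:
        raise ValueError("need at least one value")
    n = len(values)
    n_outliers = sum(1 for v in values if v > outlier_threshold)
    return {
        "mean": sum(values) / n,
        "rms": math.sqrt(sum(v * v for v in values) / n),
        "min": min(values),
        "max": max(values),
        "percent_outliers": 100.0 * n_outliers / n,
    }
```

For instance, rotamer side-chain fit values of 0.2, 0.5, 1.8 and 0.3 Å with a threshold of 1.5 Å give one outlier in four residues, i.e. 25%.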

References

Bhat, T. N. & Cohen, G. H. (1984). OMITMAP: an electron density map suitable for the examination of errors in a macromolecular model. J. Appl. Cryst. 17, 244–248.
Brändén, C.-I. & Jones, T. A. (1990). Between objectivity and subjectivity. Nature (London), 343, 687–689.
Brünger, A. T. (1992a). Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature (London), 355, 472–475.
Brünger, A. T. (1993). Assessment of phase accuracy by cross validation: the free R value. Methods and applications. Acta Cryst. D49, 24–36.
Brünger, A. T. (1997). The free R value: a more objective statistic for crystallography. Methods Enzymol. 277, 366–396.
Chapman, M. S. (1995). Restrained real-space macromolecular atomic refinement using a new resolution-dependent electron-density function. Acta Cryst. A51, 69–80.
Cruickshank, D. W. J. (1999). Remarks about protein structure precision. Acta Cryst. D55, 583–601.
Hodel, A., Kim, S.-H. & Brünger, A. T. (1992). Model bias in macromolecular crystal structures. Acta Cryst. A48, 851–858.
Hooft, R. W. W., Sander, C. & Vriend, G. (1996b). Positioning hydrogen atoms by optimizing hydrogen-bond networks in protein structures. Proteins Struct. Funct. Genet. 26, 363–376.
Jones, T. A. & Kjeldgaard, M. (1997). Electron density map interpretation. Methods Enzymol. 277, 173–208.
Jones, T. A., Kleywegt, G. J. & Brünger, A. T. (1996). Storing diffraction data. Nature (London), 381, 18–19.
Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Cryst. A47, 110–119.
Kleywegt, G. J. (1996). Use of non-crystallographic symmetry in protein structure refinement. Acta Cryst. D52, 842–857.
Kleywegt, G. J., Bergfors, T., Senn, H., Le Motte, P., Gsell, B., Shudo, K. & Jones, T. A. (1994). Crystal structures of cellular retinoic acid binding proteins I and II in complex with all-trans-retinoic acid and a synthetic retinoid. Structure, 2, 1241–1258.
Kleywegt, G. J. & Brünger, A. T. (1996). Checking your imagination: applications of the free R value. Structure, 4, 897–904.
Kleywegt, G. J. & Jones, T. A. (1995b). Where freedom is given, liberties are taken. Structure, 3, 535–540.
Kleywegt, G. J. & Jones, T. A. (1996b). Phi/Psi-chology: Ramachandran revisited. Structure, 4, 1395–1400.
Kleywegt, G. J. & Jones, T. A. (1997). Model-building and refinement practice. Methods Enzymol. 277, 208–230.
Kleywegt, G. J. & Jones, T. A. (1998). Databases in protein crystallography. Acta Cryst. D54, 1119–1131.
Kleywegt, G. J. & Read, R. J. (1997). Not your average density. Structure, 5, 1557–1569.
Ramakrishnan, C. & Ramachandran, G. N. (1965). Stereochemical criteria for polypeptide and protein chain conformations. II. Allowed conformations for a pair of peptide units. Biophys. J. 5, 909–933.
Read, R. J. (1986). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Cryst. A42, 140–149.
Vaguine, A. A., Richelle, J. & Wodak, S. J. (1999). SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model. Acta Cryst. D55, 191–205.
Vriend, G. & Sander, C. (1993). Quality control of protein models: directional atomic contact analysis. J. Appl. Cryst. 26, 47–60.
Zou, J. Y. & Mowbray, S. L. (1994). An evaluation of the use of databases in protein structure refinement. Acta Cryst. D50, 237–249.







































