International Tables for Crystallography, Volume F: Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F, ch. 21.1, pp. 498-499
## Section 21.1.3. Detecting outliers

Many statistics, methods and programs were developed in the 1990s to help identify errors in protein models. These methods generally fall into two classes: one in which only coordinates and *B* factors are considered (such methods often entail comparison of a model to information derived from structural databases) and another in which both the model and the crystallographic data are taken into account. Alternatively, one can distinguish between methods that essentially measure how well the refinement program has succeeded in imposing restraints (*e.g.* deviations from ideal geometry, conventional *R* value) and those that assess aspects of the model that are 'orthogonal' to the information used in refinement (*e.g.* free *R* value, patterns of non-bonded interactions, conformational torsion-angle distributions). An additional distinction can be made between methods that provide overall (global) statistics for a model (such methods are suitable for monitoring the progress of the refinement and rebuilding process) and those that provide information at the level of residues or atoms (such methods are more useful for detecting local problems in a model). It is important to realise that almost all coordinate-based validation methods detect *outliers* (*i.e.* atoms or residues with unusual properties): to assess whether an outlier arises from an *error* in the model or whether it is a genuine, but unusual, *feature* of the structure, one must inspect the (preferably unbiased) electron-density maps (Jones *et al.*, 1996)!

In this section, some quality indicators will be discussed that have been found to be particularly useful in daily protein crystallographic practice for the purpose of detecting problems in intermediate models. Section 21.1.7 provides a more extensive discussion of many of the quality criteria that are or have been used by macromolecular crystallographers.

From a practical point of view, statistics at the level of residues or atoms are the most useful for the crystallographer who is about to rebuild a model. Examples of useful quality indicators are:

In addition to these criteria, residues with other unusual features should be examined in the electron-density maps for the crystallographer to be able to decide whether they are in error. Such features may pertain to unusual temperature factors, unusual occupancies, unusual bond lengths or angles, unusual torsion angles or deviations from planarity (*e.g.* for the peptide plane), unusual chirality (*e.g.* for the C^{α} atom of every residue type except glycine), unusual differences in the temperature factors of chemically bonded atoms, unusual packing environments (Vriend & Sander, 1993), very short distances between non-bonded atoms (including symmetry mates), large positional shifts during refinement, unusual deviations from noncrystallographic symmetry (Kleywegt & Jones, 1995*b*; Kleywegt, 1996) *etc.*
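One of the checks listed above, very short distances between non-bonded atoms, can be sketched in a few lines of Python. This is an illustration only: the function name `short_contacts`, the atom-label dictionary and the 2.2 Å cutoff are assumptions made for the example, not values taken from the text. Symmetry mates are not generated here, and a real check would also exclude atom pairs linked through a shared bond angle.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple, FrozenSet

Coord = Tuple[float, float, float]


def short_contacts(
    atoms: Dict[str, Coord],
    bonded: Set[FrozenSet[str]],
    cutoff: float = 2.2,  # hypothetical threshold in angstroms
) -> List[Tuple[str, str, float]]:
    """Return (atom_a, atom_b, distance) for non-bonded pairs closer than cutoff."""
    hits = []
    for (name_a, xyz_a), (name_b, xyz_b) in combinations(atoms.items(), 2):
        # Chemically bonded pairs are expected to be close, so skip them.
        if frozenset((name_a, name_b)) in bonded:
            continue
        distance = math.dist(xyz_a, xyz_b)
        if distance < cutoff:
            hits.append((name_a, name_b, distance))
    # Shortest (most suspicious) contacts first.
    return sorted(hits, key=lambda hit: hit[2])
```

Each pair returned is only an outlier in the sense used in this section: whether it reflects an error still has to be judged in the electron-density map.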

The crystallographic *R* value used to be the major global quality indicator until it was realised that it can easily be fooled, especially at low resolution (Brändén & Jones, 1990; Jones *et al.*, 1991; Brünger, 1992*a*; Kleywegt & Jones, 1995*b*). The free *R* value, introduced by Brünger (1992*a*, 1993), has been shown to be much more reliable and harder to manipulate (Kleywegt & Brünger, 1996; Brünger, 1997). It is excellently suited for monitoring the progress of refinement, for detecting major problems with model or data and for helping reduce over-fitting of the data (which occurs if many more parameters are refined in a model than is warranted by the information content of the crystallographic data). Moreover, the free *R* value can be used to estimate the coordinate error of the final model (Kleywegt *et al.*, 1994; Kleywegt & Brünger, 1996; Brünger, 1997; Cruickshank, 1999).
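As a minimal sketch of the two statistics discussed above, the following Python computes the conventional *R* value, Σ| |F_obs| − k|F_calc| | / Σ|F_obs|, over the reflections used in refinement, and the free *R* value over a test set that was excluded from refinement. The function names, the single linear scale factor k (derived from the working set only) and the boolean `free_flags` list are assumptions made for illustration; refinement programs use more elaborate scaling.

```python
from typing import List, Sequence, Tuple


def r_value(pairs: Sequence[Tuple[float, float]], k: float) -> float:
    """R = sum(| |Fo| - k*|Fc| |) / sum(|Fo|) over (Fo, Fc) amplitude pairs."""
    denominator = sum(abs(fo) for fo, _ in pairs)
    if denominator == 0:
        raise ValueError("no observed amplitudes to compare against")
    numerator = sum(abs(abs(fo) - k * abs(fc)) for fo, fc in pairs)
    return numerator / denominator


def work_and_free_r(
    f_obs: Sequence[float],
    f_calc: Sequence[float],
    free_flags: Sequence[bool],
) -> Tuple[float, float]:
    """Return (R_work, R_free); free_flags[i] is True for test-set reflections."""
    if not (len(f_obs) == len(f_calc) == len(free_flags)):
        raise ValueError("input lists must have equal length")
    work: List[Tuple[float, float]] = []
    test: List[Tuple[float, float]] = []
    for fo, fc, is_free in zip(f_obs, f_calc, free_flags):
        (test if is_free else work).append((fo, fc))
    sum_calc = sum(abs(fc) for _, fc in work)
    if sum_calc == 0:
        raise ValueError("working set has no calculated amplitudes")
    # Scale factor from the working set only, so the test set stays independent.
    k = sum(abs(fo) for fo, _ in work) / sum_calc
    return r_value(work, k), r_value(test, k)
```

A free *R* value that is much higher than the conventional *R* value is the kind of signal of over-fitting that the paragraph above describes.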

In addition, the average or r.m.s. values for many of the local statistics, their minimum or maximum values or the percentage of outliers can be quoted and used to obtain an impression of the overall quality of the model and the overall fit of the model to the data.
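The summary described above can be sketched as follows. The function name `summarise_local_statistic`, the dictionary keys and the convention that values above a threshold count as outliers are assumptions for this example; for some statistics low values are the suspicious ones, and the threshold itself depends on the statistic.

```python
import math
from typing import Dict, Sequence


def summarise_local_statistic(scores: Sequence[float], threshold: float) -> Dict[str, float]:
    """Condense per-residue (or per-atom) values into global summary numbers."""
    if not scores:
        raise ValueError("at least one value is required")
    n = len(scores)
    # Assumption: a value above the threshold marks the residue as an outlier.
    n_outliers = sum(1 for value in scores if value > threshold)
    return {
        "mean": sum(scores) / n,
        "rms": math.sqrt(sum(value * value for value in scores) / n),
        "min": min(scores),
        "max": max(scores),
        "percent_outliers": 100.0 * n_outliers / n,
    }
```

For example, `summarise_local_statistic([1.0, 2.0, 2.0, 7.0], threshold=5.0)` reports a mean of 3.0 and 25% outliers.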

### References

Bhat, T. N. & Cohen, G. H. (1984). *OMITMAP: an electron density map suitable for the examination of errors in a macromolecular model.* *J. Appl. Cryst.* **17**, 244–248.

Brändén, C.-I. & Jones, T. A. (1990). *Between objectivity and subjectivity.* *Nature (London)*, **343**, 687–689.

Brünger, A. T. (1992*a*). *Free R value: a novel statistical quantity for assessing the accuracy of crystal structures.* *Nature (London)*, **355**, 472–475.

Brünger, A. T. (1993). *Assessment of phase accuracy by cross validation: the free R value. Methods and applications.* *Acta Cryst.* D**49**, 24–36.

Brünger, A. T. (1997). *The free R value: a more objective statistic for crystallography.* *Methods Enzymol.* **277**, 366–396.

Chapman, M. S. (1995). *Restrained real-space macromolecular atomic refinement using a new resolution-dependent electron-density function.* *Acta Cryst.* A**51**, 69–80.

Cruickshank, D. W. J. (1999). *Remarks about protein structure precision.* *Acta Cryst.* D**55**, 583–601.

Hodel, A., Kim, S.-H. & Brünger, A. T. (1992). *Model bias in macromolecular crystal structures.* *Acta Cryst.* A**48**, 851–858.

Hooft, R. W. W., Sander, C. & Vriend, G. (1996*b*). *Positioning hydrogen atoms by optimizing hydrogen-bond networks in protein structures.* *Proteins Struct. Funct. Genet.* **26**, 363–376.

Jones, T. A. & Kjeldgaard, M. (1997). *Electron density map interpretation.* *Methods Enzymol.* **277**, 173–208.

Jones, T. A., Kleywegt, G. J. & Brünger, A. T. (1996). *Storing diffraction data.* *Nature (London)*, **381**, 18–19.

Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). *Improved methods for building protein models in electron density maps and the location of errors in these models.* *Acta Cryst.* A**47**, 110–119.

Kleywegt, G. J. (1996). *Use of non-crystallographic symmetry in protein structure refinement.* *Acta Cryst.* D**52**, 842–857.

Kleywegt, G. J., Bergfors, T., Senn, H., Le Motte, P., Gsell, B., Shudo, K. & Jones, T. A. (1994). *Crystal structures of cellular retinoic acid binding proteins I and II in complex with all-trans-retinoic acid and a synthetic retinoid.* *Structure*, **2**, 1241–1258.

Kleywegt, G. J. & Brünger, A. T. (1996). *Checking your imagination: applications of the free R value.* *Structure*, **4**, 897–904.

Kleywegt, G. J. & Jones, T. A. (1995*b*). *Where freedom is given, liberties are taken.* *Structure*, **3**, 535–540.

Kleywegt, G. J. & Jones, T. A. (1996*b*). *Phi/Psi-chology: Ramachandran revisited.* *Structure*, **4**, 1395–1400.

Kleywegt, G. J. & Jones, T. A. (1997). *Model-building and refinement practice.* *Methods Enzymol.* **277**, 208–230.

Kleywegt, G. J. & Jones, T. A. (1998). *Databases in protein crystallography.* *Acta Cryst.* D**54**, 1119–1131.

Kleywegt, G. J. & Read, R. J. (1997). *Not your average density.* *Structure*, **5**, 1557–1569.

Ramakrishnan, C. & Ramachandran, G. N. (1965). *Stereochemical criteria for polypeptide and protein chain conformations. II. Allowed conformations for a pair of peptide units.* *Biophys. J.* **5**, 909–933.

Read, R. J. (1986). *Improved Fourier coefficients for maps using phases from partial structures with errors.* *Acta Cryst.* A**42**, 140–149.

Vaguine, A. A., Richelle, J. & Wodak, S. J. (1999). *SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model.* *Acta Cryst.* D**55**, 191–205.

Vriend, G. & Sander, C. (1993). *Quality control of protein models: directional atomic contact analysis.* *J. Appl. Cryst.* **26**, 47–60.

Zou, J. Y. & Mowbray, S. L. (1994). *An evaluation of the use of databases in protein structure refinement.* *Acta Cryst.* D**50**, 237–249.