Assessing the quality of macromolecular structures

Wodak, S. J.; Vagin, A. A.; Richelle, J.; Das, U.; Pontius, J.; Berman, H. M.

doi:10.1107/97809553602060000708

International Tables for Crystallography
Volume F: Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F. ch. 21.2, pp. 507-519
https://doi.org/10.1107/97809553602060000708

Chapter 21.2. Assessing the quality of macromolecular structures

S. J. Wodak,^a ^* A. A. Vagin,^b J. Richelle,^b U. Das,^b J. Pontius^b and H. M. Berman^c

^aUnité de Conformation de Macromolécules Biologiques, Université Libre de Bruxelles, avenue F. D. Roosevelt 50, CP160/16, B-1050 Bruxelles, Belgium, and EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England, ^bUnité de Conformation de Macromolécules Biologiques, Université Libre de Bruxelles, avenue F. D. Roosevelt 50, CP160/16, B-1050 Bruxelles, Belgium, and ^cDepartment of Chemistry, Rutgers University, 610 Taylor Road, Piscataway, NJ 08854-8087, USA
Correspondence e-mail: [email protected]

In this chapter, an overview is presented of the different types of validation procedures applied to proteins and nucleic acids. An approach to model validation based on atomic volumes embodied in the program PROVE is illustrated in some detail and the package SFCHECK, which combines a set of criteria for evaluating the quality of the experimental data and the agreement of the model with the data, is described.

Keywords: SFCHECK; atomic resolution; coordinate errors; environment profiles; errors; knowledge-based interaction potentials; scaling; standard atomic volumes; structure validation.

21.2.1. Introduction

X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy are the two major techniques that provide detailed information on the atomic structure of macromolecules. Usually, however, the data obtained from these techniques are not of high enough resolution to define the atomic positions of a macromolecule with sufficient precision. Deriving the atomic models from the experimental data therefore involves sophisticated optimization (refinement) procedures, in which constraints based on prior knowledge about the chemical structure of the molecule and its conformational properties are applied. The resulting models are therefore prone to errors, which fall into two broad categories: systematic errors caused by biases during the structure determination and refinement procedures, and random errors which affect the precision of the model. Moreover, the quality of the model can vary in different regions of the structure, often due to higher local conformational or thermal disorder in certain parts.

With the rapid growth in the number of structures of macromolecules determined and the spreading use of structural information in different areas of science, the availability of objective criteria and methods for evaluating the quality of these structures has become a very important requirement.

A variety of validation procedures have been proposed by many groups (for recent reviews, see MacArthur et al., 1994, and Laskowski et al., 1998). The procedures involve two main approaches. One approach comprises procedures that validate the geometric and conformational parameters of the final model. This is done by measuring the extent to which the parameters deviate from standard values, derived from crystals of small molecules or from a set of high-quality structures of other macromolecules. The main limitation of this approach is that the quality of a model is defined by comparison with other known models without taking into account the experimental data. This carries the danger of treating unusual conformations related to biological function as errors in the model, or of accepting as `normal' only what has been observed before. The second and by far the most important approach comprises procedures that take into account the experimental data and evaluate the agreement of the atomic model with these data. These procedures can, in principle, evaluate systematic errors and biases that affect the global quality of the model and can also detect local imprecision. The most commonly cited measures of agreement between the model as a whole and the data are the R factor and the `free R factor', R_free. Criteria for evaluating the local agreement of the model with electron density on a per-atom or per-residue basis are also available, and, more recently, access to more powerful computers has made it possible to compute the standard uncertainties of individual parameters, such as atomic coordinates or thermal factors.

Finally, the growing number of atomic resolution structures – primarily of proteins – is starting to provide a valuable source of much more precise information about the structures' geometrical and conformational properties. This should contribute to the improvement of standard values used in validation.

In this chapter, we present an overview of the different types of validation procedures applied to proteins and nucleic acids. We illustrate, in some detail, an approach to model validation based on atomic volumes embodied in the program PROVE and describe the package SFCHECK, which combines a set of criteria for evaluating the quality of the experimental data and the agreement of the model with the data.

21.2.2. Validating the geometric and stereochemical parameters of the model

21.2.2.1. Comparisons against standard values derived from crystals of small molecules

This concerns the validation of the covalent geometry of the atomic model. It involves comparing the bond distances and angles of the macromolecule against standard values and their associated uncertainties, derived from crystal structures of small organic molecules available in the Cambridge Structural Database, CSD (Allen et al., 1979, 1983).

The standard values derived in this way are also used as restraints in crystallographic refinement programs, such as XPLOR (Brünger, 1992a) or the CCP4 suite of programs (Collaborative Computational Project, Number 4, 1994). As a result, the bond distances and angles of the final model usually agree well with their standard values, and the degree of scatter merely reflects the relative weight imposed on the various terms of the target function during refinement.

For proteins, the most commonly used standard values for the bond distances and angles are those compiled by Engh & Huber (1991) from molecular fragments in the CSD that most closely resemble chemical groups in amino acids. These parameters were shown to yield an improved description over that provided by the param19x.pro used in XPLOR, especially for the covalent geometry of aromatic rings in side-chain groups. It is noteworthy that these CSD-derived bond distances and angles can differ significantly from those used in molecular dynamics force fields, such as that of a recent version of CHARMM (MacKerell et al., 1998). In these force fields, covalent-geometry parameters are obtained by a different strategy. They are optimized together with non-bonded parameters against a large body of available energy and structural data for a limited set of compounds representing amino-acid building blocks.

Protein-structure validation packages, such as PROCHECK (Laskowski et al., 1993) and WHAT IF (Hooft, Vriend et al., 1996), flag all bond distances and angles that deviate significantly from the database-derived reference values. This includes analysis of the deviations from planarity in aromatic rings and planar side-chain groups.
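The kind of check described above can be sketched as a comparison of each observed bond length with a reference mean and standard uncertainty. This is a minimal illustration, not the procedure used by PROCHECK or WHAT IF: the function name, the input layout, the cutoff `n_sigma` and the reference numbers in `REFERENCE_BONDS` are all assumptions made for the example and should not be read as an authoritative dictionary.

```python
import math

# Illustrative reference values (mean, standard uncertainty) in angstroms.
# Placeholders for this sketch only, not an authoritative parameter set.
REFERENCE_BONDS = {
    ("N", "CA"): (1.458, 0.019),
    ("CA", "C"): (1.525, 0.021),
    ("C", "N"): (1.329, 0.014),
}


def flag_bonds(bonds, n_sigma=4.0):
    """Return (label, length, z) for bonds deviating by more than n_sigma.

    bonds: iterable of (label, (type_a, type_b), xyz_a, xyz_b).
    n_sigma: hypothetical cutoff; the text only says "significantly".
    """
    flagged = []
    for label, atom_types, xyz_a, xyz_b in bonds:
        mean, sigma = REFERENCE_BONDS[atom_types]
        length = math.dist(xyz_a, xyz_b)
        z = (length - mean) / sigma  # signed deviation in units of sigma
        if abs(z) > n_sigma:
            flagged.append((label, length, z))
    return flagged
```

The same pattern extends to bond angles and planarity deviations once the corresponding reference distributions are available.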

Similar checks are performed for the covalent geometry of atomic models of RNA or DNA oligo- and polynucleotides. Here, standard ranges for bond distances and angles are derived from crystal structures of nucleic acid bases, mononucleosides and mononucleotides in the CSD (Clowney et al., 1996; Gelbin et al., 1996). These values are used in validation procedures developed by the Nucleic Acid Database (NDB) (Berman et al., 1992) and in crystallographic refinement programs. For higher-resolution structures (better than 2.4 Å), a standard geometry, dependent on the sugar pucker conformation (C2′ endo or C3′ endo) (Parkinson et al., 1996), is used.

Validation of the covalent geometry of the so-called `hetero groups' (chemically modified monomer groups or small molecules that bind to macromolecules) is much more difficult. It therefore tends not to be routinely performed, and, as a result, the quality of the hetero groups in the models deposited in the Protein Data Bank (PDB) (Bernstein et al., 1977; Berman et al., 2000) varies widely.

The variety of the chemical structures of these molecules (the current release of the PDB contains about 2700 chemically distinct compounds) makes it difficult to archive them consistently, let alone to compile the dictionaries containing the required reference values in advance. Proper handling and verification of such groups require a comprehensive and rigorous description of the chemical components, as well as flexible means of deriving the appropriate reference geometries.

The development of systematic procedures for checking bond lengths and various torsion angles of hetero groups (Kleywegt & Jones, 1998) is a step in the right direction. Further progress should come, thanks in part to the recently adopted macromolecular Crystallographic Information File (mmCIF) format (Bourne et al., 1997), which provides the necessary framework for a much more comprehensive and rigorous description of the molecular components. Using this description as the basis, automated tools for building `customized' dictionaries of geometrical standards have been developed. One such tool is A LigAnd and Monomer Object Data Environment (A LA MODE) (Clowney et al., 1999). It starts from a minimal topological description of a ligand or monomer component and performs the tasks required to construct the mmCIF component description. This includes querying the CSD, integration and book-keeping of database survey results, analysis and comparison of covalent geometry and stereochemistry, and the assembly of complex model structures from the results of multiple database surveys. Tools such as this considerably simplify the handling of small molecules at the refinement, validation and archiving stages.

21.2.2.2. Comparisons against standard values derived from surveys of other macromolecules

This involves computing a number of stereochemical, geometric and energy parameters from the atomic coordinates of the macromolecule and comparing them with standard ranges derived from high-quality crystal structures of other macromolecules. These standards represent the `expected' properties, and the aim is to evaluate the quality of a model by measuring the extent to which it departs from these properties.

This evaluation is usually performed at the global level, in order to assess the quality of the structure as a whole, and on the local level, to identify specific regions with unusual properties. Such regions may represent genuine problems with the model or unusual conformations adopted for functional purposes, and it is sometimes difficult to distinguish between these two alternatives. The choice of reference structures from which the standards are derived is a crucial aspect of the approach, since both the mean and shape of the reference distributions may be affected by it.

21.2.2.2.1. Validation of stereochemical and non-bonded parameters

Morris et al. (1992) pioneered this type of validation for proteins. The software PROCHECK (Laskowski et al., 1993), which implements and extends this approach, is described in detail in Chapter 25.2 of this volume. A very important evaluation criterion is the Ramachandran-plot quality, where the distribution of the backbone φ, ψ angles of a given protein structure is compared to that in high-quality structures. The comparison is performed both globally, by determining the proportion of the residues in favourable (core) regions of the plot, and locally, by the log-odds (G-factor) value, which measures how normal or unusual a residue's location is in the plot for a given residue type.

A similar strategy is used to evaluate other stereochemical parameters, such as the side-chain torsion angles ($[\chi_{1}]$, $[\chi_{2}]$, $[\chi_{3}]$ etc.), the peptide bond torsion (ω), the C^α tetrahedral distortion, disulfide bond geometry and stereochemistry.

An evaluation of the backbone hydrogen-bonding energy is also performed, using the Kabsch & Sander (1983) algorithm, by comparison with distributions computed from high-resolution protein structures.

Other programs like WHAT IF (Hooft, Vriend et al., 1996) perform similar evaluations. This program computes the expected φ, ψ distribution for each residue type from a data set of non-redundant high-quality structures and evaluates how the φ, ψ distribution of a given protein deviates from the expected values (Hooft et al., 1997). A somewhat different version of this approach is proposed by Kleywegt & Jones (1996). WHAT IF also computes other quality indicators such as the number of buried unsatisfied hydrogen bonds or the extent of the overlap of van der Waals spheres (`clashes'). In addition, it verifies the orientation of His, Gln and Asn side chains, based on a hydrogen-bond network analysis, which also takes into account hydrogen bonds between symmetry-related molecules (Hooft, Sander & Vriend, 1996).

The very small fraction of structures (< 1.3%) for which only the C^α coordinates are deposited cannot be validated by the standard techniques. For these structures, two sets of parameters were shown to be useful (Kleywegt & Jones, 1996). They are the C^α—C^α distances and a Ramachandran-like plot which displays for each residue the $[\hbox{C}^{\alpha}_{i-1} \hbox{---C}^{\alpha}_{i} \hbox{---C}^{\alpha}_{i+1} \hbox{---C}^{\alpha}_{i+2}]$ dihedral angle against the $[\hbox{C}^{\alpha}_{i-1} \hbox{---C}^{\alpha}_{i} \hbox{---C}^{\alpha}_{i+1}]$ angle. Deviations from the expected distributions of these parameters, computed from a set of high-quality complete protein structures, are used as quality indicators.
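The two C^α-only parameters are purely geometric, so they can be computed directly from a list of C^α positions. The sketch below is an illustration under stated assumptions rather than the code of Kleywegt & Jones: the function names are hypothetical, the dihedral sign follows the usual convention for torsion angles, and no comparison with reference distributions is attempted.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ca_angle(p_prev, p_mid, p_next):
    """Angle (degrees) at p_mid formed by three consecutive CA atoms."""
    u, v = _sub(p_prev, p_mid), _sub(p_next, p_mid)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def ca_dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four consecutive CA atoms."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def ca_trace_parameters(ca_coords):
    """For each residue i with neighbours i-1, i+1 and i+2, return
    (i, dihedral(i-1, i, i+1, i+2), angle(i-1, i, i+1))."""
    out = []
    for i in range(1, len(ca_coords) - 2):
        a, b, c, d = ca_coords[i - 1:i + 3]
        out.append((i, ca_dihedral(a, b, c, d), ca_angle(a, b, c)))
    return out
```

Plotting the dihedral against the angle for every residue gives the Ramachandran-like plot described above.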

The validation of nucleic acid stereochemistry, in particular DNA, has a much shorter history. Only in recent years has the number of high-quality nucleic acid crystal structures become large enough to permit the derivation of reliable conformational trends. Schneider et al. (1997) derived ranges and mean values for the torsion angles of the sugar–phosphate backbone in helical DNA from a set of 96 oligodeoxynucleotide crystal structures. These ranges form the basis for the nucleic acid structure validation protocols currently implemented at the NDB.

21.2.2.2.2. Validation using knowledge-based interaction potentials and profiles

These methods represent a distinct set of approaches to the validation of the non-bonded and conformational parameters of the model. They involve computing the relative frequencies of residue–residue or atom–atom contacts from a set of high-quality protein structures and evaluating how the contacts in a given protein deviate from these standard frequencies. Most often, these frequencies are translated into potentials (energies) using the Boltzmann relation (Sippl, 1990), and these `knowledge-based' potentials are used to score the structure (for a review, see Wodak & Rooman, 1993). The potentials that consider residue–residue interactions, as in the software PROSA II (Sippl, 1993), are usually quite crude since each residue is represented by a single interaction centre. They can therefore detect only gross errors in chain tracing or identify incorrectly modelled segments in an otherwise correct structure, but cannot validate detailed atomic positions. The same limitation applies to procedures based on three-dimensional (3D) environment profiles (Eisenberg et al., 1997). The latter consider the relative frequencies of finding each of the 20 amino acids in a given local 3D environment defined by the residue buried area, the ratio of polar versus non-polar neighbours and the secondary structure. The corresponding energies are used to score the compatibility of a structure with its amino-acid sequence in a manner similar to the residue–residue interaction potentials.

Finally, validation procedures based on the relative frequencies of atom–atom interactions in known protein structures have also been developed (Melo & Feytmans, 1997, 1998). These methods, consolidated in the software ANOLEA, are capable of identifying local errors and problems of sequence misalignment in protein structures built by homology modelling. In addition, energy Z scores computed with these potentials for whole protein structures correlate well with the resolution of the X-ray data, as shown below for the volume-based Z scores.
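The translation of contact frequencies into energies can be illustrated with the inverse Boltzmann relation, E = -kT ln(f_observed / f_reference). The sketch below is a generic illustration, not the formulation of any of the cited programs: the choice of reference state, the pseudocount, the value of kT and all function names are assumptions made for the example.

```python
import math

KT = 0.593  # kcal/mol near room temperature; an assumed, illustrative value


def contact_energy(observed, expected, pseudocount=1.0):
    """Inverse-Boltzmann energy for one contact type.

    observed: count of this contact type in the reference structures.
    expected: count expected under the chosen reference state (assumption).
    pseudocount: hypothetical smoothing term to avoid log(0).
    """
    ratio = (observed + pseudocount) / (expected + pseudocount)
    return -KT * math.log(ratio)


def score_structure(contacts, observed_counts, expected_counts):
    """Sum the energies of all contacts found in a model."""
    return sum(
        contact_energy(observed_counts[c], expected_counts[c])
        for c in contacts
    )
```

Contacts seen more often than expected receive negative (favourable) energies, and a model containing many rarely observed contacts accumulates a high total score.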

21.2.2.2.3. Deviations from standard atomic volumes as a quality measure for protein crystal structures

The observations that protein X-ray structures are at least as tightly packed as small-molecule crystals (Richards, 1974; Harpaz et al., 1994) and that the packing density inside proteins displays very limited variation (Richards, 1974; Finney, 1975) suggest that atomic volumes or measures of atomic packing can be added to the list of parameters for assessing the quality of protein structures.

Packing and related measures have been used to compare structures of proteins derived by both X-ray diffraction and NMR spectroscopy. Ratnaparkhi et al. (1998) analysed pairs of protein structures for which both crystal and NMR structures were available. They found that the packing values of the NMR models displayed a much larger scatter than those of the corresponding crystal structures, and suggested that this is probably because accurate values of the packing density cannot, at present, be obtained from NMR data. Similar conclusions were reached using measures of residue–residue contact area (Abagyan & Totrov, 1997).

Here, we describe the approach of Pontius et al. (1996), in which deviations from standard atomic volumes are used to assess the quality of a protein model, both overall and in specific regions.

The volumes occupied by atoms and residues inside proteins can be readily computed using the Voronoi method (Voronoi, 1908), first applied to proteins by Richards (1974) and Finney (1975). This method uses the atomic positions of the molecular model, and the volume assigned to each atom is defined as the smallest polyhedron created by the set of planes bisecting the lines joining the atom centre to those of its neighbours (Fig. 21.2.2.1).

Figure 21.2.2.1

The Voronoi polyhedron. (a) Positioning of the dividing plane P between two atoms i and j, with van der Waals radii $[r_{i}]$ and $[r_{j}]$, respectively, separated by a distance d. The plane P is positioned at d/2. (b) 2D representation of the Voronoi polyhedron of the central atom. This polyhedron is the smallest polyhedron delimited by all the dividing planes of the atom.

The use of the classical Voronoi procedure is justified in the context of validation because it avoids the need to derive a consistent set of van der Waals radii for atoms in the system. Such sets are used by other volume-calculation methods in order to partition space more accurately (Richards, 1974, 1985; Gellatly & Finney, 1982). Assigning a consistent set of radii to protein atoms is, indeed, not straightforward due to the heterogeneity of the interactions within the protein (polar, ionic, non-polar) and the presence of a large variety of hetero groups.
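Because the dividing planes sit at d/2, a point belongs to the Voronoi polyhedron of an atom exactly when that atom's centre is the nearest of all centres, and no radii are needed. The sketch below uses this property to estimate a volume numerically on a grid. It is an illustration only: it is not the analytic algorithm used in practice, the function name and arguments are hypothetical, and the sampling box must be chosen by the caller so that it encloses the whole polyhedron.

```python
import itertools


def _sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def voronoi_volume_grid(centres, index, box_min, box_max, step):
    """Estimate the Voronoi volume of centres[index] by midpoint sampling.

    Every grid cell whose midpoint is closer to centres[index] than to any
    other centre contributes step**3 to the volume. Only meaningful for a
    fully surrounded (buried) atom whose polyhedron lies inside the box.
    """
    counts = [int(round((hi - lo) / step)) for lo, hi in zip(box_min, box_max)]
    target = centres[index]
    others = [c for m, c in enumerate(centres) if m != index]
    inside = 0
    for i, j, k in itertools.product(*(range(n) for n in counts)):
        point = (box_min[0] + (i + 0.5) * step,
                 box_min[1] + (j + 0.5) * step,
                 box_min[2] + (k + 0.5) * step)
        d_target = _sq_dist(point, target)
        if all(d_target <= _sq_dist(point, c) for c in others):
            inside += 1
    return inside * step ** 3
```

For an atom at the centre of a simple cubic arrangement of neighbours with spacing a, the estimate approaches a³, the volume of the cube bounded by the bisecting planes.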

Structure-quality assessment based on volume calculations involves computing the atomic volumes in a subset of highly resolved and refined protein structures and analysing the distributions of these volumes for different atomic types, defined according to their chemical nature and bonded environment. These distributions define the expected ranges (mean and standard deviation) for the volume of each category of atoms. Atomic volumes in a given structure are then compared to the expected ranges, and statistically significant deviations from these ranges are flagged.

The program PROVE (Pontius et al., 1996) implements such an approach using the analytic algorithms for volume and surface-area calculations encoded in SurVol (Alard, 1991). It computes for each atom i in a structure its volume Z score $[(Z\; \hbox{score} = \big|V_{i}^{k} - \overline{V^{k}}\big|\big/\sigma^{k})]$, where the superscript k designates the particular atom type (e.g., the C^α atom in a Leu residue), and $[\overline{V^{k}}]$ and $[\sigma^{k}]$ are, respectively, the mean and standard deviation of the reference volume distribution for the corresponding atom type. These reference distributions are derived from a set of high-quality protein crystal structures using exactly the same calculation procedure (Pontius et al., 1996).

Atoms with absolute Z scores > 3 are flagged as possible problem regions in the protein model, and residues containing such atoms are highlighted on graphical plots of the same type as those used by the PROCHECK program and on molecular models displayed using programs such as Rasmol (Sayle & Milner-White, 1995).

In addition to the validation of the local quality of the model, its overall quality can be assessed by the root-mean-square volume Z score of all its atoms (see Fig. 21.2.2.2 for definition). As for many stereochemical global quality indicators, this Z score shows good correlation with the nominal resolution (d spacing) of the crystallographic data, as illustrated in Fig. 21.2.2.2(a). This figure also shows that Z-score ranges can be defined for each resolution interval. The Z scores of individual proteins that lie outside these intervals may be indicative of `problem' structures. This is clearly the case for the two proteins 2ABX and 2GN5, whose Z scores are much higher than average (Fig. 21.2.2.2b).
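The per-atom and overall scores can be sketched in a few lines. The per-atom Z score follows the formula given above and the cutoff of 3 comes from the text; the overall score assumes the usual root-mean-square definition, since the exact definition is given only in the figure. Function names, the input layout and any reference volumes supplied by the caller are hypothetical.

```python
import math


def volume_z_score(volume, mean, sigma):
    """|V - mean| / sigma for one atom of a given atom type."""
    return abs(volume - mean) / sigma


def score_atoms(atoms, reference, threshold=3.0):
    """Score buried atoms and list those exceeding the threshold.

    atoms: iterable of (atom_id, atom_type, volume).
    reference: dict mapping atom_type -> (mean volume, standard deviation).
    """
    scores, flagged = {}, []
    for atom_id, atom_type, volume in atoms:
        mean, sigma = reference[atom_type]
        z = volume_z_score(volume, mean, sigma)
        scores[atom_id] = z
        if z > threshold:
            flagged.append(atom_id)
    return scores, flagged


def rms_z_score(z_scores):
    """Root-mean-square of the per-atom Z scores (assumed definition)."""
    values = list(z_scores)
    return math.sqrt(sum(z * z for z in values) / len(values))
```

Comparing the r.m.s. value of a structure with the range observed for other structures at similar resolution is then a matter of binning by d spacing.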

Figure 21.2.2.2

Atomic volume Z score r.m.s. variation with nominal resolution (d spacing) in 900 protein structures from the PDB. (a) Average of the r.m.s. volume Z score computed for structures having the same resolution (to within $[\pm 0.1]$ Å). The vertical bars indicate the magnitude of the standard deviations of the r.m.s. volume Z score in individual d-spacing bins. Graph points are derived from fewer than 10 structures (open diamonds) and from more than 10 structures (filled diamonds). (b) R.m.s. Z-score values as in (a), displayed for individual structures as a function of resolution. The five furthest outlier proteins are marked by their PDB codes.

Because solvent-accessible atoms are not completely surrounded by other atoms, their Voronoi volumes cannot be defined, and only completely buried atoms are scored.

The current version of PROVE is unable to measure the deviations from standard volumes for atoms in nucleic acids or hetero groups, simply because of the lack of reference volumes for these structures. This should change in the near future, at least for nucleic acids, thanks to the growing number of high-quality nucleic acid crystal structures from which standard volume ranges could be readily derived.

21.2.3. Validation of a model versus experimental data

By far the most important measure of the quality of a given atomic model is its agreement with the experimental data. This type of validation is geared towards detecting systematic errors, which determine the overall accuracy of the model, and random errors, which affect the precision of the model.

Systematic errors are difficult to detect even in highly refined structures, especially at lower resolution. The most commonly used measures of the agreement between the atomic coordinates and the X-ray data are the classical R factor and the `free R factor' (R_free) (Brünger, 1992b). The latter is based on standard statistical cross-validation techniques (Brünger, 1997) and is therefore less amenable to manipulation, such as leaving out weak data or over-fitting the data with too many parameters. Currently, nearly half of the publications on macromolecular structures report R_free values, an indication that its use is becoming more widespread. So far, however, there are no clear guidelines indicating what an `acceptable' R_free value should be (Kleywegt & Brünger, 1996).

An expression for estimating the expected R_free value has been proposed (see Dodson et al., 1996) and used to assess the significance of the drop in R_free during refinement. Accurate expressions for the expected ratio of R_free to R (the R_free ratio) have also been derived theoretically (Tickle et al., 1998). This ratio seems to be independent of random errors and can be used to detect systematic errors at the convergence of the least-squares refinement. The remaining problem is to determine what the precision of R_free or the R_free ratio should be. In other words, if the R_free ratio differs from the expected value, when is the difference significant? This requires knowing the variance of these parameters. Estimating the precision of R_free can be done empirically by performing repeated refinements of the same structure with different sets of reflections removed (Brünger, 1997). From such analysis, a useful approximation to the R_free precision was suggested to be the ratio $[R_{\rm free}/(n)^{1/2}]$, where n is the number of reflections in the test set.

Evaluating the precision of the refined parameters, that is, the atomic coordinates and the temperature or B factors, is a different matter. In small-molecule crystallography, the standard uncertainty (s.u.) of the parameters can be computed from the variance–covariance matrix, obtained by inverting the full normal-equations matrix (Cruickshank, 1965). This can, in principle, also be done for the parameters of macromolecules. However, the number of second derivatives to be computed and the size of the matrix to be inverted are so large that this task is too time consuming to be performed routinely. This is gradually changing, however. An increasing number of protein structures, primarily those solved at atomic resolution, have their s.u.'s computed in this manner (Deacon et al., 1997; Harata et al., 1998). A program often used for this purpose is SHELXL (Sheldrick & Schneider, 1997), a well known refinement software package for small molecules that has recently been extended to proteins. The availability of s.u.'s makes it possible to determine how the precision of the atomic coordinates depends on various factors, such as the resolution, the atomic number, and the number and types of restraints used during refinement (Tickle et al., 1998).

Other methods for determining the relative precision of atoms in macromolecular structures involve calculating the agreement between the model and the electron density in specific regions. The newer approach by Zhou et al. (1998) is related to the real-space R factor of Jones et al. (1991), but differs from it by the way in which the electron density is computed (Chapman, 1995).
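As a worked illustration of the approximation quoted above, the precision of R_free can be evaluated directly from the size of the test set. The function name is hypothetical, and the sketch implements only the ratio $[R_{\rm free}/(n)^{1/2}]$, not the expressions for the expected R_free or the R_free ratio.

```python
import math


def rfree_precision(r_free, n_test):
    """Approximate precision of R_free as R_free / sqrt(n_test)."""
    if n_test <= 0:
        raise ValueError("the test set must contain at least one reflection")
    return r_free / math.sqrt(n_test)
```

For example, with R_free = 0.25 and 2500 test reflections the estimated precision is 0.25/50 = 0.005, so differences in R_free of a few thousandths would not be significant for a test set of that size.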

As our understanding of the factors that govern the systematic errors in macromolecular crystallography increases and our ability to detect random errors improves, the possibility of devising systematic and possibly more automatic protocols for assessing the agreement between the model and the data will emerge.

In what follows, we describe the software package SFCHECK (Vaguine et al., 1999), which can be regarded as a first attempt in this direction. This software computes and summarizes many of the commonly used measures for evaluating the quality of the structure-factor data and the agreement of the model with these data.

We summarize the tasks performed and the quality indicators computed by SFCHECK and briefly illustrate how this software can be used to evaluate individual structures and survey different structures.

21.2.3.1. A systematic approach using the SFCHECK software

21.2.3.1.1. Tasks performed by SFCHECK

21.2.3.1.1.1. Treatment of structure-factor data and scaling

SFCHECK reads in the structure-factor data written in mmCIF format. It then performs the following operations: reflections are excluded if they are systematically absent, negative, or have flagged σ values (99.9); equivalent reflections are merged; and the amplitudes of missing reflections are approximated by taking the average value for the corresponding resolution shell.
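The three operations can be sketched as follows. This is an illustration of the steps as described, not SFCHECK's code: the dictionary layout of a reflection, the callables `is_absent`, `canonical` and `shell_of` (which stand in for space-group and resolution logic), and the use of a simple mean when merging are all assumptions made for the example.

```python
import math
from collections import defaultdict

FLAGGED_SIGMA = 99.9  # sigma value marking a flagged reflection, per the text


def filter_reflections(reflections, is_absent=lambda hkl: False):
    """Drop systematically absent, negative and sigma-flagged reflections."""
    kept = []
    for refl in reflections:
        if is_absent(refl["hkl"]):
            continue
        if refl["F"] < 0:
            continue
        if math.isclose(refl["sigma"], FLAGGED_SIGMA):
            continue
        kept.append(refl)
    return kept


def merge_equivalents(reflections, canonical):
    """Average reflections that map to the same canonical index."""
    groups = defaultdict(list)
    for refl in reflections:
        groups[canonical(refl["hkl"])].append(refl)
    return [
        {
            "hkl": hkl,
            "F": sum(r["F"] for r in group) / len(group),
            "sigma": sum(r["sigma"] for r in group) / len(group),
        }
        for hkl, group in groups.items()
    ]


def fill_missing(reflections, missing_hkls, shell_of):
    """Estimate missing amplitudes by the mean F of their resolution shell."""
    totals = defaultdict(lambda: [0.0, 0])
    for refl in reflections:
        entry = totals[shell_of(refl["hkl"])]
        entry[0] += refl["F"]
        entry[1] += 1
    filled = []
    for hkl in missing_hkls:
        total, count = totals.get(shell_of(hkl), (0.0, 0))
        if count:  # shells with no observations are left unfilled
            filled.append({"hkl": hkl, "F": total / count, "estimated": True})
    return filled
```

Marking filled-in reflections with an `estimated` flag makes it easy to leave them out of statistics such as the R factor, which the text says excludes reflections approximated in this way.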

From the model coordinates read from the PDB (or mmCIF) atomic coordinates file, SFCHECK calculates structure factors and scales them to the observed structure factors. The scaling factor, S, is computed using a smooth cutoff for low-resolution data (Vaguine et al., 1999) (Table 21.2.3.1). This involves the calculation of the observed and calculated overall B factors from the standard deviations of the Gaussian fitted to the Patterson origin peaks [see Table 21.2.3.1 and Vaguine et al. (1999)]. In addition, SFCHECK estimates the overall anisotropy of the data, following the approach of Sheriff & Hendrickson (1987), and applies the anisotropic scaling after the Patterson scaling is performed (Murshudov et al., 1998).

Table 21.2.3.1
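The scale factor S and the smooth cutoff function can be written out directly from the definitions in Table 21.2.3.1. The sketch below follows those formulas as printed, with s taken as the magnitude of the reciprocal-lattice vector; the function names and argument layout are hypothetical, and the anisotropic scaling step is not included.

```python
import math


def f_cutoff(s, d_max):
    """Smooth low-resolution cutoff: 1 - exp(-B_off s^2), B_off = 4 d_max^2."""
    b_off = 4.0 * d_max ** 2
    return 1.0 - math.exp(-b_off * s ** 2)


def scale_factor(f_obs, f_calc, s_values, b_diff, d_max):
    """Scale factor S applied to F_calc, following Table 21.2.3.1.

    b_diff: B_obs(overall) - B_calc(overall), from the Patterson origin peaks.
    """
    num = 0.0
    den = 0.0
    for fo, fc, s in zip(f_obs, f_calc, s_values):
        w = f_cutoff(s, d_max)
        num += (fo * w) ** 2
        den += (fc * math.exp(-b_diff * s ** 2) * w) ** 2
    return math.sqrt(num / den)
```

The cutoff function is zero at s = 0 and tends to 1 at high resolution, so the lowest-resolution reflections, which are most affected by bulk solvent, contribute least to the scale.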
Parameters computed for the analysis of the structure-factor data

The first column lists the parameter, the second column gives the formula or definition of the parameter and the third column contains a short description of the meaning of the parameters when warranted.

Parameter	Formula/definition	Meaning
Completeness (%)	Percentage of the expected number of reflections for the given crystal space group and resolution
B_overall (Patterson)	$[8\pi^{2} \sigma_{\rm Patt}/(2)^{1/2}]$ ^†	Overall B factor
R_stand(F)	$[\langle \sigma (F)\rangle/\langle F \rangle]$ ^‡	Uncertainty of the structure-factor amplitudes
Optical resolution	$[(\sigma_{\rm Patt}^{2} + \sigma_{\rm sph}^{2})^{1/2}]$ ^† ^§	Expected minimum distance between two resolved atomic peaks
Expected optical resolution	Optical resolution computed considering all reflections
$[\hbox{CC}_{F}]$	$[\displaystyle{\langle F_{\rm obs} F_{\rm calc}\rangle - \langle F_{\rm obs}\rangle\langle F_{\rm calc}\rangle \over \left[(\langle F_{\rm obs}^{2} \rangle - \langle F_{\rm obs}\rangle^{2}) (\langle F_{\rm calc}^{2}\rangle - \langle F_{\rm calc}\rangle^{2})\right]^{1/2}}]$	Correlation coefficient between the observed and calculated structure-factor amplitudes
S	$[\left\{{\textstyle\sum\displaystyle (F_{\rm obs} f_{\rm cutoff})^{2} \over \textstyle\sum\displaystyle \left[F_{\rm calc} \exp (- B_{\rm diff}^{\rm overall} s^{2}) f_{\rm cutoff}\right]^{2}}\right\}^{1/2}]$ ^¶	Factor applied to scale $[F_{\rm calc}]$ to $[F_{\rm obs}]$
$[f_{\rm cutoff}]$	$[1 - \exp (- B_{\rm off} s^{2})]$ ^††	Function applied to obtain a smooth cutoff for low-resolution data

^† $[\sigma_{\rm Patt}]$ is the standard deviation of the Gaussian fitted to the Patterson origin peak.
^‡ F is the structure-factor amplitude, and $[\sigma({F})]$ is the structure-factor standard deviation. The brackets denote averages.
^§ $[\sigma _{\rm sph}]$ is the standard deviation of the spherical interference function, which is the Fourier transform of a sphere of radius $[1/d_{\rm min}]$, with $[d_{\rm min}]$ being the minimum d spacing.
^¶ $[B_{\rm diff}^{\rm overall} = B_{\rm obs}^{\rm overall} - B_{\rm calc}^{\rm overall}]$ is added to the calculated overall B factor, $[B_{\rm overall}]$, so as to make the width of the calculated Patterson origin peak equal to the observed one; s is the magnitude of the reciprocal-lattice vector.
^†† $[B_{\rm off} = 4 d_{\rm max}^{2}]$, where $[d_{\rm max}]$ is the maximum d spacing; s is the magnitude of the reciprocal-lattice vector.

To assess the quality of the structure-factor data, the program computes four additional quantities (see Table 21.2.3.1 for details): the completeness of the data, the uncertainty of the structure-factor amplitudes, the optical resolution and the expected optical resolution. The latter two quantities represent the expected minimum distance between two resolved atomic peaks in the electron-density map when the latter is computed with the set of reflections specified by the authors and with all the reflections, respectively.

21.2.3.1.1.2. Global agreement between the model and experimental data

To evaluate the global agreement between the atomic model and the experimental data, the program computes three classical quality indicators: the R factor, $[R_{\rm free}]$ (Brünger, 1992b) and the correlation coefficient $[\hbox{CC}_{F}]$ between the calculated and observed structure-factor amplitudes (Table 21.2.3.1). The R factor is computed using all the reflections considered (except those approximated by their average value in the corresponding resolution shell) and applying the same resolution and σ cutoff as those reported by the authors. $[R_{\rm free}]$ is computed using the subset of reflections specified by the authors. In addition, the R factor is evaluated using the `non-free' subset of reflections (those not used to compute $[R_{\rm free}]$). The correlation coefficient is computed using all reflections from the reported high-resolution limit, applying the smooth low-resolution cutoff (see Table 21.2.3.1) but no σ cutoff.
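The formulas of Table 21.2.3.1 for $[f_{\rm cutoff}]$, the scale factor S and $[\hbox{CC}_{F}]$ can be illustrated with a minimal Python sketch. It is not the SFCHECK implementation: the function names are hypothetical, the amplitudes are passed as plain lists, and the caller is assumed to supply s in whatever convention makes $[B_{\rm off} s^{2}]$ dimensionless, since the table does not fix that convention.

```python
import math


def f_cutoff(s, d_max):
    """Smooth low-resolution cutoff 1 - exp(-B_off * s^2), with B_off = 4 * d_max^2."""
    b_off = 4.0 * d_max ** 2
    return 1.0 - math.exp(-b_off * s ** 2)


def scale_factor(f_obs, f_calc, s, b_diff, d_max):
    """Factor S that scales F_calc to F_obs (Table 21.2.3.1).

    b_diff is B_diff^overall = B_obs^overall - B_calc^overall.
    """
    num = 0.0
    den = 0.0
    for fo, fc, si in zip(f_obs, f_calc, s):
        w = f_cutoff(si, d_max)
        num += (fo * w) ** 2
        den += (fc * math.exp(-b_diff * si ** 2) * w) ** 2
    return math.sqrt(num / den)


def cc_f(f_obs, f_calc):
    """Correlation coefficient between observed and calculated amplitudes."""
    n = len(f_obs)
    mean_o = sum(f_obs) / n
    mean_c = sum(f_calc) / n
    mean_oc = sum(o * c for o, c in zip(f_obs, f_calc)) / n
    var_o = sum(o * o for o in f_obs) / n - mean_o ** 2
    var_c = sum(c * c for c in f_calc) / n - mean_c ** 2
    return (mean_oc - mean_o * mean_c) / math.sqrt(var_o * var_c)
```

With calculated amplitudes that are exactly half the observed ones and $[B_{\rm diff}^{\rm overall} = 0]$, `scale_factor` returns 2 and `cc_f` returns 1, as expected for a model that differs from the data only by scale.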

21.2.3.1.1.3. Estimations of errors in atomic positions

The errors associated with the atomic positions are expressed as standard deviations (σ) of these positions. SFCHECK computes three different error measures. The first is the original error measure of Cruickshank (1949). The second is a modified version of this error measure, in which the difference between the observed and calculated structure factors is replaced by the error in the experimental structure factors. These first two measures are the expected maximal and minimal errors, respectively. The third measure is the diffraction-component precision indicator (DPI). The mathematical expressions for these error measures are given in Table 21.2.3.2, and further details can be found in Vaguine et al. (1999).

Table 21.2.3.2
Estimation of errors in atomic coordinates

Parameter	Formula/definition	Meaning
$[\sigma(x)]$	$[\displaystyle{\sigma\hbox{(slope)} \over \hbox{curvature}}]$ ^†	Standard deviation of the atomic coordinates following Cruickshank (1949) for the minimal and maximal errors (Vaguine et al., 1999)
σ(slope) for maximal error	$[\displaystyle{2\pi \left\{\textstyle\sum\displaystyle \left[h^{2} (F_{\rm obs} - F_{\rm calc})^{2}\right]\right\}^{1/2} \over V_{\rm unit \ cell} a}]$ ^‡	Expression for σ(slope) in the expected maximal error following Cruickshank (1949)
Curvature	$[\displaystyle{2\pi^{2} \textstyle\sum\displaystyle (h^{2} F_{\rm obs}) \over V_{\rm unit \ cell} a^{2}}]$	Expression for the curvature following Murshudov et al. (1997)
σ(slope) for minimal error	$[\displaystyle{2\pi \left\{\textstyle\sum\displaystyle \left[h^{2} \sigma (F_{\rm obs})^{2}\right]\right\}^{1/2} \over V_{\rm unit \ cell} a}]$ ^§	Expression for σ(slope) in the expected minimal error, following Cruickshank (1949)
DPI	$[\displaystyle{\sigma (x) = \left({N_{\rm atoms} \over N_{\rm obs} - 4 N_{\rm atoms}}\right)^{1/2} c^{-1/3} d_{\min} R}]$ ^¶	Atomic coordinate error estimate following Cruickshank (1996)

^†σ(slope) and curvature are the slope and curvature of the electron-density map at the atomic centre, in the x direction, for spherically symmetric peaks; $[\sigma (x)\simeq \sigma(y)\simeq \sigma(z)]$ .
^‡a is the crystal unit-cell length, h is the Miller index and $[V_{\rm unit \ cell}]$ is the unit-cell volume.
^§ $[\sigma(F_{\rm obs})]$ is the standard deviation of the structure-factor amplitude.
^¶c is the structure-factor data completeness expressed as a fraction (0–1), R is the conventional R factor, $[N_{\rm atoms}]$ is the total number of atoms in the unit cell, $[N_{\rm obs}]$ is the total number of observed reflections and $[d_{\rm min}]$ is the minimum d spacing.
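As a worked illustration of the DPI expression in Table 21.2.3.2, the following Python sketch evaluates the formula directly. The function name and argument names are hypothetical, and the input checks are an added assumption rather than documented SFCHECK behaviour.

```python
import math


def dpi(n_atoms, n_obs, completeness, d_min, r_factor):
    """Diffraction-component precision indicator (Table 21.2.3.2).

    completeness is the data completeness c as a fraction (0-1);
    d_min is the minimum d spacing; the result has the units of d_min.
    """
    dof = n_obs - 4 * n_atoms
    if dof <= 0:
        # The formula is undefined unless N_obs exceeds 4 * N_atoms.
        raise ValueError("N_obs must exceed 4 * N_atoms")
    if not 0.0 < completeness <= 1.0:
        raise ValueError("completeness must be a fraction in (0, 1]")
    return (math.sqrt(n_atoms / dof)
            * completeness ** (-1.0 / 3.0)
            * d_min
            * r_factor)
```

The estimate grows as the data become less complete, as the R factor rises, and as the number of observations per atom approaches four, at which point the expression diverges.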

21.2.3.1.1.4. Local agreement between the model and the experimental data

In addition to the global structure quality measures, SFCHECK also determines the quality of the model in specific regions. Several quality estimators can be calculated for each residue in the macromolecule and, whenever appropriate, for solvent molecules and groups of atoms in ligand molecules. These estimators are the normalized atomic displacement (Shift), the correlation coefficient between the calculated and observed electron densities (Density correlation), the local electron-density level (Density index), the average B factor (B-factor) and the connectivity index (Connect), which measures the local electron-density level along the molecular backbone. These quantities are computed for individual atoms and averaged over those composing each residue or group of atoms [see Table 21.2.3.3 and Vaguine et al. (1999) for details].

Table 21.2.3.3
Parameters computed by SFCHECK to assess the quality of the model in specific regions

Parameter	Formula/definition	Meaning
Shift	$[(1/N\sigma)\textstyle\sum\limits_{i}^{N}\displaystyle \Delta_{i},\hbox{ with } \Delta_{i} = (\hbox{gradient}_{i}/\hbox{curvature}_{i})]$ ^†	Normalized average atomic displacement computed over a group of atoms or residue; reflects the tendency of the group of atoms to move from their current position
Density correlation	$[\displaystyle{\textstyle\sum\displaystyle \rho_{\rm calc}(x_{i})[2\rho_{\rm obs}(x_{i}) - \rho_{\rm calc}(x_{i})] \over \left(\left[\textstyle\sum\displaystyle \rho_{\rm calc}^{2} (x_{i})\right]\left\{\textstyle\sum\displaystyle \left[2\rho_{\rm obs}(x_{i}) - \rho_{\rm calc}(x_{i})\right]^{2}\right\}\right)^{1/2}}]$ ^‡	Electron density correlation coefficient computed over a group of atoms or residue; reflects the local agreement of the model with the electron density
Density index	$[\left[\textstyle\prod\displaystyle \rho(x_{i})\right]^{1/N}/\langle \rho \rangle_{\rm all \ atoms}]$ ^§	Reflects the level of the electron density for a group of atoms; is a local measure of the density level
Connect		Same as Density index, but considering only backbone atoms.^¶

^†$[\hbox{gradient}_{i}]$ is the gradient of the $[F_{\rm obs} - F_{\rm calc}]$ map with respect to the atomic coordinates, $[\hbox{curvature}_{i}]$ is the curvature of the model map computed at the atomic centre (see Agarwal, 1978), N is the number of atoms in the group considered and σ is the standard deviation of the $[\Delta_{i}]$ values computed in the structure.
^‡ $[\rho_{\rm calc}(x_{i})]$ and $[\rho_{\rm obs}(x_{i})]$ are, respectively, the electron density computed from calculated and observed structure-factor amplitudes at the atomic centre. The summation is performed over all the atoms in the group considered. For polymer residues, D_corr is computed separately for backbone and side-chain atoms. For the calculation of the electron density at the atomic centre, see Vaguine et al. (1999).
^§ $[[\prod{\rho (x_{i})}]^{1/N}]$ is the geometric mean of the $[2F_{\rm obs} - F_{\rm calc}]$ electron density of the atom subset considered and $[\langle \rho \rangle_{\rm all \ atoms}]$ is the average electron density of the atoms in the structure. For water molecules or ions which are represented by a unique atom, the above expression reduces to the ratio $[\rho(x_i)/\langle \rho \rangle_{\rm all \ atoms}]$ .
^¶Backbone atoms are N, C, C^α for proteins and P, O5′, C5′, C3′, O3′ for nucleic acids.
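The per-group quantities of Table 21.2.3.3 can be sketched in Python as follows. This is only an illustration of the formulas, not SFCHECK code: the function names are hypothetical, the densities at the atomic centres are assumed to be already available as lists of numbers, and the density values passed to `density_index` are assumed to be positive so that the geometric mean is defined.

```python
import math


def shift(deltas, sigma):
    """Normalized average displacement: (1 / (N * sigma)) * sum(delta_i).

    deltas holds gradient_i / curvature_i for the atoms of the group;
    sigma is the standard deviation of the delta values in the structure.
    """
    return sum(deltas) / (len(deltas) * sigma)


def density_correlation(rho_calc, rho_obs):
    """Correlation between rho_calc and (2 * rho_obs - rho_calc) over a group."""
    combined = [2.0 * o - c for o, c in zip(rho_obs, rho_calc)]
    num = sum(c * d for c, d in zip(rho_calc, combined))
    den = math.sqrt(sum(c * c for c in rho_calc)
                    * sum(d * d for d in combined))
    return num / den


def density_index(rho_group, rho_all_atoms):
    """Geometric mean of the group density over the all-atom average density."""
    geo_mean = math.prod(rho_group) ** (1.0 / len(rho_group))
    return geo_mean / (sum(rho_all_atoms) / len(rho_all_atoms))
```

For a group consisting of a single atom, such as a water molecule, `density_index` reduces to the ratio of that atom's density to the all-atom average, in line with the table footnote. Passing only backbone atoms as `rho_group` gives the Connect parameter.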

21.2.3.1.2. Evaluation of individual structures

Figs. 21.2.3.1–21.2.3.3 summarize the analysis carried out by SFCHECK on the protein rusticyanin from Thiobacillus ferrooxidans (1RCY) (Walter et al., 1996). Fig. 21.2.3.1 displays the numerical results from the analysis of the structure-factor data and from the evaluation of the global agreement between the model and the data. The R-factor and $[R_{\rm free}]$ values, computed by SFCHECK (Model vs. Structure Factors panel) using the identical reflection subset to that reported by the authors (Refinement panel), show negligible differences with the reported values. These differences are 0.175 versus 0.172 for the R factor and 0.25 versus 0.243 for $[R_{\rm free}]$. The small R-factor difference may stem from the fact that SFCHECK considers a somewhat different number of reflections (9144) than the authors (9098), although it uses the same d-spacing range and σ cutoff as those reported.

Figure 21.2.3.1

Typical SFCHECK output in PostScript format, illustrated for the protein rusticyanin from Thiobacillus ferrooxidans (1RCY) (Walter et al., 1996). Summary panels displaying the numerical results from the analysis of the deposited structure-factor data and from the evaluation of the global agreement between the model and these data. The top elongated panel lists the PDB title record, deposition date and PDB code. The Crystal panel summarizes the crystal parameters, provided by the authors, as read from the model input files. The Model and Refinement panels list the information provided by the authors on the model and the refinement procedure, respectively. This information is read from the PDB coordinates entry. The Structure Factors panel summarizes the information on the deposited structure-factor data (Input section) and on the data used and criteria computed by SFCHECK (SFCHECK section). The numbers given under `Anisotropic distribution of Structure Factors' are the ratios of the eigenvalues of the symmetric anisotropic thermal tensor to the maximum eigenvalues. The Model vs. Structure Factors panel summarizes the results of the verifications made by SFCHECK. The values listed under `Anisothermal Scaling (Beta)' are those of the overall anisotropic thermal tensor ($[b_{11}, b_{12}, b_{13}, b_{22}, b_{23}, b_{33}]$). The meanings of other listed quantities are either self-explanatory or are described in the text.

Figure 21.2.3.2

Graphical output from the SFCHECK analysis of global characteristics of the structure-factor data and the model agreement with those data for the same structure as in Fig. 21.2.3.1. From left to right and top to bottom: the Wilson plot; the behaviour of the optical resolution as a function of the nominal resolution (d spacing); the data completeness and structure-factor standard error as a function of the d spacing; the maximal and minimal coordinate error dependence on d spacing; a stereographic projection of the averaged radial structure-factor data completeness; and, finally, the R-factor dependence and Luzzati plots for a given atomic error.

Figure 21.2.3.3

SFCHECK evaluation summary of the local agreement between the model and the electron density for the same structure as in Fig. 21.2.3.1. Five criteria are plotted for each residue of the macromolecule (designated by its one-letter code), as well as for each solvent molecule (w), or hetero group. These criteria are: (1) Shift, (2) Density correlation, (3) Density index, (4) B factor, (5) Connect. The definitions of these criteria are given in the text. Note that the values of the Connect parameter are truncated to a maximum of 1. The SFCHECK output shown in Figs. 21.2.3.1–21.2.3.3 was generated using routines from PROCHECK kindly provided by R. Laskowski.

The information in Figs. 21.2.3.1 and 21.2.3.2 allows one to make some judgement about the quality of the structure-factor data for this protein. The relatively high resolution of this structure (1.9 Å) is accompanied by limited data completeness (82.1%). The R_stand(F) plot on the same graph shows, furthermore, a decrease in quality of the high-resolution data (2.2–1.9 Å). The average radial completeness plot (bottom left-hand plot of Fig. 21.2.3.2) allows one to identify the regions in reciprocal space with incomplete data.

Fig. 21.2.3.3 presents the SFCHECK analysis of the local agreement of the model with the electron density for 1RCY. The shift plot shows that both backbone and side-chain shifts are of comparable size, with several residues (1, 2, 16, 25) displaying shifts as high as 0.16 Å. The density correlation is excellent throughout the entire molecule, except for residues 2, 16 and 29, which display poorer correlation. In particular, the side chains of these residues seem to be more poorly defined in the electron-density map. The backbone density index plunges in a few regions, notably at the N-terminus (residues 5–7) and in the segments comprising residues 25–30 and 68–70. The side chains display, in general, a poorer density index than the backbone, with some regions (for example, residues 5–7, 23–30, 58–60) displaying rather low density indices. The same segments also display higher backbone and side-chain B factors. The backbone Connect parameter is, on the other hand, quite good throughout, except for residues 5–7 and 28–29 (Fig. 21.2.3.3).

Water molecules (labelled w in the SFCHECK output) are also evaluated. The relevant plots for these molecules are those of the Shift, Density index and B factor parameters. The first 50 or so water molecules in the list (appearing sequentially along the plot from left to right) tend to display a higher density index and lower B factors (< 30 Å²) than the following molecules in the list. They thus seem to be more reliably positioned than subsequent molecules, whose density indices sometimes drop to very low values. A steady climb of the B factors is also apparent as one goes down the list of water molecules. The analysis of the density indices and B factors of individual water molecules performed by SFCHECK could be a very useful guide in investigations of the properties of crystallographic water molecules and their interactions with protein atoms.

21.2.3.1.3. Quality assessment based on surveys across structures

21.2.3.1.3.1. Assessing the quality of a structure as a whole

As for the evaluation of the geometric and stereochemical parameters of the model, surveying the same quality indicators across many structures is crucial. It allows one to establish the ranges of expected values for each indicator and to identify structures with unexpected features – those for which the values of one or more quality indicators are outside their standard range.

The global quality indicators computed by SFCHECK are the nominal resolution (d spacing), the R factor, $[R_{\rm free}]$ , the minimal and maximal errors in atomic positions, the DPI, and the correlation coefficient $[\hbox{CC}_{F}]$ . Another type of global quality indicator can be obtained by computing the average values of local quality measures across a given structure. This can be done for the per-residue (or per-group) atomic displacement and the Density correlation and B factor parameters as well as for the Density index and Connect parameters.

Many of the geometric and stereochemical quality indicators vary as a function of resolution – some linearly and some not (Laskowski et al., 1993). This is also the case for most of the global quality indicators described here. Examples of this dependence are given in Fig. 21.2.3.4, which shows how the correlation coefficient, the maximal error, the average atomic displacement and average density index vary as a function of resolution in the 104 nucleic acid structures surveyed. This variation is approximately linear for all four parameters. The correlation coefficient and average density index decrease, whereas the maximal error and average atomic displacements increase, as the resolution gets poorer. In all four plots of Fig. 21.2.3.4, the points tend to display significant scatter as the d spacing increases, and at least three points, corresponding to the same three structures, appear as outliers in all plots. These structures also appear as outliers in the analysis of other parameters. A closer examination revealed that in the vast majority of the cases, the abnormal behaviour of these structures could be traced back to problems with data formats or errors that occurred during data deposition and entry processing.

Figure 21.2.3.4

Variation of global quality indicators with the nominal resolution (d spacing) of the crystallographic data. The following quality indicators were computed by SFCHECK for each of the 104 nucleic acid crystal structures considered in the study of Das et al. (2001): (a) correlation coefficient, (b) maximal error, (c) average atomic displacement and (d) average density index. For the meaning of the various quantities see Tables 21.2.3.1–21.2.3.3. The three structures for which the reported and re-computed R factors differ by more than 10% are highlighted as black circles. The NDB (PDB) codes for these structures are ADFB72 (256D), ADF073 (257D) and ADJ081 (320D).

As the number of structures with deposited structure-factor data becomes large enough, plots such as those of Fig. 21.2.3.4 could be used to define the expected range of values for a quality indicator in a structure determined at a given resolution or refined under given conditions. Structures yielding quality indicators outside this range could then be identified as unusual on a more solid statistical basis.
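One simple way such an expected range might be derived is sketched below in Python. This is an illustrative assumption, not a procedure described in this chapter: it fits a least-squares line to a quality indicator as a function of d spacing (relying on the approximately linear trends noted above) and flags structures whose residual exceeds k residual standard deviations. The function names, the linear model and the default threshold k = 2 are all hypothetical choices.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_outliers(d_spacing, indicator, k=2.0):
    """Indices of structures whose indicator lies outside the fitted range.

    A structure is flagged when its residual from the fitted line exceeds
    k times the residual standard deviation (n - 2 degrees of freedom).
    """
    slope, intercept = fit_line(d_spacing, indicator)
    residuals = [yi - (slope * xi + intercept)
                 for xi, yi in zip(d_spacing, indicator)]
    s = math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 2))
    return [i for i, r in enumerate(residuals) if abs(r) > k * s]
```

Because the outliers themselves contribute to the fit and to the residual spread, a more careful analysis would probably use a robust fit or remove flagged points and refit; the sketch only conveys the idea of a resolution-dependent expected range.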

21.2.3.1.3.2. Assessing the quality in specific regions of a model

The main purpose for computing the four local quality measures, the B factor, the Density index, the atomic displacement (Shift) and the Density correlation (Table 21.2.3.3), is to identify problem regions in a model. In order to do this effectively, it is necessary to evaluate the degree of redundancy between these measures and to establish the standard ranges for their values. The latter task, in particular, is not straightforward since it depends crucially on the quality of the experimental data and biases introduced by the scaling procedure and refinement protocol. In this regard, several issues are presently still under investigation.

A preliminary investigation of the mutual relations between the above-mentioned local measures has been performed in several protein and nucleic acid structures taken individually. This shows that the B factor is strongly correlated with the density index, as illustrated in Fig. 21.2.3.5(a), and to a lesser extent with the atomic displacement (Fig. 21.2.3.5b). A weaker correlation was detected between the latter three measures and the residue density correlation (data not shown).

Figure 21.2.3.5

Pairwise correlations between the various local quality indicators computed by SFCHECK. (a) Correlation between the average residue B factor and the density index, and (b) between the B factor and the atomic displacement. The values displayed were computed for residues in the crystal structure of carboxypeptidase (1YME). The meaning of the parameters displayed is given in Table 21.2.3.3.

Analyses across structures could, in principle, be carried out for all four local measures computed by SFCHECK, provided these measures are not subject to systematic biases due to differences in scaling procedures and refinement practices. Such biases are, however, well known for the B factors of individual atoms or residues. This is illustrated in Fig. 21.2.3.6(a). This figure plots, side-by-side, the average residue B factors in 21 protein structures determined at different d spacings. It shows that for proteins determined at poorer resolution (d spacing above 2 Å), the B factors of different structures are systematically shifted relative to one another. Such systematic shifts are much smaller for structures determined at 2 Å resolution or better (Fig. 21.2.3.6a). This is not surprising, since in lower-resolution structures, $[N_{\rm refl}/N_{\rm atoms}]$ is often too low (< 4) to yield meaningful values for the B factors.

Figure 21.2.3.6

B factors and density indices for residues across different structures. (a) Average B values in residues of 21 protein structures; (b) average density indices of the same set of residues and structures. The 21 protein structures analysed are from the following PDB entries: 1YME, 1MCT, 1PDO, 1VHH, 1WBA, 1CNS, 1RG7, 1UCO, 1BRO, 1EMB, 1FXI, 1KBA, 1XSM, 1HIB, 1IVF, 1QRS, 1AGX, 1NSN, 1ZOO, 1TGK, 1JCK.

Interestingly, the residue Density index, a very different parameter from the B factor, which measures the level of electron density at the atomic positions, does not display the systematic shifts observed for the B factors (Fig. 21.2.3.6b), despite the fact that the two measures are rather strongly correlated in individual structures. An indicator such as this one, and ultimately the atomic s.u.'s themselves, should be better suited for analysing and comparing the trends in the quality of specific regions of the model across different structures.

21.2.4. Atomic resolution structures

With improved techniques of crystallization and data collection using synchrotron radiation and cryogenic cooling, an increasing number of protein crystal structures are being determined at atomic resolution (1.2 Å or better). With atomic resolution data, refinement can be performed that requires much less strict compliance with prior knowledge of the expected geometry. Although some restraints must still be imposed, especially to deal with more flexible regions, and hence biases remain, it might be expected that these structures provide more precise information on the `true' geometrical and stereochemical properties of proteins. Ultimately, one would want to re-derive these properties using only atomic resolution structures, but their number is at present too limited to provide sufficient data for a meaningful statistical analysis.

In the meantime, atomic resolution protein structures have been used to check geometric and conformational parameters that have been derived from other sources, including small-molecule crystals and the larger set of proteins determined at various levels of resolution (Longhi et al., 1997; EU 3-D Validation Network, 1998). The EU 3-D Validation Network study showed that in the atomic resolution structures, most of the geometrical validation parameters are more tightly clustered about their mean value than in structures determined at lower resolution, including tighter clustering in the core regions of the Ramachandran plot, tighter clustering of the atomic volumes and smaller s.u.'s in the distributions of the $[\chi_{1}]$, $[\chi_{2}]$ dihedral angles. In contrast, the ω torsion angle about the peptide bond exhibits a wider distribution, with a mean of 179.0 (56)° compared to 179.6 (47)° previously computed in protein structures determined at various resolution levels (Morris et al., 1992).

Recently, atomic resolution structures have also been used to derive atomic s.u. values for proteins. Remarkably, the estimated coordinate errors for concanavalin A at 0.94 Å (Deacon et al., 1997) were found to be equivalent to those of small-molecule crystal structures, despite the large size of the protein (237 residues).

Atomic resolution structures of proteins and other macromolecules thus promise to represent a valuable source of accurate information on geometric and conformational parameters of these molecules. But the analysis and validation of such structures also brings about additional complications, such as, for example, the problem of dealing with equilibria between multiple conformations, which atomic resolution data tend to resolve with much higher detail and accuracy. Handling these equilibria will require an adaptation of the current validation procedures.

21.2.5. Concluding remarks

The coming years will see an ever-increasing number of crystal structures of proteins and nucleic acids determined at high resolution and a substantial growth in the number of atomic resolution structures. This will most certainly help in obtaining better data on the geometric and stereochemical parameters of these macromolecules and thus improve the target values for both refinement and structure validation. It should also make it possible to derive better criteria for evaluating the agreement of the model with the electron density and to improve upon and generalize comprehensive and systematic approaches, such as that implemented in the software SFCHECK.

References

Abagyan, R. A. & Totrov, M. M. (1997). Contact area difference (CAD): a robust measure to evaluate accuracy of protein models. J. Mol. Biol. 268, 678–685.Google Scholar

Agarwal, R. C. (1978). A new least-squares refinement technique based on the fast Fourier transform algorithm. Acta Cryst. A34, 791–809.Google Scholar

Alard, P. (1991). Calcul de surface et d'énergie dans le domaine des macromolécules. PhD thesis dissertation, Université Libre de Bruxelles, Belgium.Google Scholar

Allen, F. H., Bellard, S., Brice, M. D., Cartwright, B. A., Doubleday, A., Higgs, H., Hummelink, T., Hummelink-Peters, B. G., Kennard, O., Motherwell, W. D. S., Rodgers, J. R. & Watson, D. G. (1979). The Cambridge Crystallographic Data Centre: computer-based search, retrieval, analysis and display of information. Acta Cryst. B35, 2331–2339.Google Scholar

Allen, F. H., Kennard, O. & Taylor, R. (1983). Systematic analysis of structural data as a research technique in organic chemistry. Acc. Chem. Res. 16, 146–153.Google Scholar

Berman, H. M., Olson, W. K., Beveridge, D. L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh, S.-H., Srinivasan, A. R. & Schneider, B. (1992). The Nucleic Acid Database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J. 63, 751–759.Google Scholar

Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). The Protein Data Bank. Nucleic Acids Res. 28, 235–242.Google Scholar

Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F. Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.Google Scholar

Bourne, P. E., Berman, H. M., McMahon, B., Watenpaugh, K., Westbrook, J. & Fitzgerald, P. M. D. (1997). Macromolecular Crystallographic Information File. Methods Enzymol. 277, 571–590.Google Scholar

Brünger, A. T. (1992a). X-PLOR manual. Version 3.1. New Haven: Yale University Press.Google Scholar

Brünger, A. T. (1992b). Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature (London), 355, 472–474.Google Scholar

Brünger, A. T. (1997). Free R value: cross-validation in crystallography. Methods Enzymol. 277, 366–396.Google Scholar

Chapman, M. S. (1995). Restrained real-space macromolecular atomic refinement using a new resolution-dependent electron-density function. Acta Cryst. A51, 69–80.Google Scholar

Clowney, L., Jain, S. C., Srinivasan, A. R., Westbrook, J., Olson, W. K. & Berman, H. M. (1996). Geometric parameters in nucleic acids: nitrogenous bases. J. Am. Chem. Soc. 118, 509–518.Google Scholar

Clowney, L., Westbrook, J. D. & Berman, H. M. (1999). CIF applications. XI. A La Mode: a ligand and monomer object data environment. I. Automated construction of mmCIF monomer and ligand models. J. Appl. Cryst. 32, 125–133.Google Scholar

Collaborative Computational Project, Number 4 (1994). The CCP4 suite: programs for protein crystallography. Acta Cryst. D50, 760–763.Google Scholar

Cruickshank, D. W. J. (1949). The accuracy of electron-density maps in X-ray analysis with special reference to dibenzyl. Acta Cryst. 2, 65–82.Google Scholar

Cruickshank, D. W. J. (1965). Errors in least-squares methods. In Computing methods in crystallography, edited by J. S. Rollett, pp. 112–116. Oxford: Pergamon Press.Google Scholar

Cruickshank, D. W. J. (1996). Protein precision re-examined: Luzzati plots do not estimate final errors. In Proceedings of the CCP4 study weekend. Macromolecular refinement, edited by E. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 11–22. Warrington: Daresbury Laboratory.Google Scholar

Das, U., Chen, S., Fuxreiter, M., Vaguine, A. A., Richelle, J., Berman, H. M. & Wodak, S. J. (2001). Checking nucleic acid crystal structures. Acta Cryst. D57, 813–828.Google Scholar

Deacon, A., Gleichmann, T., Kalb (Gilboa), A. J., Price, H., Raftery, J., Bradbrook, G., Yariv, J. & Helliwell, J. R. (1997). The structure of concanavalin A and its bound solvent determined with small-molecule accuracy at 0.94 Å resolution. J. Chem. Soc. Faraday Trans. 93, 4305–4312.Google Scholar

Dodson, E., Kleywegt, G. J. & Wilson, K. (1996). Report of a workshop on the use of statistical validators in protein X-ray crystallography. Acta Cryst. D52, 228–234.Google Scholar

Eisenberg, D., Luthy, R. & Bowie, J. U. (1997). VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol. 277, 396–404.Google Scholar

Engh, R. A. & Huber, R. (1991). Accurate bond and angle parameters for X-ray protein structure refinement. Acta Cryst. A47, 392–400.Google Scholar

EU 3-D Validation Network (1998). Who checks the checkers? Four validation tools applied to eight atomic resolution structures. J. Mol. Biol. 276, 417–436.Google Scholar

Finney, J. L. (1975). Volume occupation, environment and accessibility in proteins. The problem of the protein surface. J. Mol. Biol. 96, 721–732.Google Scholar

Gelbin, A., Schneider, B., Clowney, L., Hsieh, S.-H., Olson, W. K. & Berman, H. M. (1996). Geometric parameters in nucleic acids: sugar and phosphate constituents. J. Am. Chem. Soc. 118, 519–529.Google Scholar

Gellatly, B. J. & Finney, J. L. (1982). Calculation of protein volumes: an alternative to the Voronoi procedure. J. Mol. Biol. 161, 305–322.Google Scholar

Harata, K., Abe, Y. & Muraki, M. (1998). Full-matrix least-squares refinement of lysozymes and analysis of anisotropic thermal motion. Proteins, 30, 232–243.Google Scholar

Harpaz, Y., Gerstein, M. & Chothia, C. (1994). Volume changes on protein folding. Structure, 2, 611–649.Google Scholar

Hooft, R. W. W., Sander, C. & Vriend, G. (1996). Positioning hydrogen atoms by optimizing hydrogen-bond networks in protein structures. Proteins, 26, 363–376.Google Scholar

Hooft, R. W. W., Sander, C. & Vriend, G. (1997). Objectively judging the quality of a protein structure from a Ramachandran plot. Comput. Appl. Biosci. 13, 425–430.Google Scholar

Hooft, R. W. W., Vriend, G., Sander, C. & Abola, E. E. (1996). Errors in protein structures. Nature (London), 381, 272.Google Scholar

Jones, T. A., Zou, J.-Y., Cowan, S. W., Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Cryst. A47, 110–119.Google Scholar

Kabsch, W. & Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.Google Scholar

Kleywegt, G. J. & Brünger, A. T. (1996). Checking your imagination: applications of the free R value. Structure, 4, 897–904.Google Scholar

Kleywegt, G. J. & Jones, T. A. (1996). Phi/psi-chology: Ramachandran revisited. Structure, 4, 1395–1400.Google Scholar

Kleywegt, G. J. & Jones, T. A. (1998). Databases in protein crystallography. Acta Cryst. D54, 1119–1131.Google Scholar

Laskowski, R. A., MacArthur, M. W., Moss, D. S. & Thornton, J. M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Cryst. 26, 283–291.Google Scholar

Laskowski, R. A., MacArthur, M. W. & Thornton, J. M. (1998). Validation of protein models derived from experiment. Curr. Opin. Struct. Biol. 8, 631–639.Google Scholar

Longhi, S., Czjzek, M., Lamzin, V., Nicolas, A. & Cambillau, C. (1997). Atomic resolution (1.0 Å) crystal structure of Fusarium solani cutinase: stereochemical analysis. J. Mol. Biol. 268, 779–799.Google Scholar

MacArthur, M. W., Laskowski, R. A. & Thornton, J. M. (1994). Knowledge-based validation of protein-structure coordinates derived by X-ray crystallography and NMR spectroscopy. Curr. Opin. Struct. Biol. 4, 731–737.

MacKerell, A. D. Jr, Bashford, D., Bellott, M., Dunbrack, R. L. Jr, Evanseck, J. D., Field, M. J., Fischer, S., Gao, J., Guo, H., Ha, S., Joseph-McCarthy, D., Kuchnir, L., Kuczera, K., Lau, F. T. K., Mattos, C., Michnick, S., Ngo, T., Nguyen, D. T., Prodhom, B., Reiher, W. E. III, Roux, B., Schlenkrich, M., Smith, J. C., Stote, R., Straub, J., Watanabe, M., Wiórkiewicz-Kuczera, J., Yin, D. & Karplus, M. (1998). All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B, 102, 3586–3616.

Melo, F. & Feytmans, E. (1997). Novel knowledge-based mean force potential at atomic level. J. Mol. Biol. 267, 207–222.

Melo, F. & Feytmans, E. (1998). Assessing protein structures with a non-local atomic interaction energy. J. Mol. Biol. 277, 1141–1152.

Morris, A. L., MacArthur, M. W., Hutchinson, E. G. & Thornton, J. M. (1992). Stereochemical quality of protein structure coordinates. Proteins Struct. Funct. Genet. 12, 345–364.

Murshudov, G. N., Davies, G. J., Isupov, M., Krzywda, S. & Dodson, E. J. (1998). The effect of overall anisotropic scaling in macromolecular refinement. Newsletter on protein crystallography, pp. 37–42. Warrington: Daresbury Laboratory.

Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Refinement of macromolecular structures by the maximum-likelihood method. Acta Cryst. D53, 240–255.

Parkinson, G., Vojtechovsky, J., Clowney, L., Brünger, A. T. & Berman, H. M. (1996). New parameters for the refinement of nucleic acid-containing structures. Acta Cryst. D52, 57–64.

Pontius, J., Richelle, J. & Wodak, S. J. (1996). Deviations from standard atomic volumes as a quality measure for protein crystal structures. J. Mol. Biol. 264, 121–136.

Ratnaparkhi, G. S., Ramachandran, S., Udgaonkar, J. B. & Varadarajan, R. (1998). Discrepancies between the NMR and X-ray structures of uncomplexed barstar: analysis suggests that packing densities of protein structures determined by NMR are unreliable. Biochemistry, 37, 6958–6966.

Richards, F. M. (1974). The interpretation of protein structures: total volume, group volume distributions and packing density. J. Mol. Biol. 82, 1–14.

Richards, F. M. (1985). Calculation of molecular volumes and areas for structures of known geometry. Methods Enzymol. 115, 440–464.

Sayle, R. A. & Milner-White, E. J. (1995). RASMOL: biomolecular graphics for all. Trends Biochem. Sci. 20, 374–376.

Schneider, B., Neidle, S. & Berman, H. M. (1997). Conformations of the sugar–phosphate backbone in helical DNA crystal structures. Biopolymers, 42, 113–124.

Sheldrick, G. M. & Schneider, T. R. (1997). SHELXL: high-resolution refinement. Methods Enzymol. 277, 319–343.

Sheriff, S. & Hendrickson, W. A. (1987). Description of overall anisotropy in diffraction from macromolecular crystals. Acta Cryst. A43, 118–121.

Sippl, M. J. (1990). Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J. Mol. Biol. 213, 859–883.

Sippl, M. J. (1993). Recognition of errors in three-dimensional structures of proteins. Proteins, 17, 355–362.

Tickle, I. J., Laskowski, R. A. & Moss, D. S. (1998). Error estimates of protein structure coordinates and deviations from standard geometry by full-matrix refinement of γB- and βB2-crystallin. Acta Cryst. D54, 243–252.

Vaguine, A. A., Richelle, J. & Wodak, S. J. (1999). SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model. Acta Cryst. D55, 191–205.

Voronoi, G. F. (1908). Nouvelles applications des paramètres continus à la théorie des formes quadratiques. J. Reine Angew. Math. 134, 198–287.

Walter, R. L., Ealick, S. E., Friedman, A. M., Blake, R. C. II, Proctor, P. & Shoham, M. (1996). Multiple wavelength anomalous diffraction (MAD) crystal structure of rusticyanin: a highly oxidizing cupredoxin with extreme acid stability. J. Mol. Biol. 263, 730–749.

Wodak, S. J. & Rooman, M. (1993). Generating and testing protein folds. Curr. Opin. Struct. Biol. 3, 247–259.

Zhou, G., Wang, J., Blanc, E. & Chapman, M. S. (1998). Determination of the relative precision of atoms in macromolecular structure. Acta Cryst. D54, 391–399.
