International Tables for Crystallography, Volume F: Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F. ch. 21.2, p. 509
Section 21.2.2.2.3. Deviations from standard atomic volumes as a quality measure for protein crystal structures
(a) Unité de Conformation de Macromolécules Biologiques, Université Libre de Bruxelles, avenue F. D. Roosevelt 50, CP160/16, B-1050 Bruxelles, Belgium, and EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England; (b) Unité de Conformation de Macromolécules Biologiques, Université Libre de Bruxelles, avenue F. D. Roosevelt 50, CP160/16, B-1050 Bruxelles, Belgium; and (c) Department of Chemistry, Rutgers University, 610 Taylor Road, Piscataway, NJ 08854-8087, USA
The observations that protein X-ray structures are at least as tightly packed as small-molecule crystals (Richards, 1974; Harpaz et al., 1994) and that the packing density inside proteins displays very limited variation (Richards, 1974; Finney, 1975) suggest that atomic volumes or measures of atomic packing can be added to the list of parameters for assessing the quality of protein structures.
Packing and related measures have been used to compare structures of proteins derived by both X-ray diffraction and NMR spectroscopy. Ratnaparkhi et al. (1998) analysed proteins for which both crystal and NMR structures were available. They found that the packing values of the NMR models displayed a much larger scatter than those of the corresponding crystal structures, and suggested that this is probably because accurate values of the packing density cannot, at present, be obtained from NMR data. Similar conclusions were reached using measures of residue–residue contact area (Abagyan & Totrov, 1997).
Here, we describe the approach of Pontius et al. (1996), in which deviations from standard atomic volumes are used to assess the quality of a protein model, both overall and in specific regions.
The volumes occupied by atoms and residues inside proteins can be readily computed using the Voronoi method (Voronoi, 1908), first applied to proteins by Richards (1974) and Finney (1975). This method uses only the atomic positions of the molecular model. The volume assigned to each atom is that of the smallest polyhedron enclosed by the set of planes that perpendicularly bisect the lines joining the atom centre to those of its neighbours (Fig. 21.2.2.1).
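The Voronoi assignment described above can be illustrated numerically. The following is a minimal sketch, not the analytic procedure used by any published program: it approximates the Voronoi volume of one atom by counting the voxels of a small cube that lie closer to that atom than to any other. The function name `grid_voronoi_volume`, the sampling cube and the test lattice are all illustrative assumptions, and the cube is assumed to be large enough to contain the whole polyhedron, which holds only for an atom completely surrounded by neighbours.

```python
def grid_voronoi_volume(coords, index, half_width, step=0.1):
    """Approximate the Voronoi volume of atom `index` by voxel counting.

    coords     : list of (x, y, z) atom centres, in angstroms
    index      : position in `coords` of the atom of interest
    half_width : half edge length of the sampling cube centred on the atom;
                 assumed (not checked) to enclose the whole Voronoi cell
    step       : voxel edge length; smaller values give a finer estimate
    """
    cx, cy, cz = coords[index]
    others = [c for a, c in enumerate(coords) if a != index]
    n = round(2 * half_width / step)  # voxels per cube edge
    count = 0
    for i in range(n):
        x = cx - half_width + (i + 0.5) * step
        for j in range(n):
            y = cy - half_width + (j + 0.5) * step
            for k in range(n):
                z = cz - half_width + (k + 0.5) * step
                d_self = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                # The voxel belongs to the atom if no other atom is closer,
                # i.e. it lies on the atom's side of every bisecting plane.
                if all((ax - x) ** 2 + (ay - y) ** 2 + (az - z) ** 2 >= d_self
                       for ax, ay, az in others):
                    count += 1
    return count * step ** 3


# Hypothetical test case: a 3 x 3 x 3 cubic lattice with 2 A spacing.
lattice = [(2.0 * i, 2.0 * j, 2.0 * k)
           for i in range(3) for j in range(3) for k in range(3)]
centre = 13  # the atom at (2, 2, 2), fully surrounded by the other 26
print(round(grid_voronoi_volume(lattice, centre, half_width=1.5), 3))
```

For the central atom of this lattice the bisecting planes enclose a cube of edge 2 Å, so the estimate can be checked against the analytic value of 8 Å³. An analytic polyhedron construction is preferable in practice; the voxel count is used here only because it makes the nearest-atom definition explicit.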
The use of the classical Voronoi procedure is justified in the context of validation because it avoids the need to derive a consistent set of van der Waals radii for atoms in the system. Such sets are used by other volume-calculation methods in order to partition space more accurately (Richards, 1974, 1985; Gellatly & Finney, 1982). Assigning a consistent set of radii to protein atoms is, indeed, not straightforward due to the heterogeneity of the interactions within the protein (polar, ionic, non-polar) and the presence of a large variety of hetero groups.
Structure-quality assessment based on volume calculations involves computing the atomic volumes in a subset of highly resolved and refined protein structures and analysing the distributions of these volumes for different atomic types, defined according to their chemical nature and bonded environment. These distributions define the expected ranges (mean and standard deviation) for the volume of each category of atoms. Atomic volumes in a given structure are then compared to the expected ranges, and statistically significant deviations from these ranges are flagged.
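The derivation of expected ranges can be sketched as follows. This is a minimal illustration, assuming that atomic volumes from the reference structures have already been computed and labelled by atom type; the function name `reference_stats`, the type labels and the numerical values are hypothetical, and the choice of the sample standard deviation is an assumption rather than something stated in the text.

```python
from collections import defaultdict
from statistics import mean, stdev


def reference_stats(observations):
    """Return {atom_type: (mean_volume, standard_deviation)}.

    observations : iterable of (atom_type, volume) pairs collected from
                   the reference structures. Atom types with fewer than
                   two observations are skipped, because a standard
                   deviation cannot be estimated from a single value.
    """
    by_type = defaultdict(list)
    for atom_type, volume in observations:
        by_type[atom_type].append(volume)
    return {atom_type: (mean(volumes), stdev(volumes))
            for atom_type, volumes in by_type.items()
            if len(volumes) >= 2}


# Hypothetical volumes (cubic angstroms) for two illustrative atom types.
observed = [("LEU_CA", 13.0), ("LEU_CA", 14.0), ("LEU_CA", 15.0),
            ("GLY_N", 12.0)]
stats = reference_stats(observed)
```

Keying the table by a label that combines residue and atom name is one simple way to encode the chemical nature and bonded environment of an atom; a real classification may be finer or coarser.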
The program PROVE (Pontius et al., 1996) implements such an approach using the analytic algorithms for volume and surface-area calculations encoded in SurVol (Alard, 1991). It computes for each atom i in a structure its volume Z score, $Z_i = (V_i^k - \bar{V}^k)/\sigma^k$, where $V_i^k$ is the volume of atom i, the superscript k designates the particular atom type (e.g., the Cα atom in a Leu residue), and $\bar{V}^k$ and $\sigma^k$ are, respectively, the mean and standard deviation of the reference volume distribution for the corresponding atom type. These reference distributions are derived from a set of high-quality protein crystal structures using exactly the same calculation procedure (Pontius et al., 1996).
Atoms with absolute Z scores > 3 are flagged as possible problem regions in the protein model, and residues containing such atoms are highlighted on graphical plots of the same type as those used by the PROCHECK program and on molecular models displayed using programs such as Rasmol (Sayle & Milner-White, 1995).
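The scoring and flagging step can be sketched in a few lines. The example below is illustrative only and is not taken from PROVE: it assumes that each atom's volume and type are already known and that a reference table of (mean, standard deviation) per atom type is available. The function name `volume_z_scores`, the atom identifiers and all numerical values are hypothetical.

```python
def volume_z_scores(atoms, reference, threshold=3.0):
    """Compute volume Z scores and flag outliers.

    atoms     : list of (atom_id, atom_type, volume) tuples
    reference : {atom_type: (mean_volume, standard_deviation)}
    threshold : atoms with |Z| above this value are flagged

    Returns (scores, flagged), where scores maps atom_id to Z and
    flagged lists the atom_ids whose absolute Z exceeds the threshold.
    Atoms whose type has no reference distribution are left unscored.
    """
    scores = {}
    flagged = []
    for atom_id, atom_type, volume in atoms:
        if atom_type not in reference:
            continue
        mean_volume, sd = reference[atom_type]
        z = (volume - mean_volume) / sd
        scores[atom_id] = z
        if abs(z) > threshold:
            flagged.append(atom_id)
    return scores, flagged


# Hypothetical reference table and model atoms (volumes in cubic angstroms).
reference = {"LEU_CA": (14.0, 1.0)}
model = [("A10:CA", "LEU_CA", 14.5),
         ("A22:CA", "LEU_CA", 18.0),
         ("A30:X1", "HET_X1", 20.0)]
scores, flagged = volume_z_scores(model, reference)
```

With these made-up numbers the second atom lies four standard deviations above the reference mean and is flagged, while the hetero-group atom is skipped because no reference distribution exists for its type.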
In addition to the validation of the local quality of the model, its overall quality can be assessed by the root-mean-square volume Z score of all its atoms (see Fig. 21.2.2.2 for definition). As for many stereochemical global quality indicators, this Z score shows good correlation with the nominal resolution (d spacing) of the crystallographic data, as illustrated in Fig. 21.2.2.2(a). This figure also shows that Z-score ranges can be defined for each resolution interval. The Z scores of individual proteins that lie outside these intervals may be indicative of 'problem' structures. This is clearly the case for the two proteins 2ABX and 2GN5, whose Z scores are much higher than average (Fig. 21.2.2.2b).
The Voronoi volume of a solvent-accessible atom cannot be defined, because such an atom is not completely surrounded by other atoms. Consequently, only completely buried atoms are scored.
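A global score restricted to buried atoms can be sketched as below. The figure that defines the root-mean-square Z score is not reproduced here, so the usual form, the square root of the mean of the squared Z scores, is assumed. The function name `rms_z_buried` and the input layout are hypothetical, and the burial flag is assumed to come from a separate accessibility calculation.

```python
import math


def rms_z_buried(atoms):
    """Root-mean-square volume Z score over completely buried atoms.

    atoms : iterable of (z_score, is_buried) pairs. Atoms that are not
            completely buried are ignored, since their Voronoi volume
            is undefined.
    """
    buried = [z for z, is_buried in atoms if is_buried]
    if not buried:
        raise ValueError("no buried atoms to score")
    return math.sqrt(sum(z * z for z in buried) / len(buried))


# Hypothetical input: the surface atom (flag False) does not contribute.
example = [(1.0, True), (-1.0, True), (10.0, False)]
overall = rms_z_buried(example)
```

Squaring the Z scores means that atoms that are too tightly and too loosely packed both raise the overall score instead of cancelling each other.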
The current version of PROVE is unable to measure the deviations from standard volumes for atoms in nucleic acids or hetero groups, simply because of the lack of reference volumes for these structures. This should change in the near future, at least for nucleic acids, thanks to the growing number of high-quality nucleic acid crystal structures from which standard volume ranges could be readily derived.
References
Abagyan, R. A. & Totrov, M. M. (1997). Contact area difference (CAD): a robust measure to evaluate accuracy of protein models. J. Mol. Biol. 268, 678–685.
Alard, P. (1991). Calcul de surface et d'énergie dans le domaine des macromolécules. PhD thesis dissertation, Université Libre de Bruxelles, Belgium.
Finney, J. L. (1975). Volume occupation, environment and accessibility in proteins. The problem of the protein surface. J. Mol. Biol. 96, 721–732.
Gellatly, B. J. & Finney, J. L. (1982). Calculation of protein volumes: an alternative to the Voronoi procedure. J. Mol. Biol. 161, 305–322.
Harpaz, Y., Gerstein, M. & Chothia, C. (1994). Volume changes on protein folding. Structure, 2, 641–649.
Pontius, J., Richelle, J. & Wodak, S. J. (1996). Deviations from standard atomic volumes as a quality measure for protein crystal structures. J. Mol. Biol. 264, 121–136.
Ratnaparkhi, G. S., Ramachandran, S., Udgaonkar, J. B. & Varadarajan, R. (1998). Discrepancies between the NMR and X-ray structures of uncomplexed barstar: analysis suggests that packing densities of protein structures determined by NMR are unreliable. Biochemistry, 37, 6958–6966.
Richards, F. M. (1974). The interpretation of protein structures: total volume, group volume distributions and packing density. J. Mol. Biol. 82, 1–14.
Richards, F. M. (1985). Calculation of molecular volumes and areas for structures of known geometry. Methods Enzymol. 115, 440–464.
Sayle, R. A. & Milner-White, E. J. (1995). RASMOL: biomolecular graphics for all. Trends Biochem. Sci. 20, 374–376.
Voronoi, G. F. (1908). Nouvelles applications des paramètres continus à la théorie des formes quadratiques. J. Reine Angew. Math. 134, 198–287.