Deviations from standard atomic volumes as a quality measure for protein crystal structures

Wodak, S. J.; Vagin, A. A.; Richelle, J.; Das, U.; Pontius, J.; Berman, H. M.

doi:10.1107/97809553602060000708

International Tables for Crystallography
Volume F: Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold


International Tables for Crystallography (2006). Vol. F. ch. 21.2, p. 509

Section 21.2.2.2.3. Deviations from standard atomic volumes as a quality measure for protein crystal structures

S. J. Wodak,^a ^* A. A. Vagin,^b J. Richelle,^b U. Das,^b J. Pontius^b and H. M. Berman^c

^aUnité de Conformation de Macromolécules Biologiques, Université Libre de Bruxelles, avenue F. D. Roosevelt 50, CP160/16, B-1050 Bruxelles, Belgium, and EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England, ^bUnité de Conformation de Macromolécules Biologiques, Université Libre de Bruxelles, avenue F. D. Roosevelt 50, CP160/16, B-1050 Bruxelles, Belgium, and ^cDepartment of Chemistry, Rutgers University, 610 Taylor Road, Piscataway, NJ 08854-8087, USA
Correspondence e-mail: shosh@ucmb.ulb.ac.be

The observations that protein X-ray structures are at least as tightly packed as small-molecule crystals (Richards, 1974; Harpaz et al., 1994) and that the packing density inside proteins displays very limited variation (Richards, 1974; Finney, 1975) suggest that atomic volumes or measures of atomic packing can be added to the list of parameters for assessing the quality of protein structures.

Packing and related measures have been used to compare structures of proteins derived by both X-ray diffraction and NMR spectroscopy. Ratnaparkhi et al. (1998) analysed proteins for which both crystal and NMR structures were available. They found that the packing values of the NMR models displayed a much larger scatter than those of the corresponding crystal structures, which suggests that accurate values of the packing density cannot, at present, be obtained from NMR data. Similar conclusions were reached using measures of residue–residue contact area (Abagyan & Totrov, 1997).

Here, we describe the approach of Pontius et al. (1996), in which deviations from standard atomic volumes are used to assess the quality of a protein model, both overall and in specific regions.

The volumes occupied by atoms and residues inside proteins can be readily computed using the Voronoi method (Voronoi, 1908), first applied to proteins by Richards (1974) and Finney (1975). This method uses the atomic positions of the molecular model, and the volume assigned to each atom is defined as the smallest polyhedron created by the set of planes bisecting the lines joining the atom centre to those of its neighbours (Fig. 21.2.2.1).

Figure 21.2.2.1

The Voronoi polyhedron. (a) Positioning of the dividing plane P between two atoms i and j, with van der Waals radii $r_i$ and $r_j$, respectively, separated by a distance d. The plane P is positioned at d/2. (b) 2D representation of the Voronoi polyhedron of the central atom. This polyhedron is the smallest polyhedron delimited by all the dividing planes of the atom.
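As a rough numerical illustration of the construction in Fig. 21.2.2.1, the sketch below estimates a Voronoi volume by voxel counting. This is not the analytic algorithm used in practice. It relies on the fact that a point lies inside an atom's Voronoi polyhedron exactly when that atom's centre is the nearest one, which is equivalent to lying on the atom's side of every bisecting plane. The function name, the bounding-box arguments and the grid size are illustrative assumptions, and the estimate is only meaningful for an atom that is completely surrounded by neighbours within the box.

```python
import itertools


def voronoi_volume_grid(centres, index, box_min, box_max, n=20):
    """Estimate the Voronoi volume of centres[index] by voxel counting.

    centres          : list of (x, y, z) atomic positions
    index            : position in `centres` of the atom of interest
    box_min, box_max : corners of a box that fully encloses the atom's cell
    n                : number of voxels along each axis (hypothetical default)
    """
    steps = [(hi - lo) / n for lo, hi in zip(box_min, box_max)]
    voxel_volume = steps[0] * steps[1] * steps[2]
    count = 0
    for i, j, k in itertools.product(range(n), repeat=3):
        # Centre of the current voxel.
        p = (box_min[0] + (i + 0.5) * steps[0],
             box_min[1] + (j + 0.5) * steps[1],
             box_min[2] + (k + 0.5) * steps[2])
        # The voxel belongs to the atom whose centre is nearest.
        nearest = min(
            range(len(centres)),
            key=lambda a: sum((p[d] - centres[a][d]) ** 2 for d in range(3)),
        )
        if nearest == index:
            count += 1
    return count * voxel_volume


# Illustrative input: a simple cubic lattice with 2 A spacing.
lattice = list(itertools.product((-2.0, 0.0, 2.0), repeat=3))
central = lattice.index((0.0, 0.0, 0.0))
volume = voronoi_volume_grid(lattice, central, (-2.0,) * 3, (2.0,) * 3)
print(f"estimated volume of the central atom: {volume:.3f} A^3")
```

For this idealized lattice the polyhedron of the central atom is a cube of edge 2 Å, which gives a convenient check on the estimate.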

The use of the classical Voronoi procedure is justified in the context of validation because it avoids the need to derive a consistent set of van der Waals radii for atoms in the system. Such sets are used by other volume-calculation methods in order to partition space more accurately (Richards, 1974, 1985; Gellatly & Finney, 1982). Assigning a consistent set of radii to protein atoms is, indeed, not straightforward due to the heterogeneity of the interactions within the protein (polar, ionic, non-polar) and the presence of a large variety of hetero groups.

Structure-quality assessment based on volume calculations involves computing the atomic volumes in a subset of highly resolved and refined protein structures and analysing the distributions of these volumes for different atomic types, defined according to their chemical nature and bonded environment. These distributions define the expected ranges (mean and standard deviation) for the volume of each category of atoms. Atomic volumes in a given structure are then compared to the expected ranges, and statistically significant deviations from these ranges are flagged.
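The first step described above, deriving an expected range for each atom type, can be sketched as follows. The atom-type labels, the volumes and the function name are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify the estimator.

```python
from collections import defaultdict
from statistics import mean, stdev


def reference_distributions(observations):
    """Return {atom_type: (mean volume, standard deviation)}.

    observations : iterable of (atom_type, volume) pairs collected from a
                   set of well-refined reference structures.
    Atom types seen fewer than twice are skipped, because a standard
    deviation cannot be estimated from a single value.
    """
    by_type = defaultdict(list)
    for atom_type, volume in observations:
        by_type[atom_type].append(volume)
    return {
        atom_type: (mean(volumes), stdev(volumes))
        for atom_type, volumes in by_type.items()
        if len(volumes) >= 2
    }


# Made-up volumes (in cubic angstroms) for illustration only.
observed = [("LEU:CA", 12.0), ("LEU:CA", 14.0), ("LEU:CA", 13.0),
            ("ALA:CB", 30.0)]
print(reference_distributions(observed))
```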

The program PROVE (Pontius et al., 1996) implements such an approach using the analytic algorithms for volume and surface-area calculations encoded in SurVol (Alard, 1991). It computes for each atom i in a structure its volume Z score, $Z\ \mathrm{score} = \big|V_{i}^{k} - \overline{V^{k}}\big| \big/ \sigma^{k}$, where the superscript k designates the particular atom type (e.g., the C^α atom in a Leu residue), and $\overline{V^{k}}$ and $\sigma^{k}$ are, respectively, the mean and standard deviation of the reference volume distribution for the corresponding atom type. These reference distributions are derived from a set of high-quality protein crystal structures using exactly the same calculation procedure (Pontius et al., 1996).
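The Z-score definition translates directly into a few lines of code. The sketch below is not taken from PROVE; the function name, the atom-type label and the reference values are hypothetical.

```python
def volume_z_score(volume, atom_type, reference):
    """Volume Z score |V - mean| / sigma for one atom.

    reference : dict mapping atom type to (mean volume, standard deviation)
    """
    mean_volume, sigma = reference[atom_type]
    return abs(volume - mean_volume) / sigma


# Made-up reference values (in cubic angstroms) for illustration only.
reference = {"LEU:CA": (13.0, 1.0)}
print(volume_z_score(16.5, "LEU:CA", reference))
```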

Atoms with absolute Z scores > 3 are flagged as possible problem regions in the protein model, and residues containing such atoms are highlighted on graphical plots of the same type as those used by the PROCHECK program and on molecular models displayed using programs such as Rasmol (Sayle & Milner-White, 1995).
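The flagging step can be illustrated in the same spirit. In this sketch, atoms whose Z score exceeds the threshold are grouped by residue so that the affected residues can be highlighted. The residue identifiers, atom types and volumes are invented, and skipping atoms without a reference distribution is an assumption of the example.

```python
def flag_residues(atoms, reference, threshold=3.0):
    """Group atoms with a volume Z score above `threshold` by residue.

    atoms     : iterable of (residue_id, atom_type, volume)
    reference : dict mapping atom type to (mean volume, standard deviation)
    Returns {residue_id: [(atom_type, z_score), ...]}.
    """
    flagged = {}
    for residue_id, atom_type, volume in atoms:
        if atom_type not in reference:
            continue  # no reference volume available for this atom type
        mean_volume, sigma = reference[atom_type]
        z = abs(volume - mean_volume) / sigma
        if z > threshold:
            flagged.setdefault(residue_id, []).append((atom_type, z))
    return flagged


# Made-up data for illustration only.
reference = {"LEU:CA": (13.0, 1.0)}
atoms = [("A12", "LEU:CA", 13.5), ("A40", "LEU:CA", 17.0),
         ("A41", "HOH:O", 20.0)]
print(flag_residues(atoms, reference))
```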

In addition to the validation of the local quality of the model, its overall quality can be assessed by the root-mean-square volume Z score of all its atoms (see Fig. 21.2.2.2 for definition). As for many stereochemical global quality indicators, this Z score shows good correlation with the nominal resolution (d spacing) of the crystallographic data, as illustrated in Fig. 21.2.2.2(a). This figure also shows that Z-score ranges can be defined for each resolution interval. The Z scores of individual proteins that lie outside these intervals may be indicative of 'problem' structures. This is clearly the case for the two proteins 2ABX and 2GN5, whose Z scores are much higher than average (Fig. 21.2.2.2b).
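Assuming the usual definition of a root-mean-square value, i.e. the square root of the mean of the squared per-atom Z scores, the global indicator could be computed as in the following sketch. The function name is hypothetical.

```python
import math


def rms_z_score(z_scores):
    """Root-mean-square of the per-atom volume Z scores of one structure."""
    values = list(z_scores)
    if not values:
        raise ValueError("at least one scored atom is required")
    return math.sqrt(sum(z * z for z in values) / len(values))


# Made-up per-atom Z scores for illustration only.
print(rms_z_score([0.5, 1.2, 0.8, 3.4]))
```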

Figure 21.2.2.2

Atomic volume Z score r.m.s. variation with nominal resolution (d spacing) in 900 protein structures from the PDB. (a) Average of the r.m.s. volume Z score computed for structures having the same resolution (to within $\pm 0.1$ Å). The vertical bars indicate the magnitude of the standard deviations of the r.m.s. volume Z score in individual d-spacing bins. Graph points derived from fewer than 10 structures are shown as open diamonds and those derived from more than 10 structures as filled diamonds. (b) R.m.s. Z-score values as in (a), displayed for individual structures as a function of resolution. The five furthest outlier proteins are marked by their PDB codes.
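The binning behind panel (a) can be sketched as follows: structures are grouped into resolution bins of half-width 0.1 Å, and the mean and standard deviation of the r.m.s. Z score are computed per bin. The function name, the choice of bin centres at multiples of 0.2 Å and the input values are assumptions made for illustration; the original analysis may have defined its bins differently.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_resolution(structures, half_width=0.1):
    """Summarize r.m.s. volume Z scores per resolution bin.

    structures : iterable of (resolution in angstroms, r.m.s. Z score)
    Returns {bin_centre: (number of structures, mean, standard deviation)}.
    The standard deviation is reported as 0.0 for bins holding one structure.
    """
    width = 2.0 * half_width
    bins = defaultdict(list)
    for resolution, rms_z in structures:
        bins[round(resolution / width)].append(rms_z)
    summary = {}
    for bin_index, values in sorted(bins.items()):
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[round(bin_index * width, 6)] = (len(values), mean(values), spread)
    return summary


# Made-up (resolution, r.m.s. Z score) pairs for illustration only.
data = [(1.75, 1.0), (1.85, 1.2), (2.45, 1.5)]
for centre, (count, average, spread) in summarize_by_resolution(data).items():
    print(f"{centre:.1f} A: n={count}, mean={average:.2f}, sd={spread:.2f}")
```

A structure whose r.m.s. Z score lies well outside the range of its own bin would then be a candidate for closer inspection.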

Because solvent-accessible atoms are not completely surrounded by other atoms, their Voronoi volumes cannot be defined, and only completely buried atoms are scored.

The current version of PROVE is unable to measure the deviations from standard volumes for atoms in nucleic acids or hetero groups, simply because of the lack of reference volumes for these structures. This should change in the near future, at least for nucleic acids, thanks to the growing number of high-quality nucleic acid crystal structures from which standard volume ranges could be readily derived.

References

Abagyan, R. A. & Totrov, M. M. (1997). Contact area difference (CAD): a robust measure to evaluate accuracy of protein models. J. Mol. Biol. 268, 678–685.

Alard, P. (1991). Calcul de surface et d'énergie dans le domaine des macromolécules. PhD thesis, Université Libre de Bruxelles, Belgium.

Finney, J. L. (1975). Volume occupation, environment and accessibility in proteins. The problem of the protein surface. J. Mol. Biol. 96, 721–732.

Gellatly, B. J. & Finney, J. L. (1982). Calculation of protein volumes: an alternative to the Voronoi procedure. J. Mol. Biol. 161, 305–322.

Harpaz, Y., Gerstein, M. & Chothia, C. (1994). Volume changes on protein folding. Structure, 2, 641–649.

Pontius, J., Richelle, J. & Wodak, S. J. (1996). Deviations from standard atomic volumes as a quality measure for protein crystal structures. J. Mol. Biol. 264, 121–136.

Ratnaparkhi, G. S., Ramachandran, S., Udgaonkar, J. B. & Varadarajan, R. (1998). Discrepancies between the NMR and X-ray structures of uncomplexed barstar: analysis suggests that packing densities of protein structures determined by NMR are unreliable. Biochemistry, 37, 6958–6966.

Richards, F. M. (1974). The interpretation of protein structures: total volume, group volume distributions and packing density. J. Mol. Biol. 82, 1–14.

Richards, F. M. (1985). Calculation of molecular volumes and areas for structures of known geometry. Methods Enzymol. 115, 440–464.

Sayle, R. A. & Milner-White, E. J. (1995). RASMOL: biomolecular graphics for all. Trends Biochem. Sci. 20, 374–376.

Voronoi, G. F. (1908). Nouvelles applications des paramètres continus à la théorie des formes quadratiques. J. Reine Angew. Math. 134, 198–287.
