Definitions of protein volume

Gerstein, M.; Richards, F. M.

doi:10.1107/97809553602060000710

International
Tables for
Crystallography
Volume F
Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

pdf | chapter contents | chapter index | related articles

International Tables for Crystallography (2006). Vol. F. ch. 22.1, pp. 531-533 | 1 | 2 |

Section 22.1.1.2. Definitions of protein volume

M. Gerstein^a ^* and F. M. Richards^a

22.1.1.2. Definitions of protein volume

| top | pdf |

22.1.1.2.1. Volume in terms of Voronoi polyhedra: overview

| top | pdf |

Protein volume can be defined in a straightforward sense through a particular geometric construction called the Voronoi polyhedron. In essence, this construction provides a useful way of partitioning space amongst a collection of atoms. Each atom is surrounded by a single convex polyhedron and allocated the space within it (Fig. 22.1.1.1). The faces of Voronoi polyhedra are formed by constructing dividing planes perpendicular to vectors connecting atoms, and the edges of the polyhedra result from the intersection of these planes.

Figure 22.1.1.1 | top | pdf |

The Voronoi construction in two and three dimensions. Representative Voronoi polyhedra from 1CSE (subtilisin) are shown. (a) Six polyhedra around the atoms in a Phe ring. (b) A single polyhedron around the side-chain hydroxyl oxygen (OG) of a serine. (c) A schematic showing the construction of a Voronoi polyhedron in two dimensions. The broken lines indicate planes that were initially included in the polyhedron but then removed by the `chopping-down' procedure (see Fig. 22.1.1.4).

Voronoi polyhedra were originally developed by Voronoi (1908) nearly a century ago. Bernal & Finney (1967) used them to study the structure of liquids in the 1960s. However, despite the general utility of these polyhedra, their application to proteins was limited by a serious methodological difficulty. While the Voronoi construction is based on partitioning space amongst a collection of `equal' points, all protein atoms are not equal. Some are clearly larger than others. In 1974, a solution was found to this problem (Richards, 1974), and since then Voronoi polyhedra have been applied to proteins.

22.1.1.2.2. The basic Voronoi construction

| top | pdf |

22.1.1.2.2.1. Integrating on a grid

| top | pdf |

The simplest method for calculating volumes with Voronoi polyhedra is to put all atoms in the system on a fine grid. Then go to each grid point (i.e. voxel) and add its infinitesimal volume to the atom centre closest to it. This is prohibitively slow for a real protein structure, but it can be made somewhat faster by randomly sampling grid points. It is, furthermore, a useful approach for high-dimensional integration (Sibbald & Argos, 1990).

More realistic approaches to calculating Voronoi volumes have two parts: (1) for each atom find the vertices of the polyhedron around it and (2) systematically collect these vertices to draw the polyhedron and calculate its volume.

22.1.1.2.2.2. Finding polyhedron vertices

| top | pdf |

In the basic Voronoi construction (Fig. 22.1.1.1), each atom is surrounded by a unique limiting polyhedron such that all points within an atom's polyhedron are closer to this atom than all other atoms. Consequently, points equidistant from two atoms lie on a dividing plane; those equidistant from three atoms are on a line, and those equidistant from four atoms form a vertex. One can use this last fact to find all the vertices associated with an atom easily. With the coordinates of four atoms, it is straightforward to solve for possible vertex coordinates using the equation of a sphere. [That is, one uses four sets of coordinates (x, y, z) and the equation $[{(x - a)^{2} + (y - b)^{2} + (z - c)^{2} = r^{2}}]$ to solve for the centre (a, b, c) and radius (r) of the sphere.] One then checks whether this putative vertex is closer to these four atoms than any other atom; if so, it is a real vertex.

Note that this procedure can fail for certain pathological arrangements of atoms that would not normally be encountered in a real protein structure. These occur if there is a centre of symmetry, as in a regular cubic lattice or in a perfect hexagonal ring in a protein (see Procacci & Scateni, 1992). Centres of symmetry can be handled (in a limited way) by randomly perturbing the atoms a small amount and breaking the symmetry. Alternatively, the `chopping-down' method described below is not affected by symmetry centres – an important advantage to this method of calculation.

22.1.1.2.2.3. Collecting vertices and calculating volumes

| top | pdf |

To collect the vertices associated with an atom systematically, label each one by the indices of the four atoms with which it is associated (Fig. 22.1.1.2). To traverse the vertices on one face of a polyhedron, find all vertices that share two indices and thus have two atoms in common, e.g. a central atom (atom 0) and another atom (atom 1). Arbitrarily pick a vertex to start at and walk around the perimeter of the face. One can tell which vertices are connected by edges because they will have a third atom in common (in addition to atom 0 and atom 1). This sequential walking procedure also provides a way of drawing polyhedra on a graphics device. More importantly, with reference to the starting vertex, the face can be divided into triangles, for which it is trivial to calculate areas and volumes (see Fig. 22.1.1.2 for specifics).

Figure 22.1.1.2 | top | pdf |

Labelling parts of Voronoi polyhedra. The central atom is atom 0, and each neighbouring atom has a sequential index number (1, 2, 3…). Consequently, in three dimensions, planes are denoted by the indices of the two atoms that form them (e.g. 01); lines are denoted by the indices of three atoms (e.g. 012); and vertices are denoted by four indices (e.g. 0123). In the 2D representation shown here, lines are denoted by two indices, and vertices by three. From a collection of points, a volume can be calculated by a variety of approaches: First of all, the volume of a tetrahedron determined by four points can be calculated by placing one vertex at the origin and evaluating the determinant formed from the remaining three vertices. (The tetrahedron volume is one-sixth of the determinant value.) The determinant can be quickly calculated by a vector triple product, $[{\bf w} \cdot ({\bf u} \times {\bf v})]$ , where u, v and w are vectors between the vertex selected to be the origin and the other three vertices of the tetrahedron. Alternatively, the volume of the pyramid from a central atom to a face can be calculated from the usual formula Ad/3, where A is the area of the face and d is the distance to the face.

22.1.1.2.3. Adapting Voronoi polyhedra to proteins

| top | pdf |

In the procedure outlined above, all atoms are considered equal, and the dividing planes are positioned midway between atoms (Fig. 22.1.1.3). This method of partition, called bisection, is not physically reasonable for proteins, which have atoms of obviously different size (such as oxygen and sulfur). It chemically misallocates volume, giving excess to the smaller atom.

Figure 22.1.1.3 | top | pdf |

Positioning of the dividing plane. (a) The dividing plane is positioned at a distance d from the larger atom with respect to radii of the larger atom (R) and the smaller atom (r) and the total separation between the atoms (D). (b) Vertex error. One problem with using method B is that the calculation does not account for all space, and tiny tetrahedra of unallocated volume are created near the vertices of each polyhedron. Such an error tetrahedron is shown. The radical-plane method does not suffer from vertex error, but it is not as chemically reasonable as method B.

Two principal methods of repositioning the dividing plane have been proposed to make the partition more physically reasonable: method B (Richards, 1974) and the radical-plane method (Gellatly & Finney, 1982). Both methods depend on the radii of the atoms in contact (R for the larger atom and r for the smaller one) and the distance between the atoms (D). As shown in Fig. 22.1.1.3, they position the plane at a distance d from the larger atom. This distance is always set such that the plane is closer to the smaller atom.

22.1.1.2.3.1. Method B and a simplification of it: the ratio method

| top | pdf |

Method B is the more chemically reasonable of the two and will be emphasized here. For atoms that are covalently bonded, it divides the distance between the atoms proportionaly according to their covalent-bond radii: $[d = DR/(R + r). \eqno(22.1.1.1)]$ For atoms that are not covalently bonded, method B splits the remaining distance between them after subtracting their VDW radii: $[d = R + (D - R - r)/2. \eqno(22.1.1.2)]$

For separations that are not very different to the sum of the radii, the two formulae for method B give essentially the same result. Consequently, it is worthwhile to try a slight simplification of method B, which we call the `ratio method'. Instead of using equation (22.1.1.1) for bonded atoms and equation (22.1.1.2) for non-bonded ones, one can just use equation (22.1.1.2) in both cases with either VDW or covalent radii (Tsai et al., 2001). Doing this gives more consistent reference volumes (manifest in terms of smaller standard deviations about the mean).

22.1.1.2.3.2. Vertex error

| top | pdf |

If bisection is not used to position the dividing plane, it is much more complicated to find the vertices of the polyhedron, since a vertex is no longer equidistant from four atoms. Moreover, it is also necessary to have a reasonable scheme for `typing' atoms and assigning them radii.

More subtly, when using the plane positioning determined by method B, the allocation of space is no longer mathematically perfect, since the volume in a tiny tetrahedron near each polyhedron vertex is not allocated to any atom (Fig. 22.1.1.3). This is called vertex error. However, calculations on periodic systems have shown that, in practice, vertex error does not amount to more than 1 part in 500 (Gerstein et al., 1995).

22.1.1.2.3.3. `Chopping-down' method of finding vertices

| top | pdf |

Because of vertex error and the complexities in locating vertices, a different algorithm has to be used for volume calculation with method B. (It can also be used with bisection.) First, surround the central atom (for which a volume is being calculated) by a very large, arbitrarily positioned tetrahedron. This is initially the `current polyhedron'. Next, sort all neighbouring atoms by distance from the central atom and go through them from nearest to farthest. For each neighbour, position a plane perpendicular to the vector connecting it to the central atom according to the predefined proportion (i.e. from the method B formulae or bisection). Since a Voronoi polyhedron is always convex, if any vertices of the current polyhedron are on the other side of this plane to the central atom, they cannot be part of the final polyhedron and should be discarded. After this has been done, the current polyhedron is recomputed using the plane to `chop it down'. This process is shown schematically in Fig. 22.1.1.4. When it is finished, one has a list of vertices that can be traversed to calculate volumes, as in the basic Voronoi procedure.

Figure 22.1.1.4 | top | pdf |

The `chopping-down' method of polyhedra construction. This is necessary when using method B for plane positioning, since one can no longer solve for the position of vertices. One starts with a large tetrahedron around the central atom and then `chops it down' by removing vertices that are outside the plane formed by each neighbour. For instance, say vertex 0214 of the current polyhedron is outside the plane formed by neighbour 6. One needs to delete 0214 from the list of vertices and recompute the polyhedron using the new vertices formed from the intersection of the plane formed by neighbour 6 and the current polyhedron. Using the labelling conventions in Fig. 22.1.1.2, one finds that these new vertices are formed by the intersection of three lines (021, 024 and 014) with plane 06. Therefore one adds the new vertices 0216, 0246 and 0146 to the polyhedron. However, there is a snag: it is necessary to check whether any of the three lines are not also outside of the plane. To do this, when a vertex is deleted, all the lines forming it (e.g. 021, 024, 014) are pushed onto a secondary list. Then when another vertex is deleted, one checks whether any of its lines have already been deleted. If so, this line is not used to intersect with the new plane. This process is shown schematically in two dimensions. For the purposes of the calculations, it is useful to define a plane created by a vector v from the central atom to the neighbouring atom using a constant K so that for any point u on the plane $[{\bf u} \cdot {\bf v} = K]$ . If $[{\bf u} \cdot {\bf V} \gt K]$ , u is on the wrong side of the plane, otherwise it is on the right side. A vertex point w satisfies the equations of three planes: $[{\bf w} \cdot {\bf v}_{1} = K_{1}]$ , $[{\bf w} \cdot {\bf v}_{2} = K_{2}]$ and $[{\bf w} \cdot {\bf v}_{3} = K_{3}]$ . These three equations can be solved to give the components of w. For example, the x component is given by $[w_{x} = \pmatrix{K_{1} &v_{1y} &v_{1z}\cr K_{2} &v_{2y} &v_{2z}\cr K_{3} &v_{3y} &v_{3z}\cr} \Bigg/ \pmatrix{v_{1x} &v_{1y} &v_{1z}\cr v_{2x} &v_{2y} &v_{2z}\cr v_{3x} &v_{3y} &v_{3z}\cr}.]$

22.1.1.2.3.4. Radical-plane method

| top | pdf |

The radical-plane method does not suffer from vertex error. In this method, the plane is positioned according to $[d = (D^{2} + R^{2} - r^{2})/2D. \eqno(22.1.1.3)]$

22.1.1.2.4. Delaunay triangulation

| top | pdf |

Voronoi polyhedra are closely related (i.e. dual) to another useful geometric construction called the Delaunay triangulation. This consists of lines, perpendicular to Voronoi faces, connecting each pair of atoms that share a face (Fig. 22.1.1.5).

Figure 22.1.1.5 | top | pdf |

Delaunay triangulation and its relation to the Voronoi construction. (a) A standard schematic of the Voronoi construction. The atoms used to define the Voronoi planes around the central atom are circled. Lines connecting these atoms to the central one are part of the Delaunay triangulation, which is shown in (b). Note that atoms included in the triangulation cannot be selected strictly on the basis of a simple distance criterion relative to the central atom. The two circles about the central atoms illustrate this. Some atoms within the outer circle but outside the inner circle are included in the triangulation, but others are not. In the context of protein structure, Delaunay triangulation is useful in identifying true `packing contacts', in contrast to those contacts found purely by distance threshold. The broken lines in (a) indicate planes that were initially included in the polyhedron but then removed by the `chopping-down' procedure (see Fig. 22.1.1.4).

Delaunay triangulation is described here as a derivative of the Voronoi construction. However, it can be constructed directly from the atom coordinates. In two dimensions, one connects with a triangle any triplet of atoms if a circle through them does not enclose any additional atoms. Likewise, in three dimensions one connects four atoms with a tetrahedron if the sphere through them does not contain any further atoms. Notice how this construction is equivalent to the specification for Voronoi polyhedra and, in a sense, is simpler. One can immediately see the relationship between the triangulation and the Voronoi volume by noting that the volume is the distance between neighbours (as determined by the triangulation) weighted by the area of each polyhedral face. In practice, it is often easier in drawing to construct the triangles first and then build the Voronoi polyhedra from them.

Delaunay triangulation is useful in many `nearest-neighbour' problems in computational geometry, e.g. trying to find the neighbour of a query point or finding the largest empty circle in a collection of points (O'Rourke, 1994). Since this triangulation has the `fattest' possible triangles, it is the choice for procedures such as finite-element analysis.

In terms of protein structure, Delaunay triangulation is the natural way to determine packing neighbours, either in protein structure or molecular simulation (Singh et al., 1996; Tsai et al., 1996, 1997). Its advantage is that the definition of a neighbour does not depend on distance. The alpha shape is a further generalization of Delaunay triangulation that has proven useful in identifying ligand-binding sites (Edelsbrunner et al., 1996, 1995; Edelsbrunner & Mucke, 1994; Peters et al., 1996).

References

Bernal, J. D. & Finney, J. L. (1967). Random close-packed hard-sphere model II. Geometry of random packing of hard spheres. Discuss. Faraday Soc. 43, 62–69.Google Scholar

Edelsbrunner, H., Facello, M. & Liang, J. (1996). On the definition and construction of pockets in macromolecules, pp. 272–287. Singapore: World Scientific.Google Scholar

Edelsbrunner, H., Facello, M., Ping, F. & Jie, L. (1995). Measuring proteins and voids in proteins. Proc. 28th Hawaii Intl Conf. Sys. Sci. pp. 256–264.Google Scholar

Edelsbrunner, H. & Mucke, E. (1994). Three-dimensional alpha shapes. ACM Trans. Graphics, 13, 43–72.Google Scholar

Gellatly, B. J. & Finney, J. L. (1982). Calculation of protein volumes: an alternative to the Voronoi procedure. J. Mol. Biol. 161, 305–322.Google Scholar

Gerstein, M., Tsai, J. & Levitt, M. (1995). The volume of atoms on the protein surface: calculated from simulation, using Voronoi polyhedra. J. Mol. Biol. 249, 955–966.Google Scholar

O'Rourke, J. (1994). Computational geometry in C. Cambridge University Press.Google Scholar

Peters, K. P., Fauck, J. & Frommel, C. (1996). The automatic search for ligand binding sites in proteins of known three-dimensional structure using only geometric criteria. J. Mol. Biol. 256, 201–213.Google Scholar

Procacci, P. & Scateni, R. (1992). A general algorithm for computing Voronoi volumes: application to the hydrated crystal of myoglobin. Int. J. Quant. Chem. 42, 151–152.Google Scholar

Richards, F. M. (1974). The interpretation of protein structures: total volume, group volume distributions and packing density. J. Mol. Biol. 82, 1–14.Google Scholar

Sibbald, P. R. & Argos, P. (1990). Weighting aligned protein or nucleic acid sequences to correct for unequal representation. J. Mol. Biol. 216, 813–818.Google Scholar

Singh, R. K., Tropsha, A. & Vaisman, I. I. (1996). Delaunay tessellation of proteins: four body nearest-neighbor propensities of amino acid residues. J. Comput. Biol. 3, 213–222.Google Scholar

Tsai, J., Gerstein, M. & Levitt, M. (1996). Keeping the shape but changing the charges: a simulation study of urea and its isosteric analogues. J. Chem. Phys. 104, 9417–9430.Google Scholar

Tsai, J., Gerstein, M. & Levitt, M. (1997). Estimating the size of the minimal hydrophobic core. Protein Sci. 6, 2606–2616.Google Scholar

Tsai, J., Voss, N. & Gerstein, M. (2001). Voronoi calculations of protein volumes: sensitivity analysis and parameter database. Bioinformatics. In the press.Google Scholar

Voronoi, G. F. (1908). Nouvelles applications des paramétres continus à la théorie des formes quadratiques. J. Reine Angew. Math. 134, 198–287.Google Scholar

International Tables for Crystallography (2006). Vol. F. ch. 22.1, pp. 531-533