International Tables for Crystallography, Volume F: Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F, ch. 22.1, pp. 531-545
https://doi.org/10.1107/97809553602060000710
Chapter 22.1. Protein surfaces and volumes: measurement and use
a Department of Molecular Biophysics & Biochemistry, 266 Whitney Avenue, Yale University, PO Box 208114, New Haven, CT 06520, USA, b Department of Chemistry & Institute of Molecular Biophysics, Florida State University, Tallahassee, FL 32306-4380, USA, and c 1259 El Camino Real 184, Menlo Park, CA 94025, USA
In the first part of this chapter, methods for the calculation of the volumes and areas of proteins are surveyed. The application of Voronoi polyhedra to the calculation of volumes is focused on, with a discussion of how this construction is made and various aspects of it, including the calculation of standard sets of protein volumes. Various measures for protein surface areas, including the accessible surface area, are also discussed. In the second part of the chapter, the calculation of molecular surface areas, the uses of surface-area calculations and the representation of molecular surfaces are discussed.
Keywords: GRASP; atomic radii; binding energies; complete rolling algorithm; connected rolling algorithm; Connolly dot surface; convex hull; Delaunay triangulation; extended atoms; Gauss–Bonnet theorem; hydration surface; hydrophobicity; marching-cube algorithm; molecular packing; molecular surface; molecular volumes; packing efficiency; probe radius; probe sphere; representation of surfaces; solvent-accessible surface; surface areas; surface-area calculation; surfaces; vertex error; Voronoi construction; Voronoi polyhedra; van der Waals radii; van der Waals surface.
For geometric analysis, a protein consists of a set of points in three dimensions. This information corresponds to the actual data provided by the experiment, which are fundamentally of a geometric rather than chemical nature. That is, crystallography primarily tells one about the positions of atoms and perhaps an approximate atomic number, but not their charge or number of hydrogen bonds.
For the purposes of geometric calculation, each point has an assigned identification number and a position defined by three coordinates in a right-handed Cartesian system. (These coordinates will be based on the electron density for X-ray derived structures and on nuclear positions for those derived from neutron scattering. Each coordinate is usually assumed to have an accuracy between 0.5 and 1.0 Å.) Normally, only one additional characteristic is associated with each point: its size, usually measured by a van der Waals (VDW) radius. Furthermore, characteristics such as chemical nature and covalent connectivity, if needed, can be obtained from lookup tables keyed on the ID number.
Our model of a protein, thus, is the van der Waals envelope – the set of interlocking spheres drawn around each atomic centre. In brief, the geometric quantities of the model of particular concern in this section are its total surface area, total volume, the division of these totals among the amino-acid residues and individual atoms, and the description of the empty space (cavities) outside the van der Waals envelope. These values are then used in the analysis of protein structure and properties.
All the geometric properties of a protein (e.g. surfaces, volumes, distances etc.) are interrelated, so the definition of one quantity, e.g. area, affects how another, e.g. volume, can be consistently defined. Here, we will endeavour to present definitions for measuring protein volume, showing how they are related to various definitions of linear distance (VDW parameters) and surface. Further information related to macromolecular geometry, focusing on volumes, is available from http://www.molmovdb.org/geometry/.
Protein volume can be defined in a straightforward sense through a particular geometric construction called the Voronoi polyhedron. In essence, this construction provides a useful way of partitioning space amongst a collection of atoms. Each atom is surrounded by a single convex polyhedron and allocated the space within it (Fig. 22.1.1.1). The faces of Voronoi polyhedra are formed by constructing dividing planes perpendicular to vectors connecting atoms, and the edges of the polyhedra result from the intersection of these planes.
Voronoi polyhedra were originally developed by Voronoi (1908) nearly a century ago. Bernal & Finney (1967) used them to study the structure of liquids in the 1960s. However, despite the general utility of these polyhedra, their application to proteins was limited by a serious methodological difficulty. While the Voronoi construction is based on partitioning space amongst a collection of `equal' points, not all protein atoms are equal. Some are clearly larger than others. In 1974, a solution was found to this problem (Richards, 1974), and since then Voronoi polyhedra have been applied to proteins.
The simplest method for calculating volumes with Voronoi polyhedra is to put all atoms in the system on a fine grid. Then go to each grid point (i.e. voxel) and add its infinitesimal volume to the atom centre closest to it. This is prohibitively slow for a real protein structure, but it can be made somewhat faster by randomly sampling grid points. It is, furthermore, a useful approach for high-dimensional integration (Sibbald & Argos, 1990).
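The grid procedure can be sketched in a few lines. This is a minimal illustration, not a production method: the function name `grid_voronoi_volumes`, the bounding box and the voxel spacing are all assumptions made for the example, and the volume is only apportioned within the chosen box.

```python
import math

def grid_voronoi_volumes(atoms, box_min, box_max, spacing):
    """Assign every voxel of a regular grid to the nearest atom centre.

    atoms: list of (x, y, z) tuples; box_min/box_max: corners of the box
    being partitioned; spacing: voxel edge length (all illustrative).
    Returns one approximate volume per atom.
    """
    counts = [round((hi - lo) / spacing) for lo, hi in zip(box_min, box_max)]
    voxel_volume = spacing ** 3
    volumes = [0.0] * len(atoms)
    for i in range(counts[0]):
        for j in range(counts[1]):
            for k in range(counts[2]):
                # The voxel centre stands in for the whole voxel.
                p = tuple(lo + (n + 0.5) * spacing
                          for lo, n in zip(box_min, (i, j, k)))
                nearest = min(range(len(atoms)),
                              key=lambda a: math.dist(p, atoms[a]))
                volumes[nearest] += voxel_volume
    return volumes

# Two atoms placed symmetrically in a 4 x 2 x 2 box share it equally.
atoms = [(1.0, 1.0, 1.0), (3.0, 1.0, 1.0)]
volumes = grid_voronoi_volumes(atoms, (0.0, 0.0, 0.0), (4.0, 2.0, 2.0), 0.25)
```

The cost grows with the number of voxels times the number of atoms, which is why the text describes the approach as prohibitively slow for a real structure.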
More realistic approaches to calculating Voronoi volumes have two parts: (1) for each atom find the vertices of the polyhedron around it and (2) systematically collect these vertices to draw the polyhedron and calculate its volume.
In the basic Voronoi construction (Fig. 22.1.1.1), each atom is surrounded by a unique limiting polyhedron such that all points within an atom's polyhedron are closer to this atom than all other atoms. Consequently, points equidistant from two atoms lie on a dividing plane; those equidistant from three atoms are on a line, and those equidistant from four atoms form a vertex. One can use this last fact to find all the vertices associated with an atom easily. With the coordinates of four atoms, it is straightforward to solve for possible vertex coordinates using the equation of a sphere. [That is, one uses four sets of coordinates (x, y, z) and the equation $(x - a)^2 + (y - b)^2 + (z - c)^2 = r^2$ to solve for the centre (a, b, c) and radius (r) of the sphere.] One then checks whether this putative vertex is closer to these four atoms than any other atom; if so, it is a real vertex.
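As a sketch of the vertex search, the centre of the sphere through four atoms can be found by subtracting the sphere equation for the first atom from the other three, which leaves a 3 × 3 linear system. The helper names (`det3`, `sphere_through`, `is_voronoi_vertex`) and the tolerances are illustrative assumptions, and the system is solved here by Cramer's rule for brevity.

```python
import math

def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def sphere_through(p0, p1, p2, p3):
    """Centre (a, b, c) and radius r of the sphere through four points.

    Returns None when the points are (nearly) coplanar.
    """
    rows = [[2.0 * (p[i] - p0[i]) for i in range(3)] for p in (p1, p2, p3)]
    rhs = [sum(x * x for x in p) - sum(x * x for x in p0)
           for p in (p1, p2, p3)]
    d = det3(rows)
    if abs(d) < 1e-12:
        return None
    centre = []
    for col in range(3):
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]          # Cramer's rule: swap in the RHS
        centre.append(det3(m) / d)
    centre = tuple(centre)
    return centre, math.dist(centre, tuple(p0))

def is_voronoi_vertex(centre, radius, other_atoms, tol=1e-9):
    """True if no other atom lies closer to the centre than the four atoms."""
    return all(math.dist(centre, tuple(q)) >= radius - tol
               for q in other_atoms)

# Four points on the unit sphere centred at (1, 2, 3).
centre, radius = sphere_through((2, 2, 3), (0, 2, 3), (1, 3, 3), (1, 2, 4))
```

A putative vertex is rejected as soon as a fifth atom falls inside the sphere, which is the check described in the last sentence above.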
Note that this procedure can fail for certain pathological arrangements of atoms that would not normally be encountered in a real protein structure. These occur if there is a centre of symmetry, as in a regular cubic lattice or in a perfect hexagonal ring in a protein (see Procacci & Scateni, 1992). Centres of symmetry can be handled (in a limited way) by randomly perturbing the atoms a small amount and breaking the symmetry. Alternatively, the `chopping-down' method described below is not affected by symmetry centres – an important advantage to this method of calculation.
To collect the vertices associated with an atom systematically, label each one by the indices of the four atoms with which it is associated (Fig. 22.1.1.2). To traverse the vertices on one face of a polyhedron, find all vertices that share two indices and thus have two atoms in common, e.g. a central atom (atom 0) and another atom (atom 1). Arbitrarily pick a vertex to start at and walk around the perimeter of the face. One can tell which vertices are connected by edges because they will have a third atom in common (in addition to atom 0 and atom 1). This sequential walking procedure also provides a way of drawing polyhedra on a graphics device. More importantly, with reference to the starting vertex, the face can be divided into triangles, for which it is trivial to calculate areas and volumes (see Fig. 22.1.1.2 for specifics).
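Once the vertices of a face have been put in perimeter order, the triangle decomposition is short to write down. The sketch below assumes each face is supplied as an ordered list of vertices and that the atom lies inside its (convex) polyhedron; the function names are illustrative. Each fan triangle, together with the atom centre, forms a tetrahedron whose volume is one sixth of a scalar triple product.

```python
import math

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def face_area_and_volume(atom, face):
    """Area of one face and volume of the pyramid it forms with the atom.

    face: vertices in order around the perimeter. The face is split into
    a fan of triangles sharing the starting vertex face[0].
    """
    start = face[0]
    area = 0.0
    volume = 0.0
    for v1, v2 in zip(face[1:-1], face[2:]):
        n = cross(sub(v1, start), sub(v2, start))
        area += 0.5 * math.sqrt(dot(n, n))
        volume += abs(dot(sub(start, atom), n)) / 6.0
    return area, volume

def cube_faces(h=0.5):
    """Six ordered faces of a cube of half-edge h centred on the origin."""
    faces = []
    for axis in range(3):
        for sign in (-h, h):
            face = []
            for u, v in ((-h, -h), (h, -h), (h, h), (-h, h)):
                p = [u, v]
                p.insert(axis, sign)
                face.append(tuple(p))
            faces.append(face)
    return faces

# A unit cube around an atom at the origin: total area 6, total volume 1.
parts = [face_area_and_volume((0.0, 0.0, 0.0), f) for f in cube_faces()]
total_area = sum(a for a, _ in parts)
total_volume = sum(v for _, v in parts)
```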
In the procedure outlined above, all atoms are considered equal, and the dividing planes are positioned midway between atoms (Fig. 22.1.1.3). This method of partition, called bisection, is not physically reasonable for proteins, which have atoms of obviously different size (such as oxygen and sulfur). It chemically misallocates volume, giving excess to the smaller atom.
Two principal methods of repositioning the dividing plane have been proposed to make the partition more physically reasonable: method B (Richards, 1974) and the radical-plane method (Gellatly & Finney, 1982). Both methods depend on the radii of the atoms in contact (R for the larger atom and r for the smaller one) and the distance between the atoms (D). As shown in Fig. 22.1.1.3, they position the plane at a distance d from the larger atom. This distance is always set such that the plane is closer to the smaller atom.
Method B is the more chemically reasonable of the two and will be emphasized here. For atoms that are covalently bonded, it divides the distance between the atoms proportionally according to their covalent-bond radii:
$$d = \frac{DR}{R + r}. \qquad (22.1.1.1)$$
For atoms that are not covalently bonded, method B splits the remaining distance between them after subtracting their VDW radii:
$$d = R + \frac{D - R - r}{2}. \qquad (22.1.1.2)$$
For separations that are not very different to the sum of the radii, the two formulae for method B give essentially the same result. Consequently, it is worthwhile to try a slight simplification of method B, which we call the `ratio method'. Instead of using equation (22.1.1.1) for bonded atoms and equation (22.1.1.2) for non-bonded ones, one can just use equation (22.1.1.2) in both cases with either VDW or covalent radii (Tsai et al., 2001). Doing this gives more consistent reference volumes (manifest in terms of smaller standard deviations about the mean).
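The plane-positioning rules can be compared numerically. The sketch below assumes the usual forms of the method B formulae, d = DR/(R + r) for bonded atoms and d = R + (D − R − r)/2 for non-bonded atoms, together with the standard radical-plane expression d = (D² + R² − r²)/2D; the radii and separation used are hypothetical values chosen only for illustration.

```python
def bisection(D):
    """Plane midway between the two atoms."""
    return D / 2.0

def method_b_bonded(D, R, r):
    """Divide D in proportion to the radii (R for the larger atom)."""
    return D * R / (R + r)

def method_b_nonbonded(D, R, r):
    """Split the gap left after subtracting both radii from D."""
    return R + (D - R - r) / 2.0

def radical_plane(D, R, r):
    """Plane of equal tangent length to the two spheres."""
    return (D * D + R * R - r * r) / (2.0 * D)

# Hypothetical radii (in angstroms) and a separation slightly above R + r.
R, r, D = 1.8, 1.4, 3.6
positions = {
    "bisection": bisection(D),
    "method B (non-bonded)": method_b_nonbonded(D, R, r),
    "method B (ratio form)": method_b_bonded(D, R, r),
    "radical plane": radical_plane(D, R, r),
}

# At D = R + r the two method B formulae coincide exactly.
at_contact = (method_b_bonded(R + r, R, r), method_b_nonbonded(R + r, R, r))
```

With these numbers every non-bisection rule places the plane farther than D/2 from the larger atom, i.e. closer to the smaller one, and the two method B forms differ only slightly when D is close to R + r.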
If bisection is not used to position the dividing plane, it is much more complicated to find the vertices of the polyhedron, since a vertex is no longer equidistant from four atoms. Moreover, it is also necessary to have a reasonable scheme for `typing' atoms and assigning them radii.
More subtly, when using the plane positioning determined by method B, the allocation of space is no longer mathematically perfect, since the volume in a tiny tetrahedron near each polyhedron vertex is not allocated to any atom (Fig. 22.1.1.3). This is called vertex error. However, calculations on periodic systems have shown that, in practice, vertex error does not amount to more than 1 part in 500 (Gerstein et al., 1995).
Because of vertex error and the complexities in locating vertices, a different algorithm has to be used for volume calculation with method B. (It can also be used with bisection.) First, surround the central atom (for which a volume is being calculated) by a very large, arbitrarily positioned tetrahedron. This is initially the `current polyhedron'. Next, sort all neighbouring atoms by distance from the central atom and go through them from nearest to farthest. For each neighbour, position a plane perpendicular to the vector connecting it to the central atom according to the predefined proportion (i.e. from the method B formulae or bisection). Since a Voronoi polyhedron is always convex, if any vertices of the current polyhedron are on the other side of this plane to the central atom, they cannot be part of the final polyhedron and should be discarded. After this has been done, the current polyhedron is recomputed using the plane to `chop it down'. This process is shown schematically in Fig. 22.1.1.4. When it is finished, one has a list of vertices that can be traversed to calculate volumes, as in the basic Voronoi procedure.
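A full three-dimensional implementation of the chopping-down procedure is too long to show here, but its logic carries over directly to two dimensions, where the current polyhedron becomes a convex polygon and each dividing plane becomes a line. The following is only a two-dimensional analogue under that assumption: the starting shape is a large square rather than a tetrahedron, the function names are illustrative, and `fraction` stands in for whatever proportion (bisection or method B) positions the plane.

```python
import math

def clip(polygon, normal, offset):
    """Keep the part of a convex polygon where normal . p <= offset."""
    result = []
    n = len(polygon)
    for i in range(n):
        p, q = polygon[i], polygon[(i + 1) % n]
        sp = normal[0] * p[0] + normal[1] * p[1] - offset
        sq = normal[0] * q[0] + normal[1] * q[1] - offset
        if sp <= 0:
            result.append(p)            # vertex on the central atom's side
        if (sp < 0 < sq) or (sq < 0 < sp):
            t = sp / (sp - sq)          # edge crosses the dividing line
            result.append((p[0] + t * (q[0] - p[0]),
                           p[1] + t * (q[1] - p[1])))
    return result

def chop_down_cell(central, neighbours, fraction=0.5, size=1000.0):
    """Chop a very large starting polygon down to the cell of `central`."""
    cx, cy = central
    polygon = [(cx - size, cy - size), (cx + size, cy - size),
               (cx + size, cy + size), (cx - size, cy + size)]
    by_distance = sorted(neighbours,
                         key=lambda a: math.hypot(a[0] - cx, a[1] - cy))
    for nb in by_distance:              # nearest neighbours first
        dx, dy = nb[0] - cx, nb[1] - cy
        dist = math.hypot(dx, dy)
        normal = (dx / dist, dy / dist)
        offset = normal[0] * cx + normal[1] * cy + fraction * dist
        polygon = clip(polygon, normal, offset)
    return polygon

def polygon_area(polygon):
    """Shoelace formula."""
    n = len(polygon)
    twice = sum(polygon[i][0] * polygon[(i + 1) % n][1]
                - polygon[(i + 1) % n][0] * polygon[i][1] for i in range(n))
    return abs(twice) / 2.0

neighbours = [(2.0, 0.0), (-2.0, 0.0), (0.0, 2.0), (0.0, -2.0), (3.0, 3.0)]
cell = chop_down_cell((0.0, 0.0), neighbours)
cell_area = polygon_area(cell)
```

In this example the four nearest neighbours cut the starting square down to a 2 × 2 cell, and the more distant fifth neighbour removes nothing, which is why visiting neighbours from nearest to farthest is efficient.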
Voronoi polyhedra are closely related (i.e. dual) to another useful geometric construction called the Delaunay triangulation. This consists of lines, perpendicular to Voronoi faces, connecting each pair of atoms that share a face (Fig. 22.1.1.5).
Delaunay triangulation is described here as a derivative of the Voronoi construction. However, it can be constructed directly from the atom coordinates. In two dimensions, one connects with a triangle any triplet of atoms if a circle through them does not enclose any additional atoms. Likewise, in three dimensions one connects four atoms with a tetrahedron if the sphere through them does not contain any further atoms. Notice how this construction is equivalent to the specification for Voronoi polyhedra and, in a sense, is simpler. One can immediately see the relationship between the triangulation and the Voronoi volume by noting that the volume is the distance between neighbours (as determined by the triangulation) weighted by the area of each polyhedral face. In practice, it is often easier in drawing to construct the triangles first and then build the Voronoi polyhedra from them.
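The two-dimensional empty-circle rule can be illustrated by brute force. The sketch below tests every triplet of points, which is far too slow for real structures and is meant only to make the definition concrete; the function names and the tolerance are assumptions, and points lying exactly on a circle are treated as not enclosed.

```python
import math
from itertools import combinations

def circumcircle(a, b, c):
    """Centre and radius of the circle through three 2D points, or None."""
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1])
               + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None                     # collinear points
    a2, b2, c2 = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return (ux, uy), math.hypot(a[0] - ux, a[1] - uy)

def delaunay_triangles(points, tol=1e-9):
    """Index triplets whose circumcircle encloses no other point."""
    triangles = set()
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        (ux, uy), radius = circle
        enclosed = any(
            math.hypot(p[0] - ux, p[1] - uy) < radius - tol
            for m, p in enumerate(points) if m not in (i, j, k))
        if not enclosed:
            triangles.add((i, j, k))
    return triangles

def neighbour_pairs(triangles):
    """Pairs of points joined by an edge of the triangulation."""
    pairs = set()
    for tri in triangles:
        pairs.update(combinations(tri, 2))
    return pairs

# Four corners of a unit square plus its centre (index 4).
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
triangles = delaunay_triangles(points)
pairs = neighbour_pairs(triangles)
```

Here the central point is a neighbour of all four corners, while diagonally opposite corners are not neighbours of each other, even though no distance cutoff was applied.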
Delaunay triangulation is useful in many `nearest-neighbour' problems in computational geometry, e.g. trying to find the neighbour of a query point or finding the largest empty circle in a collection of points (O'Rourke, 1994). Since this triangulation has the `fattest' possible triangles, it is the choice for procedures such as finite-element analysis.
In terms of protein structure, Delaunay triangulation is the natural way to determine packing neighbours, either in protein structure or molecular simulation (Singh et al., 1996; Tsai et al., 1996, 1997). Its advantage is that the definition of a neighbour does not depend on distance. The alpha shape is a further generalization of Delaunay triangulation that has proven useful in identifying ligand-binding sites (Edelsbrunner et al., 1996, 1995; Edelsbrunner & Mucke, 1994; Peters et al., 1996).
When one is carrying out the Voronoi procedure, if a particular atom does not have enough neighbours the `polyhedron' formed around it will not be closed, but rather will have an open, concave shape. As it is not often possible to place enough water molecules in an X-ray crystal structure to cover all the surface atoms, these `open polyhedra' occur frequently on the protein surface (Fig. 22.1.1.6). Furthermore, even when it is possible to define a closed polyhedron on the surface, it will often be distended and too large. This is the problem of the protein surface in relation to the Voronoi construction.
There are a number of practical techniques for dealing with this problem. First, one can use very high resolution protein crystal structures, which have many solvent atoms positioned (Gerstein & Chothia, 1996). Alternatively, one can make up the positions of missing solvent molecules. These can be placed either according to a regular grid-like arrangement or, more realistically, according to the results of molecular simulation (Finney et al., 1980; Gerstein et al., 1995; Richards, 1974).
More fundamentally, however, the `problem of the protein surface' indicates how closely linked the definitions of surface and volume are and how the definition of one, in a sense, defines the other. That is, the two-dimensional (2D) surface of an object can be defined as the boundary between two 3D volumes. More specifically, the polyhedral faces defining the Voronoi volume of a collection of atoms also define their surface. The surface of a protein consists of the union of (connected) polyhedra faces. Each face in this surface is shared by one solvent atom and one protein atom (Fig. 22.1.1.7).
Another somewhat related definition is the convex hull, the smallest convex polyhedron that encloses all the atom centres (Fig. 22.1.1.7). This is important in computer-graphics applications and as an intermediary in many geometric constructions related to proteins (Connolly, 1991; O'Rourke, 1994). The convex hull is a subset of the Delaunay triangulation of the surface atoms. It is quickly located by the following procedure (Connolly, 1991): Find the atom farthest from the molecular centre. Then choose two of its neighbours (as determined by the Delaunay triangulation) such that a plane through these three atoms has all the remaining atoms of the molecule on one side of it (the `plane test'). This is the first triangle in the convex hull. Then one can choose a fourth atom connected to at least two of the three in the triangle and repeat the plane test, and by iteratively repeating this procedure, one can `sweep' across the surface of the molecule and define the whole convex hull.
Other parts of the Delaunay triangulation can define additional surfaces. The part of the triangulation connecting the first layer of water molecules defines a surface, as does the part joining the second layer. The second layer of water molecules, in fact, has been suggested on physical grounds to be the natural boundary for a protein in solution (Gerstein & Lynden-Bell, 1993c). Protein surfaces defined in terms of the convex hull or water layers tend to be `smoother' than those based on Voronoi faces, omitting deep grooves and clefts (see Fig. 22.1.1.7).
In the absence of solvent molecules to define Voronoi polyhedra, one can define the protein surface in terms of the position of a hypothetical solvent, often called the probe sphere, that `rolls' around the surface (Richards, 1977) (Fig. 22.1.1.7). The surface of the probe is imagined to be maintained at a tangent to the van der Waals surface of the model.
Various algorithms are used to cause the probe to visit all possible points of contact with the model. The locus of either the centre of the probe or the tangent point to the model is recorded. Either through exact analytical functions or numerical approximations of adjustable accuracy, the algorithms provide an estimate of the area of the resulting surface. (See Section 22.1.2 for a more extensive discussion of the definition, calculation and use of areas.)
Depending on the probe size and whether its centre or point of tangency is used to define the surface, one arrives at a number of commonly used definitions, summarized in Table 22.1.1.2 and Fig. 22.1.1.7.
The area of the van der Waals surface will be calculated by the various area algorithms (see Section 22.1.2.2) when the probe radius is set to zero. This is a mathematical calculation only. There is no physical procedure that will measure van der Waals surface area directly. From a mathematical point of view, it is just the first of a set of solvent-accessible surfaces calculated with differing probe radii.
The solvent-accessible surface is convex and closed, with defined areas assignable to each individual atom (Lee & Richards, 1971). However, the individual calculated values vary in a complex fashion with variations in the radii of the probe and protein atoms. The probe radius is frequently, but not always, set at a value considered to represent a water molecule (1.4 Å). The total SAS area increases without bound as the size of the probe increases.
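A simple numerical estimate of per-atom accessible area can be sketched by placing test points on each atom's expanded sphere (VDW radius plus probe radius) and counting those that do not fall inside any other expanded sphere. This is in the spirit of the Shrake–Rupley dot method, which is a different technique from the analytical algorithms mentioned in this chapter; the function names, the golden-spiral point distribution and the example radii are assumptions made for illustration.

```python
import math

def sphere_points(n):
    """n roughly uniform points on the unit sphere (golden-spiral layout)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        rho = math.sqrt(1.0 - z * z)
        pts.append((rho * math.cos(golden * i), rho * math.sin(golden * i), z))
    return pts

def accessible_areas(atoms, probe=1.4, n_points=500):
    """Approximate solvent-accessible area of each atom.

    atoms: list of (x, y, z, radius). A probe radius of zero gives the
    area of the van der Waals surface instead.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (x, y, z, r) in enumerate(atoms):
        big_r = r + probe
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = x + big_r * ux, y + big_r * uy, z + big_r * uz
            buried = any(
                (px - ox) ** 2 + (py - oy) ** 2 + (pz - oz) ** 2
                < (orad + probe) ** 2
                for j, (ox, oy, oz, orad) in enumerate(atoms) if j != i)
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas

isolated = accessible_areas([(0.0, 0.0, 0.0, 1.7)])
pair = [(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]
pair_water_probe = accessible_areas(pair, probe=1.4)
pair_large_probe = accessible_areas(pair, probe=3.0)
```

The accuracy is adjustable through `n_points`, and repeating the calculation with a larger probe shows the growth of the total accessible area with probe size noted above.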
Like the solvent-accessible surface, the molecular surface is also closed, but it contains a mixture of convex and concave patches, the sum of the contact and re-entrant surfaces. The ratio of these two surfaces varies with probe radius. In the limit of infinite probe radius, the molecular surface becomes convex and attains a limiting minimum value (i.e. it becomes a convex hull, similar to the one described above). The molecular surface cannot be divided up and assigned unambiguously to individual atoms.
The contact surface is not closed. Instead, it is a series of convex patches on individual atoms, simply related to the solvent-accessible surface of the same atoms. In complementary fashion, the re-entrant surface is also not closed but is a series of concave patches that is part of the probe surface where it contacts two or three atoms simultaneously. At infinite probe radius, the re-entrant areas are plane surfaces, at which point the molecular surface becomes a convex surface. The re-entrant surface cannot be divided up and assigned unambiguously to individual atoms. Note that the molecular surface is simply the union of the contact and re-entrant surfaces, so in terms of area MS = CS + RS.
The detail provided by these surfaces will depend on the radius of the probe used for their construction.
One may argue that the behaviour of the rolling probe sphere does not accurately model real hydrogen-bonded water. Instead, its `rolling' more closely mimics the behaviour of a nonpolar solvent. An attempt has been made to incorporate more realistic hydrogen-bonding behaviour into the probe sphere, allowing for the definition of a hydration surface more closely linked to the behaviour of real water (Gerstein & Lynden-Bell, 1993c).
The definitions of accessible surface and molecular surface can be related back to the Voronoi construction. The molecular surface is similar to `time-averaging' the surface formed from the faces of Voronoi polyhedra (the Voronoi surface) over many water configurations, and the accessible surface is similar to averaging the Delaunay triangulation of the first layer of water molecules over many configurations.
There are a number of other definitions of protein surfaces that are unrelated to either the probe-sphere method or Voronoi polyhedra and provide complementary information (Kuhn et al., 1992; Leicester et al., 1988; Pattabiraman et al., 1995).
The definition of protein surfaces and volumes depends greatly on the values chosen for various parameters of linear dimension – in particular, van der Waals and probe-sphere radii.
For all the calculations outlined above, the hard-sphere approximation is used for the atoms. (One must remember that in reality atoms are neither hard nor spherical, but this approximation has a long history of demonstrated utility.) There are many lists of the radii of such spheres prepared by different laboratories, both for single atoms and for unified atoms, where the radii are adjusted to approximate the joint size of the heavy atom and its bonded hydrogen atoms (clearly not an actual spherical unit).
Some of these lists are reproduced in Table 22.1.1.1. They are derived from a variety of approaches, e.g. looking for the distances of closest approach between atoms (the Bondi set) and energy calculations (the CHARMM set). The differences between the sets often come down to how one decides to truncate the Lennard–Jones potential function. Further differences arise from the parameterization of water and other hydrogen-bonding molecules, as these substances really should be represented with two radii, one for their hydrogen-bonding interactions and one for their VDW interactions.
Perhaps because of the complexities in defining VDW parameters, there are some great differences in Table 22.1.1.1. For instance, the radius for an aliphatic CH (>CH=) ranges from 1.7 to 2.38 Å, and the radius for carboxyl oxygen ranges from 1.34 to 1.89 Å. Both of these represent at least a 40% variation. Moreover, such differences are practically quite significant, since many geometrical and energetic calculations are very sensitive to the choice of VDW parameters, particularly the relative values within a single list. (Repulsive core interactions, in fact, vary almost exponentially.) Consequently, proper volume and surface comparisons can only be based on numbers derived through use of the same list of radii.
In the last column of the table we give a recent set of VDW radii that has been carefully optimized for use in volume and packing calculations. It is derived from analysis of the most common distances between atoms in small-molecule crystal structures in the Cambridge Structural Database (Rowland & Taylor, 1996; Tsai et al., 1999).
A series of surfaces can be described by using a probe sphere with a specified radius. Since this is to be a convenient mathematical construct in calculation, any numerical value may be chosen with no necessary relation to physical reality. Some commonly used examples are listed in Table 22.1.1.2.
The solvent-accessible surface is intended to be a close approximation to what a water molecule as a probe might `see' (Lee & Richards, 1971). However, there is no uniform agreement on what the proper water radius should be. Usually it is chosen to be about 1.4 Å.
Volume calculations are principally applied in measuring packing. This is because the packing efficiency of a given atom is simply the ratio of the space it could minimally occupy to the space that it actually does occupy. As shown in Fig. 22.1.1.8, this ratio can be expressed as the VDW volume of an atom divided by its Voronoi volume (Richards, 1974, 1985; Richards & Lim, 1994). (Packing efficiency also sometimes goes by the equivalent terms `packing density' or `packing coefficient'.) This simple definition masks considerable complexities – in particular, how does one determine the volume of the VDW envelope (Petitjean, 1994)? This requires knowledge of what the VDW radii of atoms are, a subject on which there is not universal agreement (see above), especially for water molecules and polar atoms (Gerstein et al., 1995; Madan & Lee, 1994).
Knowing that the absolute packing efficiency of an atom is a certain value is most useful in a comparative sense, i.e. when comparing equivalent atoms in different parts of a protein structure. In taking a ratio of two packing efficiencies, the VDW envelope volume remains the same and cancels. One is left with just the ratio of space that an atom occupies in one environment to what it occupies in another. Thus, for the measurement of packing, standard reference volumes are particularly useful. Recently calculated values of these standard volumes are shown in Tables 22.1.1.3 and 22.1.1.4 for atoms and residues (Tsai et al., 1999).
In analysing molecular systems, one usually finds that close packing is the default (Chandler et al., 1983), i.e. atoms pack like billiard balls. Unless there are highly directional interactions (such as hydrogen bonds) that have to be satisfied, one usually achieves close packing to optimize the attractive tail of the VDW interaction. Close-packed spheres of the same size have a packing efficiency of ∼0.74. Close-packed spheres of different size are expected to have a somewhat higher packing efficiency. In contrast, water is not close-packed because it has to satisfy the additional constraints of hydrogen bonding. It has an open, tetrahedral structure with a packing efficiency of ∼0.35. (This difference in packing efficiency is illustrated in Fig. 22.1.1.8b.)
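The value of ∼0.74 for equal close-packed spheres follows directly from the definition of packing efficiency as VDW volume divided by Voronoi volume. The short check below uses face-centred cubic packing, in which a cubic cell of edge 2√2 r contains four spheres of radius r, so the Voronoi volume per sphere is one quarter of the cell volume; the function names are illustrative.

```python
import math

def sphere_volume(radius):
    return 4.0 / 3.0 * math.pi * radius ** 3

def packing_efficiency(vdw_volume, voronoi_volume):
    """Ratio of the space an atom minimally needs to the space it has."""
    return vdw_volume / voronoi_volume

# Face-centred cubic packing of equal spheres of radius r.
r = 1.0
cell_edge = 2.0 * math.sqrt(2.0) * r
voronoi_per_sphere = cell_edge ** 3 / 4.0     # four spheres per cubic cell
fcc_efficiency = packing_efficiency(sphere_volume(r), voronoi_per_sphere)
```

The result equals π/(3√2) and does not depend on the radius chosen.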
The protein core is usually considered to be the atoms inaccessible to solvent, i.e. with an accessible surface area of zero or a very small number, such as 0.1 Å². Packing calculations on the protein core are usually done by calculating the average volumes of the buried atoms and residues in a database of crystal structures. These calculations were first done more than two decades ago (Chothia & Janin, 1975; Finney, 1975; Richards, 1974). The initial calculations revealed some important facts about protein structure. Atoms and residues of a given type inside proteins have a roughly constant (or invariant) volume. This is because the atoms inside proteins are packed together fairly tightly, with the protein interior better resembling a close-packed solid than a liquid or gas. In fact, the packing efficiency of atoms inside proteins is roughly as expected for the close packing of hard spheres (0.74).
More recent calculations measuring the packing in proteins (Harpaz et al., 1994; Tsai et al., 1999) have shown that the packing inside of proteins is somewhat tighter (by ∼4%) than that observed initially and that the overall packing efficiency of atoms in the protein core is greater than that in crystals of organic molecules. When molecules are packed this tightly, small changes in packing efficiency are quite significant. In this regime, the limitation on close packing is hard-core repulsion, which is expected to have a twelfth power or exponential dependence, so even a small change is energetically quite substantial. Furthermore, the number of allowable configurations that a collection of atoms can assume without core overlap drops off very quickly as these atoms approach the close-packed limit (Richards & Lim, 1994).
The exceptionally tight packing in the protein core seems to require a precise jigsaw puzzle-like fit of the residues. This appears to be the case for the majority of atoms inside of proteins (Connolly, 1986). The tight packing in proteins has, in fact, been proposed as a quality measure in protein crystal structures (Pontius et al., 1996). It is also believed to be a strong constraint on protein flexibility and motions (Gerstein et al., 1993; Gerstein, Lesk & Chothia, 1994). However, there are exceptions, and some studies have focused on these, showing how the packing inside proteins is punctuated by defects, or cavities (Hubbard & Argos, 1994, 1995; Kleywegt & Jones, 1994; Kocher et al., 1996; Rashin et al., 1986; Richards, 1979; Williams et al., 1994). If these defects are large enough, they can contain buried water molecules (Baker & Hubbard, 1984; Matthews et al., 1995; Sreenivasan & Axelsen, 1992).
Surprisingly, despite the intricacies of the observed jigsaw puzzle-like packing in the protein core, it has been shown that one can simply achieve the `first-order' aspect of this, getting the overall volume of the core right rather easily (Gerstein, Sonnhammer & Chothia, 1994; Kapp et al., 1995; Lim & Ptitsyn, 1970). This has to do with simple statistics for summing random numbers and the fact that the distribution of sizes for amino acids usually found inside proteins is rather narrow (Table 22.1.1.3). In fact, the similarly sized residues Val, Ile, Leu and Ala (with volumes 138, 163, 163 and 89 Å³) make up about half of the residues buried in the protein core. Furthermore, aliphatic residues, in particular, have a relatively large number of adjustable degrees of freedom per Å³, allowing them to accommodate a wide range of packing geometries. All of this suggests that many of the features of protein sequences may only require random-like qualities for them to fold (Finkelstein, 1994).
Measuring the packing efficiency inside the protein core provides a good reference point for comparison, and a number of other studies have looked at this in comparison with other parts of the protein. The most obvious thing to compare with the protein inside is the protein outside, or surface. This is particularly interesting from a packing perspective, since the protein surface is covered by water, and water is packed much less tightly than protein and in a distinctly different fashion. (The tetrahedral packing geometry of water molecules gives a packing efficiency of less than half that of hexagonal close-packed solids.)
Calculations based on crystal structures and simulations have shown that the protein surface has intermediate packing, being packed less tightly than the core but not as loosely as liquid water (Gerstein & Chothia, 1996; Gerstein et al., 1995). One can understand the looser packing at the surface than in the core in terms of a simple trade-off between hydrogen bonding and close packing, and this can be explicitly visualized in simulations of the packing in simple toy systems (Gerstein & Lynden-Bell, 1993a,b).
Interactions between molecules are most likely to be mediated by the properties of residues at their surfaces. Surfaces have figured prominently in functional interpretations of macromolecular structure. Which residues are most likely to interact with other molecules? What are their properties: charged, polar, or hydrophobic? What would be the estimated energy of interaction? How do the shapes and properties complement one another? Which surfaces are most conserved among a homologous family? At the centre of these questions that are often asked at the start of a structural interpretation lies the calculation of the molecular and/or accessible surfaces.
Surface-area calculations are used in two ways. Graphical surface representations help to obtain a quick intuitive understanding of potential molecular functions and interactions through visualization of the shape, charge distribution, polarity, or sequence conservation on the molecular surface (for example). Quantitative calculations of surface area are used en route to approximations of the free energy of interactions in binding complexes.
Part of this subject area was the topic of an excellent review by Richards (1985), to which the reader is referred for greater coverage of many of the methods of calculation. This review will attempt to incorporate more recent developments, particularly in the use of graphics, both realistic and schematic.
The concept of molecular surface derives from the behaviour of non-bonded atoms as they approach each other. As indicated by the Lennard–Jones potential, strong unfavourable interactions of overlapping non-bonding electron orbitals increase sharply according to $1/r^{12}$, where $r$ is the interatomic distance, and atoms behave almost as if they were hard spheres with van der Waals radii that are characteristic for each atom type and nearly independent of chemical context. Of course, when orbitals combine in a covalent bond, atoms approach much more closely. Lower-energy attractions between atoms, such as hydrogen bonds or aromatic ring stacking, lead to modest reductions in the distance of closest approach. The van der Waals surface is the area of a volume formed by placing van der Waals spheres at the centre of each atom in a molecule.
Non-bonded atoms of the same molecule contact each other over (at most) a very small proportion of their van der Waals surface. The surface is complicated with gaps and crevices. Much of this surface is inaccessible to other atoms or molecules, because there is insufficient space to place an atom without resulting in forbidden overlap of non-bonded van der Waals spheres (Fig. 22.1.2.1). These crevices are excluded in the molecular surface area. The molecular surface area, also known as the solvent-excluding surface, is the outer surface of the volume from which solvent molecules are excluded. Strictly, this would depend on the orientation of non-spherically symmetric solvents such as water. However, since hydrogen atoms are smaller than oxygen atoms, for current purposes it is sufficient to consider water as a sphere with a radius of 1.4 to 1.7 Å, approximating the `average' distance from the centre of the oxygen atom to the van der Waals surface of water. The practical definition of the molecular surface is, then, the area of the volume excluded to a spherical probe of 1.4 to 1.7 Å radius.
As an aside, it is important to note that surface-area calculations depend on inexact parameterization. For example, there is no radius of any hard-sphere model that can give a realistic representation of the solvent. Furthermore, the choice of van der Waals radii can depend on whether the distance of zero or minimum potential energy is estimated and the potential-energy function or experimental data used. (Tables of common values are given by Gerstein & Richards in Section 22.1.1.) Thus, calculations of molecular and accessible surfaces are approximate. However, when the errors are averaged over large areas of a macromolecule, the numbers can be precise enough to give important insights into function.
Fig. 22.1.2.1 shows that the molecular surface consists of two components. The contact surface is the part of the van der Waals surface (the exterior surfaces of atoms) that can be touched by the probe. The re-entrant surface spans the interstitial volume between atoms and is made up of parts of the surfaces of probes placed in positions where they are in contact with the van der Waals surfaces of two or more atoms.
The occluded molecular surface is an approximate complement to the solvent-accessible surface. It is the part of the surface that would be inaccessible to solvent because of steric conflict with neighbouring macromolecular atoms. It is an approximation in that current calculations use van der Waals surfaces, ignoring the differences between atomic and re-entrant surfaces (see below), and the volume of the probe is not fully accounted for (Pattabiraman et al., 1995). Occluded area is defined as the atomic area whose normals cannot be extended 2.8 Å (the presumptive diameter of a water molecule) without intersecting the van der Waals volume of another atom. This crude approximation to the surface that is inaccessible to water not only increases the speed of calculation, but enables surface areas to be partitioned between the atoms. It is used primarily to evaluate model protein structures by comparing the fraction of each amino acid's surface area that is occluded with that calculated for the same residue types in a database of accurate structures.
Whether graphically displaying a molecule or examining potential docking interactions, it is usually the molecular surface or solvent-accessible surface that is used. However, macromolecules also interact through the small (solvent) molecules that are more or less tightly bound (Gerstein & Lynden-Bell, 1993c). There is a gradation of how tightly solvent molecules are bound and how many are bound around different side chains. With dynamics simulations, Gerstein & Lynden-Bell (1993c) showed that the second hydration shell was a reasonable, practical `average' limit to which water atoms should be considered significantly perturbed by the protein. They defined a hydration surface as the surface of this second shell and presented evidence that it approximates the boundary between bound and bulk solvent. They presented calculations that showed that molecules interact significantly when their hydration surfaces interact, and not just when they are close enough for their molecular surfaces to form contacts. It may be computationally impractical to perform the simulations required to calculate the hydration surfaces of many proteins, but this work reminds us that energetically significant interactions occur over a wider area than the commonly computed contact molecular-surface area.
The hydrophobic effect (Kauzmann, 1959; Tanford, 1997) has its origins in unfavourable entropic terms for water molecules immediately surrounding a hydrophobic group. In the bulk solvent, each water molecule can be oriented in a variety of ways with favourable hydrogen bonding. At the interface with a hydrophobic group, hydrogen bonds are possible only in some directions, with some configurations of the water molecules. When a hydrophobic group is embedded in water, the surrounding solvent molecules have a more restricted set of hydrogen-bonding configurations, resulting in an unfavourable entropic term. The magnitude of the entropic term should be proportional to the number of solvent molecules immediately surrounding the hydrophobic group. This integer number can be considered very approximately proportional to the area of the surface made by the centres of the set of possible solvent probes contacting the solute, i.e. the solvent-accessible surface area (Fig. 22.1.2.1). When large areas are considered, summed over many hydrophobic atoms, the errors of this non-integer approximation are insignificant. It is now common practice to estimate the hydrophobic effect free-energy contribution by multiplying the change in macromolecular surface area by an energy per unit area [~80 J mol−1 Å−2 (Richards, 1985), but see also below].
The first method to be discussed allows the calculation of an accessible surface. The first method for calculating the molecular surface involved raining water down on a model of a macromolecule and constructing a surface by making a net under the spheres in their landing positions (Greer & Bush, 1978). This ignored overhangs and was replaced by the dot surface method. More recently, methods were developed to make polyhedral surfaces of triangles by contouring between lattice points or by delimiting with arcs the spherical and toroidal surfaces and then subdividing the piece-wise quartic molecular surface. The surface is then composed of patches whose areas can be precisely integrated. van der Waals surfaces consist of convex spherical triangles whose areas can be estimated by the Gauss–Bonnet theorem. Re-entrant surfaces comprise concave spherical triangles, whose areas can be similarly estimated, and toroidal saddle-shaped patches, whose areas can be calculated by analytical geometry and calculus.
The first method for calculating the accessible surface area overlaid the molecule on a regular stack of finely spaced parallel planes (Lee & Richards, 1971). The advantage of this method was the ease with which the area could be calculated. The intersections of the atomic surfaces with the planes were circular arcs whose lengths were readily calculated and multiplied by the planar spacing to give an approximation to the surface area. Programs that are currently distributed use more sophisticated methods.
A molecular dot surface is a smooth envelope of points on the molecular surface. A probe sphere is placed at a set of approximately evenly spaced points so that the probe and van der Waals surfaces of a given atom are tangential. If the probe sphere does not overlap any other atom, the point is designated as surface. To define the re-entrant surface, sphere centres are also sampled that are tangential to both van der Waals spheres of a pair of neighbouring atoms and are equidistant from the interatomic axis. Arcs are then drawn between surface points and the arcs are subdivided into a set of finely spaced points to define the re-entrant surface. Similarly, spheres contacting triplets of neighbouring atoms are tested, and approximately evenly spaced points within the concave triangle defined by the three contact points are added to the re-entrant surface.
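The probe-placement test described above can be illustrated with a minimal numerical sketch. It is not the Connolly algorithm itself: it handles only the first step (probe positions tangential to a single atom), ignores the re-entrant arcs and triangles, and converts the surviving dots into an accessible area by simple dot counting. The function names, the golden-spiral point lattice and the default probe radius of 1.4 Å are assumptions made for illustration.

```python
import math

def sphere_points(n):
    """Return n roughly evenly spaced unit vectors (golden-spiral lattice)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        pts.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return pts

def accessible_areas(atoms, probe=1.4, n_dots=960):
    """Estimate per-atom accessible areas (in A^2) by dot counting.

    atoms: list of (x, y, z, vdw_radius). A probe centre tangential to
    atom i is kept only if the probe does not overlap any other atom.
    """
    unit = sphere_points(n_dots)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # distance of the probe centre from atom i
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                # Probe overlaps atom j if its centre is closer than rj + probe
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2:
                    break
            else:
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_dots)
    return areas
```

For an isolated atom every dot survives and the estimate equals the full sphere area 4π(r + r_probe)²; for overlapping atoms the accuracy depends on the dot density.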
The lattice-based approach is conceptually the simplest method and is used in the program GRASP (Nicholls et al., 1991). First, grid points of a cubic lattice overlaid on the molecule are segregated into `interior' and `exterior' as follows. All points farther from every atom than the sum of that atom's van der Waals radius and the probe radius are flagged as external. External points with an internal neighbour are flagged as an approximate `accessible surface'. All grid points falling within probe spheres centred at each surface point now join the set of exterior points. Points that remain `interior' define the volume enclosed by the molecular surface.
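The segregation of grid points into interior and exterior can be sketched as follows. This is a small pure-Python illustration of the steps just listed, not the GRASP implementation; the function name, the six-neighbour definition of adjacency, the grid padding and the default spacing are assumptions.

```python
import itertools
import math

def classify_grid(atoms, probe=1.4, h=0.5):
    """Return (interior, volume): grid indices enclosed by the approximate
    molecular surface and the volume they represent (in A^3).

    atoms: list of (x, y, z, vdw_radius); h: grid spacing in A.
    """
    pad = max(a[3] for a in atoms) + 2.0 * probe + h
    lo = [min(a[d] for a in atoms) - pad for d in range(3)]
    n = [int(math.ceil((max(a[d] for a in atoms) + pad - lo[d]) / h)) + 1
         for d in range(3)]

    # Step 1: points within (vdW radius + probe) of some atom are interior;
    # all other points are exterior.
    interior = set()
    for idx in itertools.product(range(n[0]), range(n[1]), range(n[2])):
        p = tuple(lo[d] + idx[d] * h for d in range(3))
        if any(math.dist(p, a[:3]) <= a[3] + probe for a in atoms):
            interior.add(idx)

    # Step 2: exterior points with an interior neighbour approximate
    # the accessible surface.
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    surface = set()
    for i, j, k in interior:
        for di, dj, dk in steps:
            nb = (i + di, j + dj, k + dk)
            if nb not in interior:
                surface.add(nb)

    # Step 3: every grid point inside a probe sphere centred on a
    # surface point joins the exterior set.
    reach = int(math.ceil(probe / h))
    offsets = [o for o in itertools.product(range(-reach, reach + 1), repeat=3)
               if h * math.sqrt(o[0] ** 2 + o[1] ** 2 + o[2] ** 2) <= probe]
    for i, j, k in surface:
        for di, dj, dk in offsets:
            interior.discard((i + di, j + dj, k + dk))

    return interior, len(interior) * h ** 3
```

With a coarse spacing the enclosed volume is overestimated somewhat, because the approximate accessible surface lies up to one grid step outside the true one; a finer grid reduces this at the cost of run time.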
All that remains is to contour the molecular surface that lies between interior and exterior grid points. It is a little complicated in three dimensions and is achieved by the marching-cube algorithm. Cubes containing adjacent grid points that are both interior and exterior are used to define potential polyhedral vertices. Triangles are defined by joining the midpoints of unit-cell edges that have one interior and one exterior point. The triangles are joined at their edges in a consistent manner to create a polyhedral surface.
Several algorithms start by dividing the surface into regions within which the surface is smooth and continuous. The surface can be efficiently described in terms of a set of arcs and their start and end points. In complete rolling, the probe is placed in all possible positions at which it contacts the van der Waals spheres of three neighbouring atoms. Those surrounding the same atom are paired as the start and end points of an arc. The complete rolling algorithm does not distinguish outer and inner (cavity) surfaces. In the connected rolling algorithm, the process starts at a triple contact point that is far from the centre of mass and therefore likely to be external. The probe is then rolled only along crevices between two atoms, pursuing all alternatives, stopping each pathway only when the probe returns to a place that has already been probed. This algorithm therefore produces only the outer surface.
An analytical method was also proposed for calculating approximate accessible areas (Wodak & Janin, 1980). It assumed random distributions of neighbouring atoms, but this can be a sufficient approximation when calculating the area of an entire molecule. The areas of spherical and toroidal pieces of surface can be calculated exactly by analytic and differential geometry (Richmond, 1984; Connolly, 1983). An advantage of analytical expressions over the prior numerical approximations is that analytical derivatives of the areas can be calculated, albeit with significant difficulty. This then provides the opportunity to optimize atomic positions with respect to surface area. Pseudo-energy functions that approximate the hydrophobic contribution to free energy with a term proportional to the accessible surface area (Richards, 1977) can therefore be incorporated in energy-minimization programs. Although rigorous, these methods are computationally cumbersome and are not used in all energy-minimization routines. Incorporation of solvent effects may become more universal with the Gaussian atom approximations discussed below.
The methods discussed above are computationally quite cumbersome, especially if they need to be repeated many times. Thus, they are not well suited to comparisons of many structures. They are also not well suited to the calculation of surface-area-dependent energy terms during dynamics simulation or energy minimization, which require the calculation of the derivatives of the surface area with respect to atomic position. It has been argued by several (including A. Nicholls and K. Sharp, personal communications) that simplifying approximations to the surface-area calculations are in order, because the common uses of surface area already embody crude ad hoc approximations, such as non-integer numbers of spherical solvent molecules.
In the treatments discussed earlier, the volume of the protein is (implicitly) described by a set of overlapping step functions that have a constant value if close enough to an atom, or zero if not. Several authors have replaced these step functions with continuous spherical Gaussian functions centred on each atom (Gerstein, 1992; Grant & Pickup, 1995) in treatments reminiscent of Ten Eyck's electron-density calculations (Ten Eyck, 1977). This speeds up the calculation and also facilitates the calculation of analytical derivatives of the surface area. A surface can be calculated for graphical display by contouring the continuous function at an appropriate threshold. The final envelope can be modified by using iterative procedures that fill cavities and crevices that are (nearly) surrounded by protein atoms (Gerstein, 1992).
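A minimal sketch of the Gaussian description is given below. The functional form exp(−d²/r²), with the van der Waals radius r as the width, and the contour threshold e⁻¹ are assumptions chosen so that an isolated atom is contoured exactly at its radius; the cited papers use their own parameterizations. The gradient function illustrates why analytical derivatives become straightforward once the step functions are replaced by smooth ones.

```python
import math

def density(p, atoms):
    """Sum of spherical Gaussians exp(-d^2 / r^2) over atoms (x, y, z, r)."""
    return sum(
        math.exp(-((p[0] - x) ** 2 + (p[1] - y) ** 2 + (p[2] - z) ** 2) / (r * r))
        for x, y, z, r in atoms
    )

def density_gradient(p, atoms):
    """Analytical gradient of density() with respect to the sampling point p."""
    g = [0.0, 0.0, 0.0]
    for x, y, z, r in atoms:
        d = (p[0] - x, p[1] - y, p[2] - z)
        w = math.exp(-(d[0] ** 2 + d[1] ** 2 + d[2] ** 2) / (r * r))
        for k in range(3):
            g[k] -= 2.0 * d[k] / (r * r) * w
    return g

# Assumed contour level: the value of one Gaussian at its own radius.
THRESHOLD = math.exp(-1.0)

def inside_envelope(p, atoms, threshold=THRESHOLD):
    """True if p lies inside the envelope contoured at the given threshold."""
    return density(p, atoms) >= threshold
```

Each atom's contribution to the gradient with respect to its own position is simply the negative of its term above.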
Structures of macromolecules determined by X-ray crystallography rarely reveal the positions of the hydrogen atoms. It is, of course, possible to add explicit hydrogen atoms at the stereochemically most likely positions, but this is rarely done for surface-area calculations. Instead, their average effect is approximately and implicitly accounted for by increasing the heteroatom van der Waals radius by 0.1 to 0.3 Å. (It is not usual to smear atoms to account for thermal motion.)
As previously introduced, hydrophobic energies result primarily from the increased entropy of water molecules at the macromolecule–solvent interface and can be estimated from the accessible surface area. A number of different constants relating area to free energy of transfer from a hydrophobic to aqueous environment have been proposed in the range of 67 to 130 J mol−1 Å−2 (Reynolds et al., 1974; Chothia, 1976; Hermann, 1977; Eisenberg & McLachlan, 1986), but if a single value is to be used for all of the protein surface, the consensus among crystallographers has been about 80 J mol−1 Å−2 (Richards, 1985).
There are two widely used enhancements of the basic method. Atomic solvation parameters (ASPs, Δσ) remove the assumption that all protein atoms have equal potential influence on the hydrophobic free energy. Eisenberg & McLachlan (1986) determined separate ASPs for atom types C, N/O, O−, N+ and S (treating hydrogen atoms implicitly) by fitting these constants to the experimentally determined octanol/water relative transfer free energies of the 20 amino-acid side chains of Fauchere & Pliska (1983), assuming standard conformations of the side chains. A much improved free energy change of solvation can then be estimated from $\Delta G = \sum_i \Delta\sigma_i A_i$, where the summation is over all atoms $i$ with accessible area $A_i$ and $\Delta\sigma_i$ is specific for the atom type. Their estimates of ASPs are given in Table 22.1.2.1. Use of ASPs rather than a single value for all atoms makes substantial differences to the estimated free energies of association of macromolecular assemblies (Xie & Chapman, 1996). Through calculation of the overall energy of solvation, calculations with ASPs also allow discrimination between proposed structures that are correctly folded (with hydrophobic side chains that are predominantly internal) and those that are not (Eisenberg & McLachlan, 1986).
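The summation over atoms can be written as a short sketch. The function name and the numerical ASP values in the usage lines are placeholders invented for illustration; they are not the values of Eisenberg & McLachlan (1986), which should be taken from Table 22.1.2.1.

```python
def solvation_free_energy(atoms, asp):
    """Sum ASP * accessible area over atoms.

    atoms: iterable of (atom_type, accessible_area_in_A2)
    asp:   dict mapping atom type to its solvation parameter
           (energy per mole per A^2, in whatever unit the table uses)
    """
    return sum(asp[atom_type] * area for atom_type, area in atoms)

# Hypothetical placeholder parameters and areas, for illustration only.
example_asp = {"C": 1.0, "N/O": -0.5}
example_atoms = [("C", 10.0), ("C", 4.0), ("N/O", 6.0)]
example_dg = solvation_free_energy(example_atoms, example_asp)
```

Keeping the parameters in a dictionary makes it easy to substitute an averaged value for atom types that cannot be distinguished in the crystal structure.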
The work of Sharp et al. (1991) indicates that hydrophobicity depends not only on surface area, but also on curvature. Sharp et al. were trying to reconcile long-apparent differences between microscopic and macroscopic measurements of hydrophobicity (Tanford, 1979). Microscopic measurements, the basis of all of our preceding discussions, are derived from the partitioning of dilute solutes between solvents. Macroscopic values can come from the measurements of the surface tension between a liquid bulk of the molecule of interest and water. Macroscopic values for aliphatic carbons are much higher, ~302 J mol−1 Å−2. Postulating that the entropic effects at the heart of hydrophobicity depended on the number of water molecules in contact with each other at the molecular surface (Nicholls et al., 1991), Sharp et al. pointed out that not all surfaces were equivalent. Relative to a plane, concave solute surfaces would accommodate fewer solvent molecules neighbouring the molecular surface, whereas convex surfaces would accommodate more. Their treatment could be considered to be a second-order approximation to the number of interfacial solvent molecules, compared to the prior first-order consideration of only area.
To calculate the curvature of point a on the accessible surface (relative to that of a plane), a sphere of twice the solvent radius is drawn (Nicholls et al., 1991). This represents the locus of the centres of solvent molecules that could be in contact with a solvent molecule centred at a. A curvature correction, c, is the proportion of points on the spherical surface that are inside the inaccessible volume, relative to that for a planar accessible surface (1/2). In calculating the free energy of transfer, each element of the accessible area is multiplied by its curvature correction. When this is done, the increasingly convex surfaces of small aliphatic molecules account for most of the discrepancy between microscopic and macroscopic hydrophobicities (Nicholls et al., 1991). Furthermore, it emphasizes that, just by their shape, concave surfaces can become relatively hydrophobic. This has been clearly illustrated with GRASP surface representations (see below) in which the accessible surface is coloured according to the local curvature (Nicholls et al., 1991). Consideration of curvature also indicates that the energy of macromolecular association is slightly less than it would otherwise be due to the generation of a concave collar at the interface between two binding macromolecules (Nicholls et al., 1991).
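The curvature correction can be estimated numerically by sampling the sphere of radius twice the solvent radius. The sketch below assumes that the planar reference proportion is one half (a sphere centred on a plane is bisected by it) and that the inaccessible volume is the union of spheres of radius van der Waals radius plus probe radius; the function names and the sampling lattice are illustrative choices, not those of Nicholls et al. (1991).

```python
import math

def sphere_points(n):
    """Return n roughly evenly spaced unit vectors (golden-spiral lattice)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        pts.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return pts

def curvature_correction(a, atoms, probe=1.4, n=2000):
    """Curvature correction c at point a on the accessible surface.

    atoms: list of (x, y, z, vdw_radius). Points are sampled on a sphere
    of radius 2 * probe about a; c is the fraction lying inside the
    solvent-inaccessible volume, divided by the planar value of 1/2.
    """
    s = 2.0 * probe
    inside = 0
    for ux, uy, uz in sphere_points(n):
        p = (a[0] + s * ux, a[1] + s * uy, a[2] + s * uz)
        if any(math.dist(p, (x, y, z)) < r + probe for x, y, z, r in atoms):
            inside += 1
    return (inside / n) / 0.5
```

For an isolated atom of radius r the exact value under these assumptions is 1 − r_probe/(r + r_probe), which is less than 1 and approaches 1 (the planar limit) as the atom becomes large.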
In a molecular association in which (as is often the case) hydrophobic interactions dominate, the binding energy can be estimated from the surfaces of the individual molecules that become buried upon association (Richards, 1985). The buried area is simply the sum of the surfaces of the two molecules (calculated independently) minus the surface of the complex, calculated as if one molecule. Usually, all heteroatoms are regarded as equivalent, and the buried area is multiplied by a uniform constant, say 80 J mol−1 Å−2 (Richards, 1985). It is only slightly more complicated to use the different ASPs (Eisenberg & McLachlan, 1986) for different atom types and/or to account for curvature (Nicholls et al., 1991). It should be noted that in many crystal structures, the distinction between atom types in some side chains remains indeterminate, e.g. N and C in histidines, O and O− in carboxylates, and N and N+ in arginines. In such cases, average values of the two ASPs can be used (Xie & Chapman, 1996). Such energy calculations have been put to several uses, including attempts to predict assembly and disassembly pathways for viral capsid assemblies (Arnold & Rossmann, 1990; Xie & Chapman, 1996, and citations therein).
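The buried-area estimate reduces to simple arithmetic, sketched below. The function names are illustrative, the areas in the usage lines are hypothetical numbers rather than results for any real complex, and the uniform constant of 80 J mol−1 Å−2 is the value quoted above.

```python
def buried_area(area_a, area_b, area_complex):
    """Accessible area (A^2) buried when molecules A and B associate."""
    return area_a + area_b - area_complex

def hydrophobic_energy_kj(buried, per_area=80.0):
    """Magnitude of the hydrophobic contribution in kJ/mol,
    using a uniform constant in J mol^-1 A^-2."""
    return buried * per_area / 1000.0

# Hypothetical areas, for illustration only.
example_buried = buried_area(5000.0, 4000.0, 7800.0)
example_energy = hydrophobic_energy_kj(example_buried)
```

The three areas must be computed with the same probe radius and atomic radii, since the result is a small difference between large numbers.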
Which are the amino acids most likely to interact with other molecules? It is reasonable to expect them to be surface-accessible. In determining which residues are most surface-exposed, it is necessary to partition molecular or accessible surfaces between atoms. Contact surfaces (Fig. 22.1.2.1) are atom specific. Re-entrant or accessible surfaces can be divided among surface atoms by proximity. Surface areas can then be summed over the atoms in a residue. Accessible surface areas are sometimes reported as accessibilities (Lee & Richards, 1971) – fractions of a maximum where the standard is evaluated from a tripeptide in which the residue of interest is surrounded by glycines. A different approach to assessing surface exposure is to ask what is the largest molecular fragment that could contact a given atom. This is commonly assayed by determining the largest sphere that can be placed tangentially to the van der Waals surface without intersecting any other atom. An alternative approach to locating functionally important surface regions was proposed in the mid-1980s, but is currently not used very often. The local irregularity of surface texture was characterized through measurement of the fractal dimension (Lewis & Rees, 1985).
Substrates, drugs and ligands often bind in clefts or pockets that are concave in shape. Conversely, it is the most exposed convex regions that are likely to be antigenic. The surface shape can be determined by placing a large (say 6 Å radius) sphere at each vertex of the polyhedral molecular surface. If more than half of the sphere's volume overlaps the molecular volume, then the surface is concave, while if less than half, the surface is convex.
Are there similarities in the shapes of surfaces at the interfaces of macromolecular complexes? For example, are there similarities between the shapes of evolutionarily related antigens or the hypervariable regions of antibodies that bind to them? Quantitative comparison of surface topologies is far from trivial, raising questions of 3D alignment, the metrics to be used in quantifying topology etc. In addition to real differences between molecules, their surfaces may appear to differ due to the resolutions at which their structures were determined. Gerstein (1992) has proposed that comparisons be made in reciprocal space so that correlations can be judged as a function of resolution. Coordinates are aligned. Spherical Gaussian functions are placed at each atom, and an envelope is calculated at some threshold value and modified to remove cavities. Gerstein found that comparison of the envelope structure-factor vectors, obtained by Fourier transformation, led to a plausible classification of the hypervariable regions of known antibody structures.
With very large complexes, such as viruses, the surface features to be viewed are obvious at low resolution. In a very simple yet effective representation popularized by the laboratories of David Stuart and Jim Hogle, a Cα trace is `depth cued' (shaded) according to the distance from the centre of mass (Acharya et al., 1990; Fig. 1 for example). The impression of three dimensions probably results from the similarity of the shading to highlighting. The method is most effective for large complexes in which there are sufficient Cα atoms to give a dense impression of a surface.
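Depth cueing of a Cα trace can be sketched in a few lines. The linear mapping of distance to a shade between 0 and 1 and the function name are assumptions; because all Cα atoms have the same mass, the unweighted centroid used here coincides with their centre of mass.

```python
import math

def depth_cue(ca_coords):
    """Return one shade per C-alpha atom: 0 for the atom nearest the
    centre of mass, 1 for the farthest, linear in between."""
    n = len(ca_coords)
    centre = tuple(sum(c[d] for c in ca_coords) / n for d in range(3))
    dist = [math.dist(c, centre) for c in ca_coords]
    lo, hi = min(dist), max(dist)
    span = (hi - lo) or 1.0  # avoid division by zero if all distances are equal
    return [(d - lo) / span for d in dist]
```

The shades can then be passed to any plotting routine as grey levels or colour intensities.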
In one of the earliest surface graphical representations, dots were drawn for each Connolly surface dot, using vector-graphics terminals. With the improved graphics capability of modern computers, dot representations have been replaced by ones in which solid polyhedra are drawn with a large enough number of small triangular faces such that the surface appears smooth. These representations are clearer, because atoms in the foreground obscure those in the background.
Depth and three-dimensional relationships are most easily represented by stereovision or rotation of objects in real time on a computer screen. Graphics engines for interactive computers compromise quality for the speed necessary for interactive response, but simple depth cueing (combined with motion or stereo) is sufficient for good 3D representation. For still and/or non-stereo images more common in publications, more sophisticated rendering is helpful and possible now that speed is not a constraint. In Raster3D (Merritt & Bacon, 1997), multiple-light-source shading and highlighting is added, with individual calculations for each fine pixel. These are dependent on the directions of the normals to the surface, which are calculated analytically for spherical surfaces. More complicated surfaces, input as connected triangles, have surfaces rendered raster, pixel by pixel, by interpolating between the surface-normal vectors at the vertices of the surrounding triangle. Together, this leads to a high-quality smooth image that conveys much of the three-dimensionality of molecular surfaces.
GRASP is currently perhaps the most popular program for the display of molecular surfaces. Readers are referred to the program documentation (Nicholls, 1992) or a paper that tangentially describes an early implementation (Nicholls et al., 1991). The molecular or accessible surface is determined by the marching-cube algorithm. The surface is filled using methods that make modest compromises on photorealistic light reflection etc., but take advantage of machine-dependent Silicon Graphics surface rendering to perform the display fast enough for interactive adjustment of the view.
The most powerful part of the program is the ability to colour according to properties mapped to the surface (see Fig. 22.1.2.2). These may be values of (say) electrostatic potential interpolated from a three-dimensional lattice. Much has been learned about many proteins from the potentials determined by solution of the Poisson–Boltzmann equation (Nicholls & Honig, 1991). The electrostatic complementarity of binding surfaces has often been readily apparent in ways that were not obvious from Coulombic calculations that ignore screening or from calculations and graphics representations that treat the charges of individual atoms as independent entities.
Many other properties can be mapped to the surface. These include properties of the atoms associated with that part of the surface (such as thermal factors), curvature of the surface calculated from adjacent atoms (Nicholls & Honig, 1991), or distance to the nearest part of the surface of an adjacent molecule. GRASP is now used to illustrate complicated molecular structures, in part because it also supports the superimposition of other objects over the molecular surface. These include the representation of molecules with CPK spheres and/or bonds, and the representation of electrostatic potentials with field lines, dipole vectors etc.
For their work on viruses, Rossmann & Palmenberg (1988) introduced the roadmap, a highly schematic representation in which individual amino acids were labelled. The methods were extended by Chapman (1993) to other proteins and to the automatic display of features such as topology, sequence similarity and hydrophobicity. Roadmaps sacrifice a realistic impression of shape for the ability to show the locations and properties of constituent surface atoms or residues. This has been important in combining the power of structure and molecular biology in understanding function. Potential sites of mutation are readily identified without substantial molecular-graphics resources, and phenotypes of mutants are readily mapped to the surface and compared with the physicochemical properties to reveal structure–function correlations.
For a set of projection vectors, the intersection points with the first van der Waals (or solvent-accessible) surface of an atom are calculated by basic vector algebra. The atom is identified so that when the projection is mapped to a plane for display, the boundaries of each atom or amino acid can be determined. The atoms or amino acids can then be coloured, shaded, outlined, contoured, or labelled according to parameters that are either calculated from the coordinates (such as distance from the centre of mass), read from a file (such as sequence similarity), or follow properties that are dependent on the residue type (e.g. hydrophobicity) or atom type [e.g. atomic solvation parameters (Eisenberg & McLachlan, 1986)].
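The identification of the first atom met by each projection vector is a line-sphere intersection. The sketch below assumes a parallel projection viewed down the z axis from +z, so that the first surface encountered is the one with the largest z; the function name and this choice of viewing direction are illustrative.

```python
import math

def first_atom_hit(x, y, atoms):
    """Return (atom_index, z_of_intersection) for the first sphere met by
    the projection line through (x, y) travelling from +z towards -z,
    or None if the line misses every atom.

    atoms: list of (x, y, z, radius); use the van der Waals radius, or
    radius + probe for the solvent-accessible surface.
    """
    best = None
    for i, (ax, ay, az, r) in enumerate(atoms):
        d2 = (x - ax) ** 2 + (y - ay) ** 2  # squared distance from line to centre
        if d2 <= r * r:
            z_hit = az + math.sqrt(r * r - d2)  # upper intersection with sphere i
            if best is None or z_hit > best[1]:
                best = (i, z_hit)
    return best
```

Evaluating this on a 2D grid of (x, y) values gives, for every pixel, the atom whose properties determine the colouring, and the boundaries between atoms or residues fall where the returned index changes.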
Several types of projections can be used. The simplest is similar to that used by most other surface-imaging programs. A set of parallel projection vectors is mapped to a 2D grid. An example is shown in Fig. 22.1.2.3. This view avoids distortions, but only one side of the molecule is visualized. Roadmaps are flat, two-dimensional projections that cannot be rotated in real time to reveal other views. Three-dimensionality is limited to an extension by Jean-Yves Sgro that maps the parallel projection of one view to a three-dimensional surface shell that can be rotated with interactive graphics and/or viewed with stereo imaging (Harber et al., 1995; Sgro, 1996). However, the schematic nature of roadmaps leads to the ability to view all parts of the molecule simultaneously.
To view all parts of the molecule, cylindrical projections are used that are similar to those used in atlases. This is possible because the representation is schematic (not realistic), and longitudinal distortion, similar to that near the poles in world maps, is acceptable. The surface is projected outwards radially onto a cylinder that wraps around the macromolecule (Fig. 22.1.2.4). Active-site clefts, drug or inhibitor binding sites and pores can be similarly illustrated by projecting their surfaces outward (from the axis) onto a cylinder that encloses the pore, pocket, or cleft. Such clefts are rarely straight, but with some distortion a satisfactory representation is possible by segmenting the cylinder, so that its axis follows the (curved) centre of the binding site or pore (Fig. 22.1.2.4).
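A radial projection onto a straight cylinder amounts to a change to cylindrical coordinates. The sketch below assumes the cylinder axis is parallel to z and passes through a chosen origin; the function name is illustrative, and the segmented (curved-axis) case described above is not handled.

```python
import math

def cylindrical_projection(point, origin=(0.0, 0.0, 0.0)):
    """Project a surface point radially onto a cylinder whose axis is
    parallel to z through origin.

    Returns (longitude_in_degrees, height_along_axis, radial_distance).
    Longitude and height are the two map coordinates; the radial
    distance is available for shading or contouring.
    """
    x, y, z = (point[d] - origin[d] for d in range(3))
    longitude = math.degrees(math.atan2(y, x))
    return longitude, z, math.hypot(x, y)
```

Plotting longitude against height unrolls the cylinder into a flat map, with the distortion near the axis that the text compares to that near the poles in world maps.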
Both quantitative and qualitative analyses of the surfaces of biomolecules are among the most powerful methods of elucidating functional mechanism from three-dimensional structures. A wide array of methods has been developed to help understand binding interactions and macromolecular assembly and to visualize the shape and physicochemical surface properties of macromolecules. Visualization methods range from those that depict a realistic impression of the topology to those that are more schematic and facilitate collation of structural and genetic information.
Acknowledgements
The authors thank Genfa Zhou for providing Fig. 22.1.2.2. MSC gratefully acknowledges the support of the National Science Foundation (BIR 94-18741 and DBI 98-08098), the National Institutes of Health (GM 55837) and the Markey Charitable Trust. MG acknowledges support from the NSF Database Activities Program DBI-9723182.
References
Acharya, R., Fry, E., Logan, D., Stuart, D., Brown, F., Fox, G. & Rowlands, D. (1990). The three-dimensional structure of foot-and-mouth disease virus. New aspects of positive-strand RNA viruses, edited by M. A. Brinton & S. X. Heinz, pp. 319–327. Washington DC: American Society for Microbiology.
Arnold, E. & Rossmann, M. G. (1990). Analysis of the structure of a common cold virus, human rhinovirus 14, refined at a resolution of 3.0 Å. J. Mol. Biol. 211, 763–801.
Baker, E. N. & Hubbard, R. E. (1984). Hydrogen bonding in globular proteins. Prog. Biophys. Mol. Biol. 44, 97–179.
Bernal, J. D. & Finney, J. L. (1967). Random close-packed hard-sphere model II. Geometry of random packing of hard spheres. Discuss. Faraday Soc. 43, 62–69.
Blake, C. C. F., Koenig, D. F., Mair, G. A., North, A. C. T., Phillips, D. C. & Sarma, V. R. (1965). Structure of hen egg-white lysozyme, a three-dimensional Fourier synthesis at 2 Å resolution. Nature (London), 206, 757–761.
Bondi, A. (1964). van der Waals volumes and radii. J. Phys. Chem. 68, 441–451.
Bondi, A. (1968). Molecular crystals, liquids and glasses. New York: Wiley.
Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S. & Karplus, M. (1983). CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 4, 187–217.
Chandler, D., Weeks, J. D. & Andersen, H. C. (1983). van der Waals picture of liquids, solids, and phase transformations. Science, 220, 787–794.
Chapman, M. S. (1993). Mapping the surface properties of macromolecules. Protein Sci. 2, 459–469.
Chapman, M. S. (1994). Sequence similarity scores and the inference of structure/function relationships. Comput. Appl. Biosci. (CABIOS), 10, 111–119.
Chothia, C. (1975). Structural invariants in protein folding. Nature (London), 254, 304–308.
Chothia, C. (1976). The nature of the accessible and buried surfaces in proteins. J. Mol. Biol. 105, 1–12.
Chothia, C. & Janin, J. (1975). Principles of protein–protein recognition. Nature (London), 256, 705–708.
Connolly, M. (1986). Measurement of protein surface shape by solid angles. J. Mol. Graphics, 4, 3–6.
Connolly, M. L. (1983). Analytical molecular surface calculation. J. Appl. Cryst. 16, 548–558.
Connolly, M. L. (1991). Molecular interstitial skeleton. Comput. Chem. 15, 37–45.
Diamond, R. (1974). Real-space refinement of the structure of hen egg-white lysozyme. J. Mol. Biol. 82, 371–391.
Dunfield, L. G., Burgess, A. W. & Scheraga, H. A. (1979). J. Phys. Chem. 82, 2609.
Edelsbrunner, H., Facello, M. & Liang, J. (1996). On the definition and construction of pockets in macromolecules, pp. 272–287. Singapore: World Scientific.
Edelsbrunner, H., Facello, M., Ping, F. & Jie, L. (1995). Measuring proteins and voids in proteins. Proc. 28th Hawaii Intl Conf. Sys. Sci. pp. 256–264.
Edelsbrunner, H. & Mucke, E. (1994). Three-dimensional alpha shapes. ACM Trans. Graphics, 13, 43–72.Google Scholar
Eisenberg, D. & McLachlan, A. D. (1986). Solvation energy in protein folding and binding. Nature (London), 319, 199–203.Google Scholar
Fauchere, J.-L. & Pliska, V. (1983). Hydrophobic parameters π of amino-acid side chains from the partitioning of N-acetyl-amino-acid amides. Eur. J. Med. Chem. Chim. Ther. 18, 369–375.Google Scholar
Finkelstein, A. (1994). Implications of the random characteristics of protein sequences for their three-dimensional structure. Curr. Opin. Struct. Biol. 4, 422–428.Google Scholar
Finney, J. L. (1975). Volume occupation, environment and accessibility in proteins. The problem of the protein surface. J. Mol. Biol. 96, 721–732.Google Scholar
Finney, J. L., Gellatly, B. J., Golton, I. C. & Goodfellow, J. (1980). Solvent effects and polar interactions in the structural stability and dynamics of globular proteins. Biophys. J. 32, 17–33.Google Scholar
Fritz-Wolf, K., Schnyder, T., Wallimann, T. & Kabsch, W. (1996). Structure of mitochondrial creatine kinase. Nature (London), 381, 341–345.Google Scholar
Gelin, B. R. & Karplus, M. (1979). Side-chain torsional potentials: effect of dipeptide, protein, and solvent environment. Biochemistry, 18, 1256–1268.Google Scholar
Gellatly, B. J. & Finney, J. L. (1982). Calculation of protein volumes: an alternative to the Voronoi procedure. J. Mol. Biol. 161, 305–322.Google Scholar
Gerstein, M. (1992). A resolution-sensitive procedure for comparing surfaces and its application to the comparison of antigen-combining sites. Acta Cryst. A48, 271–276.Google Scholar
Gerstein, M. & Chothia, C. (1996). Packing at the protein–water interface. Proc. Natl Acad. Sci. USA, 93, 10167–10172.Google Scholar
Gerstein, M., Lesk, A. M., Baker, E. N., Anderson, B., Norris, G. & Chothia, C. (1993). Domain closure in lactoferrin: two hinges produce a see-saw motion between alternative close-packed interfaces. J. Mol. Biol. 234, 357–372.Google Scholar
Gerstein, M., Lesk, A. M. & Chothia, C. (1994). Structural mechanisms for domain movements. Biochemistry, 33, 6739–6749.Google Scholar
Gerstein, M. & Lynden-Bell, R. M. (1993a). Simulation of water around a model protein helix. 1. Two-dimensional projections of solvent structure. J. Phys. Chem. 97, 2982–2991.Google Scholar
Gerstein, M. & Lynden-Bell, R. M. (1993b). Simulation of water around a model protein helix. 2. The relative contributions of packing, hydrophobicity, and hydrogen bonding. J. Phys. Chem. 97, 2991–2999.Google Scholar
Gerstein, M. & Lynden-Bell, R. M. (1993c). What is the natural boundary for a protein in solution? J. Mol. Biol. 230, 641–650.Google Scholar
Gerstein, M., Sonnhammer, E. & Chothia, C. (1994). Volume changes on protein evolution. J. Mol. Biol. 236, 1067–1078.Google Scholar
Gerstein, M., Tsai, J. & Levitt, M. (1995). The volume of atoms on the protein surface: calculated from simulation, using Voronoi polyhedra. J. Mol. Biol. 249, 955–966.Google Scholar
Grant, J. A. & Pickup, B. T. (1995). A Gaussian description of molecular shape. J. Phys. Chem. 99, 3503–3510.Google Scholar
Greer, J. & Bush, B. L. (1978). Macromolecular shape and surface maps by solvent exclusion. Proc. Natl Acad. Sci. USA, 75, 303–307.Google Scholar
Harber, J., Bernhardt, G., Lu, H.-H., Sgro, J.-Y. & Wimmer, E. (1995). Canyon rim residues, including antigenic determinants, modulate serotype-specific binding of polioviruses to mutants of the poliovirus receptor. Virology, 214, 559–570.Google Scholar
Harpaz, Y., Gerstein, M. & Chothia, C. (1994). Volume changes on protein folding. Structure, 2, 641–649.Google Scholar
Hermann, R. B. (1977). Use of solvent cavity area and number of packed solvent molecules around a solute in regard to hydrocarbon solubilities and hydrophobic interactions. Proc. Natl Acad. Sci. USA, 74, 4144–4195.Google Scholar
Hubbard, S. J. & Argos, P. (1994). Cavities and packing at protein interfaces. Protein Sci. 3, 2194–2206.Google Scholar
Hubbard, S. J. & Argos, P. (1995). Evidence on close packing and cavities in proteins. Curr. Opin. Biotechnol. 6, 375–381.Google Scholar
Kapp, O. H., Moens, L., Vanfleteren, J., Trotman, C. N. A., Suzuki, T. & Vinogradov, S. N. (1995). Alignment of 700 globin sequences: extent of amino acid substitution and its correlation with variation in volume. Protein Sci. 4, 2179–2190.Google Scholar
Kauzmann, W. (1959). Some factors in the interpretation of protein denaturation. Adv. Protein Chem. 14, 1–63.Google Scholar
Kelly, J. A., Sielecki, A. R., Sykes, B. D., James, M. N. & Phillips, D. C. (1979). X-ray crystallography of the binding of the bacterial cell wall trisaccharaide NAM-NAG-NAM to lysozymes. Nature (London), 282, 875–878.Google Scholar
Kim, K. H., Willingmann, P., Gong, Z. X., Kremer, M. J., Chapman, M. S., Minor, I., Oliveira, M. A., Rossmann, M. G., Andries, K., Diana, G. D., Dutko, F. J., McKinlay, M. A. & Pevear, D. C. (1993). A comparison of the anti-rhinoviral drug binding pocket in HRV14 and HRV1A. J. Mol. Biol. 230, 206–227.Google Scholar
Kleywegt, G. J. & Jones, T. A. (1994). Detection, delineation, measurement and display of cavities in macromolecular structures. Acta Cryst. D50, 178–185.Google Scholar
Kocher, J. P., Prevost, M., Wodak, S. J. & Lee, B. (1996). Properties of the protein matrix revealed by the free energy of cavity formation. Structure, 4, 1517–1529.Google Scholar
Kraulis, P. J. (1991). MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Cryst. 24, 946–950.Google Scholar
Kuhn, L. A., Siani, M. A., Pique, M. E., Fisher, C. L., Getzoff, E. D. & Tainer, J. A. (1992). The interdependence of protein surface topography and bound water molecules revealed by surface accessibility and fractal density measures. J. Mol. Biol. 228, 13–22.Google Scholar
Lee, B. & Richards, F. M. (1971). The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 55, 379–400.Google Scholar
Leicester, S. E., Finney, J. L. & Bywater, R. P. (1988). Description of molecular surface shape using Fourier descriptors. J. Mol. Graphics, 6, 104–108.Google Scholar
Levitt, M., Hirshberg, M., Sharon, R. & Daggett, V. (1995). Potential energy function and parameters for simulations of the molecular dynamics of proteins and nucleic acids in solution. Comput. Phys. Comm. 91, 215–231.Google Scholar
Lewis, M. & Rees, D. C. (1985). Fractal surfaces of proteins. Science, 230, 1163–1165.Google Scholar
Lim, V. I. & Ptitsyn, O. B. (1970). On the constancy of the hydrophobic nucleus volume in molecules of myoglobins and hemoglobins. Mol. Biol. (USSR), 4, 372–382.Google Scholar
Madan, B. & Lee, B. (1994). Role of hydrogen bonds in hydrophobicity: the free energy of cavity formation in water models with and without the hydrogen bonds. Biophys. Chem. 51, 279–289.Google Scholar
Matthews, B. W., Morton, A. G. & Dahlquist, F. W. (1995). Use of NMR to detect water within nonpolar protein cavities. (Letter.) Science, 270, 1847–1849.Google Scholar
Merritt, E. A. & Bacon, D. J. (1997). Raster3D: photorealistic molecular graphics. Methods Enzymol. 277, 505–525.Google Scholar
Molecular Structure Corporation (1995). Insight II user guide. Biosym/MSI, San Diego.Google Scholar
Nemethy, G., Pottle, M. S. & Scheraga, H. A. (1983). Energy parameters in polypeptides. 9. Updating of geometrical parameters, nonbonded interactions and hydrogen bond interactions for the naturally occurring amino acids. J. Phys. Chem. 87, 1883–1887.Google Scholar
Nicholls, A. (1992). GRASP: graphical representation and analysis of surface properties. New York: Columbia University.Google Scholar
Nicholls, A. & Honig, B. (1991). A rapid finite difference algorithm, utilizing successive over-relaxation to solve the Poisson–Boltzmann equation. J. Comput. Chem. 12, 435–445.Google Scholar
Nicholls, A., Sharp, K. & Honig, B. (1991). Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins, 11, 281–296.Google Scholar
Olson, N., Kolatkar, P., Oliveira, M. A., Cheng, R. H., Greve, J. M., McClelland, A., Baker, T. S. & Rossmann, M. G. (1993). Structure of a human rhinovirus complexed with its receptor molecule. Proc. Natl Acad. Sci. USA, 90, 507–511.Google Scholar
O'Rourke, J. (1994). Computational geometry in C. Cambridge University Press.Google Scholar
Palmenberg, A. C. (1989). Sequence alignments of picornaviral capsid proteins. In Molecular aspects of picornavirus infection and detection, edited by B. L. Semler & E. Ehrenfeld, pp. 211–241. Washington DC: American Society for Microbiology.Google Scholar
Pattabiraman, N., Ward, K. B. & Fleming, P. J. (1995). Occluded molecular surface: analysis of protein packing. J. Mol. Recognit. 8, 334–344.Google Scholar
Pauling, L. (1960). The nature of the chemical bond, 3rd ed. Ithaca: Cornell University Press. Google Scholar
Peters, K. P., Fauck, J. & Frommel, C. (1996). The automatic search for ligand binding sites in proteins of known three-dimensional structure using only geometric criteria. J. Mol. Biol. 256, 201–213.Google Scholar
Petitjean, M. (1994). On the analytical calculation of van der Waals surfaces and volumes: some numerical aspects. J. Comput. Chem. 15, 1–10.Google Scholar
Pontius, J., Richelle, J. & Wodak, S. J. (1996). Deviations from standard atomic volumes as a quality measure for protein crystal structures. J. Mol. Biol. 264, 121–136.Google Scholar
Procacci, P. & Scateni, R. (1992). A general algorithm for computing Voronoi volumes: application to the hydrated crystal of myoglobin. Int. J. Quant. Chem. 42, 151–152.Google Scholar
Rashin, A. A., Iofin, M. & Honig, B. (1986). Internal cavities and buried waters in globular proteins. Biochemistry, 25, 3619–3625.Google Scholar
Reynolds, J. A., Gilbert, D. B. & Tanford, C. (1974). Empirical correlation between hydrophobic free energy and aqueous cavity surface area. Proc. Natl Acad. Sci. USA, 71, 2925–2927.Google Scholar
Richards, F. M. (1974). The interpretation of protein structures: total volume, group volume distributions and packing density. J. Mol. Biol. 82, 1–14.Google Scholar
Richards, F. M. (1977). Areas, volumes, packing, and protein structure. Annu. Rev. Biophys. Bioeng. 6, 151–176.Google Scholar
Richards, F. M. (1979). Packing defects, cavities, volume fluctuations, and access to the interior of proteins. Including some general comments on surface area and protein structure. Carlsberg Res. Commun. 44, 47–63.Google Scholar
Richards, F. M. (1985). Calculation of molecular volumes and areas for structures of known geometry. Methods Enzymol. 115, 440–464.Google Scholar
Richards, F. M. & Lim, W. A. (1994). An analysis of packing in the protein folding problem. Q. Rev. Biophys. 26, 423–498.Google Scholar
Richmond, T. J. (1984). Solvent accessible surface area and excluded volume in proteins: analytical equations for overlapping spheres and implications for the hydrophobic effect. J. Mol. Biol. 178, 63–89.Google Scholar
Richmond, T. J. & Richards, F. M. (1978). Packing of alpha-helices: geometrical constraints and contact areas. J. Mol. Biol. 119, 537–555.Google Scholar
Rossmann, M. G. (1989). The canyon hypothesis. J. Biol. Chem. 264, 14587–14590.Google Scholar
Rossmann, M. G. & Palmenberg, A. C. (1988). Conservation of the putative receptor attachment site in picornaviruses. Virology, 164, 373–382.Google Scholar
Rowland, R. S. & Taylor, R. (1996). Intermolecular nonbonded contact distances in organic crystal structures: comparison with distances expected from van der Waals radii. J. Phys. Chem. 100, 7384–7391.Google Scholar
Sgro, J.-Y. (1996). Virus visualization. In Encyclopedia of virology plus (CD-ROM version), edited by R. G. Webster & A. Granoff. San Diego: Academic Press.Google Scholar
Sharp, K. A., Nicholls, A., Fine, R. F. & Honig, B. (1991). Reconciling the magnitude of the microscopic and macroscopic hydrophobic effects. Science, 252, 107–109.Google Scholar
Sherry, B., Mosser, A. G., Colonno, R. J. & Rueckert, R. R. (1986). Use of monoclonal antibodies to identify four neutralization immunogens on a common cold picornavirus, human rhinovirus 14. J. Virol. 57, 246–257.Google Scholar
Sherry, B. & Rueckert, R. (1985). Evidence for at least two dominant neutralization antigens on human rhinovirus 14. J. Virol. 53, 137–143.Google Scholar
Shrake, A. & Rupley, J. A. (1973). Environment and exposure to solvent of protein atoms. Lysozyme and insulin. J. Mol. Biol. 79, 351–371.Google Scholar
Sibbald, P. R. & Argos, P. (1990). Weighting aligned protein or nucleic acid sequences to correct for unequal representation. J. Mol. Biol. 216, 813–818.Google Scholar
Singh, R. K., Tropsha, A. & Vaisman, I. I. (1996). Delaunay tessellation of proteins: four body nearest-neighbor propensities of amino acid residues. J. Comput. Biol. 3, 213–222.Google Scholar
Sreenivasan, U. & Axelsen, P. H. (1992). Buried water in homologous serine proteases. Biochemistry, 31, 12785–12791.Google Scholar
Tanford, C. (1997). How protein chemists learned about the hydrophobicity factor. Protein Sci. 6, 1358–1366.Google Scholar
Tanford, C. H. (1979). Interfacial free energy and the hydrophobic effect. Proc. Natl Acad. Sci. USA, 76, 4175–4176.Google Scholar
Ten Eyck, L. F. (1977). Efficient structure-factor calculation for large molecules by the fast Fourier transform. Acta Cryst. A33, 486–492.Google Scholar
Tsai, J., Gerstein, M. & Levitt, M. (1996). Keeping the shape but changing the charges: a simulation study of urea and its isosteric analogues. J. Chem. Phys. 104, 9417–9430.Google Scholar
Tsai, J., Gerstein, M. & Levitt, M. (1997). Estimating the size of the minimal hydrophobic core. Protein Sci. 6, 2606–2616.Google Scholar
Tsai, J., Taylor, R., Chothia, C. & Gerstein, M. (1999). The packing density in proteins: standard radii and volumes. J. Mol. Biol. 290, 253–266.Google Scholar
Tsai, J., Voss, N. & Gerstein, M. (2001). Voronoi calculations of protein volumes: sensitivity analysis and parameter database. Bioinformatics. In the press.Google Scholar
Voronoi, G. F. (1908). Nouvelles applications des paramétres continus à la théorie des formes quadratiques. J. Reine Angew. Math. 134, 198–287.Google Scholar
Williams, M. A., Goodfellow, J. M. & Thornton, J. M. (1994). Buried waters and internal cavities in monomeric proteins. Protein Sci. 3, 1224–1235.Google Scholar
Wodak, S. J. & Janin, J. (1980). Analytical approximation to the accessible surface areas of proteins. Proc. Natl Acad. Sci. USA, 77, 1736–1740.Google Scholar
Xie, Q. & Chapman, M. S. (1996). Canine parvovirus capsid structure, analyzed at 2.9 Å resolution. J. Mol. Biol. 264, 497–520.Google Scholar
Zhou, G., Somasundaram, T., Blanc, E., Parthasarathy, G., Ellington, W. R. & Chapman, M. S. (1998). Transition state structure of arginine kinase: implications for catalysis of bimolecular reactions. Proc. Natl Acad. Sci. USA, 95, 8449–8454.Google Scholar