Noncrystallographic symmetry averaging of electron density for molecular-replacement phase refinement and extension

Rossmann, M. G.; Arnold, E.

doi:10.1107/97809553602060000684

International Tables for Crystallography, Volume F: Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F, ch. 13.4, pp. 279-292
https://doi.org/10.1107/97809553602060000684

Chapter 13.4. Noncrystallographic symmetry averaging of electron density for molecular-replacement phase refinement and extension

M. G. Rossmann^a ^* and E. Arnold^b

^aDepartment of Biological Sciences, Purdue University, West Lafayette, IN 47907-1392, USA, and ^bBiomolecular Crystallography Laboratory, CABM & Rutgers University, 679 Hoes Lane, Piscataway, NJ 08854-5638, USA
Correspondence e-mail: mgr@indiana.bio.purdue.edu

Noncrystallographic symmetry (NCS) occurs when symmetry operations are true only within a confined envelope, as opposed to being valid throughout the essentially infinite crystal lattice. Computationally, it is useful to define the molecular symmetry with reference to an arbitrary cell (the `h-cell') with the relationship X_n = [R_n]X₁ (n = 1, N). Then the assembly of N NCS equivalent objects can be moved into the actual crystal (the `p-cell') using the relationship Y = [E]X. Hence each of the N units can be referred to the reference unit by Y_n = [E][R_n]X₁. In turn, the N units in the p-cell asymmetric unit can be multiplied by the crystal symmetry to produce the whole unit cell from the reference subunit in the h-cell. Procedures of averaging electron density will require a definition of the envelope either for the reference subunit or the whole of the molecular assembly if the NCS represents a closed point group (`proper' NCS). Averaging beyond the range of the NCS operators means that averaging is between non-equivalent densities. This causes the mean height of the average density to diminish and thus accurately indicates the limits of the NCS envelope. Various symmetry situations are examined, such as averaging subunits within the same crystal lattice (maybe proper symmetry) or between different crystal forms (necessarily improper symmetry). Phase extension is shown to be possible only by small defined increments of resolution after each cycle of averaging and solvent flattening.

Keywords: ab initio phasing; computer programs; electron-density averaging; h-cell; molecular envelopes; molecular replacement; multidomain averaging; multiple-crystal-form averaging; noncrystallographic symmetry; p-cell; phase extension; phase refinement; phasing.

13.4.1. Introduction

Electron-density averaging for phasing crystal structures has become a widespread and nearly routine technique. High noncrystallographic symmetry (NCS) permits the solution of structures using relatively poor and even ab initio phasing starts, while lower NCS electron-density averaging can significantly improve initial phases obtained by techniques such as isomorphous replacement, anomalous scattering, or molecular replacement. Implicit in any averaging is solvent flattening in the regions not available for NCS averaging. Indeed, if all parts of the unit cell were consistent with the NCS, the symmetry would be crystallographic and valid throughout the crystal lattice.

A number of generalized averaging programs and software packages have been developed for macromolecular crystal structure analyses. Ease of use, coupled with relatively convenient definition of molecular envelopes, as well as enormous advances in computer technology, have facilitated the application of symmetry averaging to a diverse set of crystallographic problems. Averaging of separate domains in multidomain protein structures that can be divided into segments and averaging among multiple crystal forms are becoming increasingly common.

Extension of phases to higher resolution by symmetry averaging of electron density, coupled with solvent flattening, has been applied to numerous problems. The power of phase extension has been especially impressive in many cases involving high NCS, such as icosahedral virus structures. The overall power for phase improvement of averaging, combined with other density-modification techniques, such as solvent levelling, has been found to depend upon the degree of NCS, the solvent content of the crystals, and the quality and completeness of experimental data. Similar averaging methodology can be used for structure analysis by other imaging techniques, such as electron microscopy.

This chapter discusses the underlying principles of electron-density averaging for macromolecular crystallographic phase improvement and describes procedures for computer implementation of these techniques.

13.4.2. Noncrystallographic symmetry (NCS)

Crystallographic symmetry is valid for the infinite crystal lattice. Any crystallographic symmetry element relates all points within the crystal to equivalent points elsewhere. In contrast, an NCS operator is valid only locally within a finite volume (Fig. 13.4.2.1); if a periodic structure is superimposed on itself after operation with an NCS operator, it will superimpose only within the envelope defining the limits of the local symmetry.

Figure 13.4.2.1

The two-dimensional periodic design shows crystallographic twofold axes perpendicular to the page and local noncrystallographic rotation axes in the plane of the paper (design by Audrey Rossmann). [Reprinted with permission from Rossmann (1972). Copyright (1972) Gordon & Breach.]

A product of superimposed periodic structures will be non-periodic, containing only the point symmetry of the noncrystallographic operators (Fig. 13.4.2.2 ). This fact can frequently be used to select a molecular envelope that was not obvious prior to noncrystallographic averaging [see e.g. Buehner et al. (1974 ) or Lin et al. (1986 )]. Although no knowledge of the crystallographic envelope is needed for this first averaging, it is necessary to determine it for the averaged molecular structure within the crystallographic cell to permit Fourier back-transformation.

Figure 13.4.2.2

(a) NCS in a triclinic cell. (b) Superposition of the pattern in (a) on itself after operation with the noncrystallographic fivefold axis. (c) Superposition of the pattern in (a) on itself after a rotation of one-fifth, two-fifths, three-fifths and four-fifths. Note that the sum or product of periodic patterns is aperiodic and in (c) has the point symmetry of the noncrystallographic operation. [Reprinted with permission from Rossmann (1990).]

There must be space between the confining envelopes governed by the local symmetry. Only the crystallographic symmetry is valid within this space. In the limit, when this space has diminished to zero, the local symmetry will have become a true crystallographic operator.

The definition of NCS can be extended to symmetry that relates similar objects in different crystal lattices. An operation that relates an object in one lattice to an equivalent object in another lattice will apply only to the chosen objects in each lattice. Beyond the confines of the chosen objects, there will be no coincidence of pattern.

Two kinds of NCS elements may be defined: proper and improper. The former satisfies a closed point group [e.g. a 17-fold rotation as occurs in tobacco mosaic virus disk protein (Champness et al., 1976)]. Here, it does not matter whether a rotation axis is applied right- or left-handedly; the result is indistinguishable. On the other hand, the relationship between different molecules in a crystallographic asymmetric unit is unlikely to be a closed point group. Thus, a rotation in one direction (followed by a translation) might achieve superposition of the two molecules, while a rotation in the opposite direction would not. This is called an improper NCS operator. An operation which takes a molecule in one unit cell to that in another unit cell (initially, the cells are lined up with, say, their orthogonalized a, b and c axes parallel) must equally be an improper rotation.

The position in space of a noncrystallographic rotation symmetry operator can be arbitrarily assigned. The rotation operation will orient the two molecules similarly. A subsequent translation, whose magnitude depends upon the location of the NCS operator, will always be able to superimpose the molecules (Fig. 13.4.2.3 ). Nevertheless, it is possible to select the position of the NCS axis such that the translation is a minimum, and that will occur when the translation is entirely parallel to the noncrystallographic rotation axis.

Figure 13.4.2.3

The position of the twofold rotation axis which relates the two piglets is completely arbitrary. The diagram on the left shows the situation when the translation is parallel to the rotation axis. The diagram on the right has an additional component of translation perpendicular to the rotation axis, but the component parallel to the axis remains unchanged. [Reprinted with permission from Rossmann et al. (1964).]

The position of an NCS axis, like everything else in the unit cell, must be defined with respect to a selected origin. Consider the noncrystallographic rotation defined by the $[3 \times 3]$ matrix [C]. Then, if the point x is rotated to x′ (both defined with respect to the selected origin and axial system), $[{\bf x}' = [\hbox{C}]{\bf x} + {\bf d},]$ where d is a three-dimensional vector which expresses the translational component of the NCS operation. The magnitude of the components of d is quite arbitrary unless the position of the rotation axis is defined. If the rotation axis represents a proper NCS element, there will exist a point x on the rotation axis, when positioned to eliminate translation, such that it is rotated onto itself (x′ = x). It follows that for such a point $[{\bf x} = [\hbox{C}]{\bf x} + {\bf d},]$ from which d can be determined if the position of the molecular centre is known. Note that $[{\bf d} = 0]$ if, and only if, the noncrystallographic rotation axis passes through the crystallographic origin.
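As a minimal numerical sketch of this relation (not part of the original chapter), the translation d that leaves a chosen point s on a proper rotation axis fixed is d = s − [C]s. The function name and the example values are illustrative assumptions, and Cartesian coordinates are assumed.

```python
import numpy as np

def translation_for_axis_through(C, s):
    """Return d so that x' = C x + d leaves the point s (on the axis) fixed."""
    C = np.asarray(C, dtype=float)
    s = np.asarray(s, dtype=float)
    return s - C @ s

# Hypothetical example: a twofold rotation about an axis parallel to z.
C = np.diag([-1.0, -1.0, 1.0])
s = np.array([0.25, 0.10, 0.0])   # assumed point on the rotation axis
d = translation_for_axis_through(C, s)

# An axis passing through the origin needs no translation.
d_origin = translation_for_axis_through(C, np.zeros(3))
```

Every point on the axis through s is then mapped onto itself, while d vanishes only when the axis passes through the origin.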

The presence of proper NCS in a crystal can help phase determination considerably. Consider, for example, a tetramer with 222 symmetry. It is not necessary to define the chemical limits of any one polypeptide chain as the NCS is true everywhere within the molecular envelope and the boundaries of the polypeptide chain are irrelevant to the geometrical considerations. The electron density at every point within the molecular envelope (which itself must have 222 symmetry) can be averaged among all four 222-related points without any chemical knowledge of the configuration of the monomer polypeptide. On the other hand, if there is only improper NCS, then the envelope must define the limits of one noncrystallographic asymmetric unit, although the crystallographic asymmetric unit contains two or more such units.

13.4.3. Phase determination using NCS

The molecular replacement method [cf. Rossmann & Blow (1962 ); Rossmann (1972, 1990 ); Argos & Rossmann (1980 ); Rossmann & Arnold (2001 )] is dependent upon the presence of NCS, whether it relates objects within one crystal lattice or between crystal lattices. The NCS rotational relationship in real space is exactly mimicked in reciprocal space. Local symmetry in real space has the equivalent effect of rotating a reciprocal lattice onto itself or another (with origins coincident), such that the integral reciprocal-lattice points of one reciprocal space coincide with non-integral reciprocal-lattice positions in the other. As the reciprocal lattice samples the Fourier transform of a molecule only at finite and integral reciprocal-lattice points, the effect of an NCS operation is to permit sampling of the molecular transform at intermediate non-integral reciprocal-lattice positions. If such sampling occurs frequently enough, it will constitute a plot of the continuous transform of the molecule and, hence, amount to a structure determination.

Whenever a molecule exists more than once either in the same unit cell or in different unit cells, then error in the molecular electron-density distribution due to error in phasing can be reduced by averaging the various molecular copies. The number of such copies, N, is referred to as the noncrystallographic redundancy. As the NCS is, by definition, only local (often pertaining to a particular molecular centre), there are holes and gaps between the averaged density, which presumably are solvent space between molecules. Thus, the electron density can be improved both by averaging electron density and by setting the density between molecules to a low, constant value (`solvent flattening'). Phases calculated by Fourier back-transforming the improved density should be more accurate than the original phases. Hence, the observed structure amplitudes (suitably weighted) can be associated with the improved phases, and a new and improved map can be calculated. This, in turn, can again be averaged until convergence has been reached and the phases no longer change. In addition, the back-transformed map can be used to compute phases just beyond the extremity of the resolution of the terms used in the original map. The resultant amplitudes will not be zero because the map had been modified by averaging and solvent flattening. Thus, phases can be gradually extended and improved, starting from a very low resolution approximation to the molecular structure. This procedure was first implemented in reciprocal space (Rossmann & Blow, 1963; Main, 1967; Crowther, 1969) and then, more recently, in real space (Bricogne, 1974, 1976; Johnson, 1978; Jones, 1992; Rossmann et al., 1992). More recently still, there has been an attempt to reproduce the very successful real-space procedure in reciprocal space (Tong & Rossmann, 1995).
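The cycle described above can be sketched on a toy problem. The following is an illustrative outline only, assuming a one-dimensional periodic "crystal" with two copies of a motif related by a local translation, unit weights on the observed amplitudes and a solvent level of zero; the function and variable names are hypothetical and it is not the implementation of any of the cited programs.

```python
import numpy as np

def refinement_cycle(rho, f_obs_amp, average, mask, solvent_level=0.0):
    """One cycle: NCS-average, flatten solvent, back-transform, recombine."""
    modified = np.where(mask, average(rho), solvent_level)
    phases = np.angle(np.fft.fftn(modified))   # phases of the modified map
    f_new = f_obs_amp * np.exp(1j * phases)    # observed amplitudes are kept
    return np.fft.ifftn(f_new).real

# Toy 1D cell: two copies of a motif related by a local (NCS) translation.
n_grid, shift = 64, 25
copy1 = np.arange(5, 21)
copy2 = copy1 + shift
mask = np.zeros(n_grid, dtype=bool)
mask[copy1] = mask[copy2] = True

def average(rho):
    """Average the two NCS-related copies point by point."""
    out = rho.copy()
    mean = 0.5 * (rho[copy1] + rho[copy2])
    out[copy1] = out[copy2] = mean
    return out

rng = np.random.default_rng(0)
rho_true = np.zeros(n_grid)
rho_true[copy1] = rho_true[copy2] = 1.0 + rng.random(copy1.size)
f_true = np.fft.fftn(rho_true)
f_obs_amp = np.abs(f_true)

# Start from deliberately perturbed phases; the perturbation is made
# antisymmetric in the reciprocal index so that the starting map is real.
noise = rng.normal(scale=0.5, size=n_grid)
noise = 0.5 * (noise - np.roll(noise[::-1], 1))
rho = np.fft.ifftn(f_obs_amp * np.exp(1j * (np.angle(f_true) + noise))).real
for _ in range(10):
    rho = refinement_cycle(rho, f_obs_amp, average, mask)
```

In this sketch the true density is a fixed point of the cycle, and every cycle returns a map whose Fourier amplitudes equal the observed ones. Phase extension would add terms just beyond the current resolution limit at each cycle, which is not shown here.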

Early examples of such a procedure for phase improvement are the structure determinations of deoxyhaemoglobin (Muirhead et al., 1967), α-chymotrypsin (Matthews et al., 1967), lobster glyceraldehyde-3-phosphate dehydrogenase (Buehner et al., 1974), hexokinase (Fletterick & Steitz, 1976), tobacco mosaic virus disk protein (Champness et al., 1976; Bloomer et al., 1978), the influenza virus haemagglutinin spike (Wilson et al., 1981), tomato bushy stunt virus (Harrison et al., 1978) and southern bean mosaic virus (Abad-Zapatero et al., 1980). Early examples of phase extension, using real-space electron-density averaging, were the study of glyceraldehyde-3-phosphate dehydrogenase (Argos et al., 1975), satellite tobacco necrosis virus (Nordman, 1980), haemocyanin (Gaykema et al., 1984), human rhinovirus 14 (Rossmann et al., 1985) and poliovirus (Hogle et al., 1985). Since then, this method has been used in numerous virus structure determinations, with the phase extension being initiated from ever lower resolution.

A once-popular computer program for real-space averaging was written by Gerard Bricogne (1976). Another program has been described by Johnson (1978). Both programs were based on a double-sorting procedure. Bricogne (1976) had suggested that, with interpolation between grid points using linear polynomials, it was necessary to sample electron density at grid intervals finer than one-sixth of the resolution limit of the Fourier terms that were used in calculating the map. With the availability of more computer memory, it was possible to store much of the electron density, thus avoiding time-consuming sorting operations (Hogle et al., 1985; Luo et al., 1989). Simultaneously, the storage requirements could be drastically reduced by using interpolation with quadratic polynomials. While the latter required a little extra computation time, this was far less than what would have been needed for sorting. Furthermore, it was found that Bricogne's estimate for the fineness of the map storage grid was too pessimistic, even for linear interpolation, which works well to about 1/2.5 of the resolution limit of the map.
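To make the grid-sampling figures concrete, the sketch below computes the number of grid intervals needed along a cell edge for a given resolution limit and shows linear (trilinear) interpolation in a periodic map. The function names, the parameter `points_per_resolution` and the example cell edge are illustrative assumptions; the quadratic interpolation mentioned above is not shown.

```python
import math
import numpy as np

def grid_points_along_edge(cell_edge, d_min, points_per_resolution=2.5):
    """Fewest grid intervals giving spacing <= d_min / points_per_resolution."""
    return math.ceil(cell_edge * points_per_resolution / d_min)

def trilinear(rho, frac):
    """Linearly interpolate a periodic 3D map at fractional coordinates."""
    n = np.array(rho.shape)
    g = (np.asarray(frac, dtype=float) % 1.0) * n   # position in grid units
    i0 = np.floor(g).astype(int)                    # lower grid corner
    t = g - i0                                      # offsets within the voxel
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                weight = ((t[0] if dx else 1.0 - t[0])
                          * (t[1] if dy else 1.0 - t[1])
                          * (t[2] if dz else 1.0 - t[2]))
                idx = (i0 + (dx, dy, dz)) % n       # wrap around the cell
                value += weight * rho[tuple(idx)]
    return value

# Hypothetical 100 A cell edge and 3 A data.
n_linear = grid_points_along_edge(100.0, 3.0)         # 1/2.5 of the resolution
n_bricogne = grid_points_along_edge(100.0, 3.0, 6.0)  # one-sixth of the resolution

rho = np.arange(64, dtype=float).reshape(4, 4, 4)
on_grid = trilinear(rho, (0.25, 0.5, 0.75))
halfway = trilinear(rho, (0.125, 0.0, 0.0))
```

For these assumed values the coarser criterion needs 84 intervals along the edge instead of 200, which is a large saving in three dimensions.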

In addition to changes in strategy brought about by computers with much larger memories, experience has been gained in program requirements for real-space averaging for phase determination (Dodson et al., 1992 ). Here we give a general procedure for electron-density averaging.

13.4.4. The p- and h-cells

It is useful to define two types of unit cells.

(1) The `p-cell' is the unit cell of the unknown crystal structure and is associated with fractional coordinates y and unit-cell vectors $[{\bf a}_{p}, {\bf b}_{p}, {\bf c}_{p}]$ .
(2) The `h-cell' is the unit cell with respect to which the noncrystallographic axes of the molecule (or particle) are to be defined in a standard orientation and is associated with fractional coordinates x and unit-cell vectors $[{\bf a}_{h}, {\bf b}_{h}, {\bf c}_{h}]$ .

Since the averaged molecule is to be placed into all crystallographically related positions in the p-cell, it is essential to know the envelope that encloses a single molecule. Care must be taken that the envelopes from neighbouring molecules in the p-cell do not overlap. The remaining space between the limits of the envelopes of the variously placed molecules in the p-cell can be taken to be solvent and, hence, flattened, a useful physical assumption for helping phase determination.

The h-cell must be chosen to be at least as large as the largest dimension of the molecule. In general, it is convenient to define the h-cell with $[{a}_{h} = {b}_{h} = {c}_{h}]$ and $[\alpha = \beta = \gamma = 90^{\circ}]$ , while placing the molecular centre at $[({1 \over 2}, {1\over 2}, {1\over 2})]$ . For example, if the molecule is a viral particle with icosahedral symmetry, the standard orientation can be defined by placing the twofold axes to correspond to the h-cell unit-cell axes, a procedure which can be done in one of two ways (Fig. 13.4.4.1 ). It will be necessary to know how the molecule (or particle) in the h-cell is related to the `reference' molecule in the p-cell. The known p-cell crystallographic symmetry then permits the complete construction of the p-cell structure from whatever is the current h-cell electron-density representation of the molecule.

Figure 13.4.4.1

Stereographic projections showing alternative definitions of the `standard orientation' of an icosahedron in the h-cell. Icosahedral axes are placed parallel to the cell axes. Limits of a noncrystallographic asymmetric unit are shaded, representing 1/60th of the volume of an object with icosahedral symmetry. [Reproduced with permission from Rossmann et al. (1992).]

The h-cell is used to represent the density of a molecule in the standard orientation obtained by averaging all the noncrystallographic units in the p-cell. While density within a specific molecule will tend to be reinforced by the averaging procedure, the density outside the molecular boundaries will tend to be diminished. Thus, by averaging into the h-cell, the molecular envelope is revealed automatically. Indeed, the greater the NCS, the greater the clarity of the molecular boundary. Hence, the averaged molecule in the h-cell can be used to define a molecular mask in the p-cell automatically.

Averaging into the h-cell is also useful for displaying the molecule in a standard orientation (i.e. obtaining the electron-density distribution on skew planes). Thus, it is possible to display the molecule, for instance, with sections perpendicular to a molecular twofold axis, and to position the molecular symmetry axes accurately. From this, it is then easy to define the limits of the molecular asymmetric unit (Fig. 13.4.4.1 ). Hence, it is possible to save a great deal of computing time by evaluating the electron density in the h-cell only at those grid points within and immediately surrounding the noncrystallographic asymmetric unit.

13.4.5. Combining crystallographic and noncrystallographic symmetry

Transformations will now be described which relate noncrystallographically related positions distributed among several fragmented copies of the molecule in the asymmetric unit of the p-cell and between the p-cell and the h-cell.

13.4.5.1. General considerations

Let Y and X be position vectors in a Cartesian coordinate system whose components have dimensions of length, in the p- and h-cells, which utilize the same origin as the fractional coordinates, y and x, respectively. Let $[[\beta_{p}]]$ and $[[\alpha_{h}]]$ be `orthogonalization' and `de-orthogonalization' matrices in the p- and h-cells, respectively (Rossmann & Blow, 1962 ). Then $[\eqalign{{\bf Y} &= [\beta_{p}]{\bf y} \qquad \quad \hbox{and}\qquad \quad {\bf x} = [\alpha_{h}]{\bf X},\cr [\alpha_{p}] &= [\beta_{p}]^{-1}\quad \quad \;\hbox{ and}\quad \quad [\alpha_{h}] = [\beta_{h}]^{-1}.} \eqno(13.4.5.1)]$ Thus, for instance, $[[\alpha_{h}]]$ denotes a matrix that transforms a Cartesian set of unit vectors to fractional distances along the unit-cell vectors $[{\bf a}_{h}, {\bf b}_{h}, {\bf c}_{h}]$ .
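As an illustration of (13.4.5.1), the sketch below builds an orthogonalization matrix from unit-cell constants and obtains the de-orthogonalization matrix as its inverse. The axis convention used here (a along Cartesian X, b in the XY plane) is a common one and is an assumption; the convention of Rossmann & Blow (1962) may differ. The cell constants are hypothetical.

```python
import numpy as np

def orthogonalization_matrix(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Matrix taking fractional to Cartesian coordinates (lengths in, e.g., A).

    Assumed convention: a along X, b in the XY plane.
    """
    ca, cb, cg = np.cos(np.radians([alpha_deg, beta_deg, gamma_deg]))
    sg = np.sin(np.radians(gamma_deg))
    v = np.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)
    return np.array([
        [a,   b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0,    c * v / sg],
    ])

# Hypothetical triclinic p-cell.
beta_p = orthogonalization_matrix(10.0, 12.0, 15.0, 80.0, 95.0, 105.0)
alpha_p = np.linalg.inv(beta_p)     # de-orthogonalization matrix

y = np.array([0.1, 0.2, 0.3])       # fractional coordinates
Y = beta_p @ y                      # Cartesian coordinates
y_back = alpha_p @ Y                # back to fractional coordinates
```

The columns of the orthogonalization matrix are the cell vectors expressed in the Cartesian frame, so their lengths and mutual angles reproduce the cell constants.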

Let the Cartesian coordinates Y and X be related by the rotation matrix [ω] and the translation vector D such that $[{\bf X} = [\omega]{\bf Y} + {\bf D}. \eqno(13.4.5.2)]$ If the molecules are to be averaged among different unit cells, then each p-cell must be related to the standard h-cell orientation by a different [ω] and D. Then, from (13.4.5.1 ) and (13.4.5.2 ) $[{\bf X} = [\omega][\beta_{p}]{\bf y} + {\bf D}. \eqno(13.4.5.3)]$

Now, if [ω] represents the rotational relationship between the `reference' molecule, $[m = 1]$, in the p-cell with respect to the h-cell, then from (13.4.5.3) $[{\bf X} = [\omega][\beta_{p}]{\bf y}_{m = 1} + {\bf D},]$ where $[{\bf y}_{m}]$ refers to the fractional coordinates of the mth molecule in the p-cell.

Assuming there is only one molecule per asymmetric unit in the p-cell, let the mth molecule in the p-cell be related to the reference molecule by the crystallographic rotation $[[\hbox{T}_{m}]]$ and translation $[{\bf t}_{m}]$ operators, such that $[{\bf y}_{m} = [\hbox{T}_{m}]{\bf y}_{m=1} + {\bf t}_{m}. \eqno(13.4.5.4)]$ For convenience, all translational components will initially be neglected in the further derivations below, but they will be reintroduced in the final stages, so that $[{\bf y}_{m=1} = [\hbox{T}_{m}^{-1}]{\bf y}_{m}]$. Hence, from (13.4.5.3) and (13.4.5.4) $[{\bf X} = \{[\omega][\beta_{p}][\hbox{T}_{m}^{-1}]\}{\bf y}_{m}. \eqno(13.4.5.5)]$ Further, if $[{\bf X}_{n}]$ refers to the nth subunit within the molecule in the h-cell, and similarly if $[{\bf y}_{m,\, n}]$ refers to the nth subunit within the mth molecule of the p-cell, then from (13.4.5.5) $[{\bf X}_{n} = \{[\omega][\beta_{p}][\hbox{T}_{m}^{-1}]\}{\bf y}_{m,\, n}. \eqno(13.4.5.6)]$ Finally, the rotation matrix $[[\hbox{R}_{n}]]$ is used to define the relationship among the N ($[N = 2]$ for a dimer, 4 for a 222 tetramer, 60 for an icosahedral virus etc.) noncrystallographic asymmetric units of the molecule within the h-cell. Then $[{\bf X}_{n} = [\hbox{R}_{n}]{\bf X}_{n=1}. \eqno(13.4.5.7)]$

13.4.5.2. Averaging with the p-cell

Consider averaging the density at N noncrystallographically related points in the p-cell and replacing that density into the p-cell. By substituting for $[{\bf X}_{n}]$ and $[{\bf X}_{n=1}]$ in (13.4.5.7 ) and using (13.4.5.6 ), $[\{[\omega][\beta_{p}][\hbox{T}_{m}^{-1}]\}{\bf y}_{m,\, n} = [\hbox{R}_{n}]\{[\omega][\beta_{p}][\hbox{T}_{m}^{-1}]\}{\bf y}_{m,\, n=1},]$ or $[{\bf y}_{m,\, n} = \{[\omega][\beta_{p}][\hbox{T}_{m}^{-1}]\}^{-1} \times [\hbox{R}_{n}]\{[\omega][\beta_{p}][\hbox{T}_{m}^{-1}]\}{\bf y}_{m,\, n=1}. \eqno(13.4.5.8)]$ Now set $[\eqalignno{[{\rm E}_{m,\, n}] &= \{[\omega][\beta_{p}][\hbox{T}_{m}^{-1}]\}^{-1}[\hbox{R}_{n}]\{[\omega][\beta_{p}][\hbox{T}_{m}^{-1}]\}\cr &= [\hbox{T}_{m}][\alpha_{p}][\omega^{-1}][\hbox{R}_{n}][\omega][\beta_{p}][\hbox{T}_{m}^{-1}], &(13.4.5.9)}]$ giving $[{\bf y}_{m,\, n} = [\hbox{E}_{m,\, n}]{\bf y}_{m,\, n=1} + {\bf e}_{m,\, n}, \eqno(13.4.5.10)]$ where $[{\bf e}_{m,\, n}]$ is the corresponding translational element. Note that multiplication by $[[\hbox{E}_{m,\, n}]]$ thus corresponds to the following sequence of transformations: (1) placing all the crystallographically related subunits into the reference orientation with $[[\hbox{T}_{m}^{-1}]]$ ; (2) `orthogonalizing' the coordinates with $[[\beta_{p}]]$ ; (3) rotating the coordinates into the h-cell with [ω]; (4) rotating from the reference subunit of the molecule of the h-cell with $[[\hbox{R}_{n}]]$ ; (5) rotating these back into the p-cell with $[[\omega^{-1}]]$ ; (6) `de-orthogonalizing' in the p-cell with $[[\alpha_{p}]]$ ; and (7) placing these back into each of the M crystallographic asymmetric units of the p-cell with $[[\hbox{T}_{m}]]$ .

The translational elements, $[{\bf e}_{m,\, n}]$ , can now be evaluated. Let $[{\bf s}_{p,\, m}]$ be the fractional coordinates of the centre (or some arbitrary position) of the mth molecule in the p-cell; hence, $[{\bf s}_{p,\, m=1}]$ denotes the molecular centre position of the reference molecule in the p-cell. If $[{\bf s}_{p,\, m}]$ is at the intersection of the molecular rotation axes, then it will be the same for all n molecular asymmetric units. Therefore, it follows from (13.4.5.10 ) that $[{\bf e}_{m,\, n} = {\bf s}_{p,\, m} - [\hbox{E}_{m,\, n}]{\bf s}_{p,\, m=1}, \eqno(13.4.5.11a)]$ or $[{\bf y}_{m,\, n} = [\hbox{E}_{m,\, n}]{\bf y}_{m,\, n=1} + ({\bf s}_{p,\, m} - [\hbox{E}_{m,\, n}]{\bf s}_{p,\, m=1}). \eqno(13.4.5.11b)]$ Equation (13.4.5.11b) can be used to find all the N noncrystallographic asymmetric units within the crystallographic asymmetric unit of the p-cell. Thus, this is the essential equation for averaging the density in the p-cell and replacing it into the p-cell.
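The composition in (13.4.5.9) and the translation in (13.4.5.11a) can be checked numerically. The sketch below is restricted, as an assumption made for simplicity, to the reference molecule (m = 1, so that the crystallographic rotation is the identity) in a hypothetical orthorhombic p-cell with a single molecular twofold axis; all names and values are illustrative.

```python
import numpy as np

def ncs_rotation_p_cell(T_m, omega, R_n, beta_p):
    """[E_mn] = T_m alpha_p omega^-1 R_n omega beta_p T_m^-1, eq. (13.4.5.9)."""
    alpha_p = np.linalg.inv(beta_p)
    return (T_m @ alpha_p @ np.linalg.inv(omega) @ R_n
            @ omega @ beta_p @ np.linalg.inv(T_m))

def ncs_translation(E_mn, s_pm, s_pm1):
    """e_mn = s_pm - E_mn s_pm1, eq. (13.4.5.11a)."""
    return s_pm - E_mn @ s_pm1

# Hypothetical geometry.
beta_p = np.diag([50.0, 60.0, 70.0])               # orthorhombic p-cell
t = np.radians(30.0)
omega = np.array([[np.cos(t), -np.sin(t), 0.0],    # p-cell to h-cell rotation
                  [np.sin(t),  np.cos(t), 0.0],
                  [0.0,        0.0,       1.0]])
R_2 = np.diag([1.0, -1.0, -1.0])                   # twofold about h-cell X
T_1 = np.eye(3)                                    # reference molecule, m = 1
s_p1 = np.array([0.3, 0.4, 0.2])                   # molecular centre (fractional)

E = ncs_rotation_p_cell(T_1, omega, R_2, beta_p)
e = ncs_translation(E, s_p1, s_p1)

y_1 = s_p1 + np.array([0.05, -0.02, 0.03])         # point in subunit n = 1
y_2 = E @ y_1 + e                                  # NCS-related point, eq. (13.4.5.11b)
```

The molecular centre is left unchanged by the operator, the Cartesian distance from the centre is preserved, and applying the twofold operator twice returns the starting point.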

13.4.5.3. Averaging the p-cell and placing the results into the h-cell

Consider averaging the density at N noncrystallographically related points in the p-cell and placing that result into the h-cell. From (13.4.5.7), multiplying by $[[\alpha_{h}]]$, $[[\alpha_{h}]{\bf X}_{n=1} = [\alpha_{h}][\hbox{R}_{n}^{-1}]{\bf X}_{n}.]$ From (13.4.5.1) and (13.4.5.6), $[{\bf x}_{n=1} = [\alpha_{h}][\hbox{R}_{n}^{-1}]\{[\omega][\beta_{p}][\hbox{T}_{m}^{-1}]\}{\bf y}_{m,\, n}. \eqno(13.4.5.12)]$ Since it is only necessary to place the reference molecule of the p-cell into the h-cell, it is sufficient to consider the case when $[m = 1]$, in which case $[[\hbox{T}_{m}^{-1}]]$ is the identity matrix [I]. It then follows, by inversion, that $[\eqalign{{\bf y}_{m=1,\, n} &= \{[\omega][\beta_{p}]\}^{-1}[\hbox{R}_{n}][\beta_{h}]{\bf x}_{n=1}\cr &= [\alpha_{p}][\omega^{-1}][\hbox{R}_{n}][\beta_{h}]{\bf x}_{n=1},}]$ which corresponds to: (1) `orthogonalizing' the h-cell fractional coordinates with $[[\beta_{h}]]$; (2) rotating into the nth noncrystallographic unit within the molecule using $[[\hbox{R}_{n}]]$; (3) rotating into the p-cell with $[[\omega^{-1}]]$; and (4) `de-orthogonalizing' into fractional p-cell coordinates with $[[\alpha_{p}]]$.

Now, if $[{\bf s}_{h}]$ is the molecular centre in the h-cell (usually $[{1 \over 2}, {1 \over 2}, {1 \over 2}]$ ), then $[{\bf y}_{m=1,\, n} = [\hbox{E}_{m=1,\, n}']{\bf x} + ({\bf s}_{p,\, m=1} - [\hbox{E}_{m=1,\, n}']{\bf s}_{h}),]$ and $[[\hbox{E}_{m=1,\, n}'] = [\alpha_{p}][\omega^{-1}][\hbox{R}_{n}][\beta_{h}]. \eqno(13.4.5.13)]$ Equation (13.4.5.13 ) determines the position of the N noncrystallographically related points $[{\bf y}_{m=1,\, n}]$ in the p-cell whose average value is to be placed at x in the h-cell.
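A short sketch of how (13.4.5.13) might be used: for one h-cell position x, generate the N related p-cell positions and average a density function over them. The 222 point group, the cell dimensions, the identity [ω] and the spherically symmetric test density are assumptions chosen so that the result can be checked analytically; `density_at` stands in for interpolation in a real p-cell map.

```python
import numpy as np

def p_cell_positions(x, rotations, omega, beta_h, beta_p, s_h, s_p1):
    """p-cell fractional positions related to h-cell point x, eq. (13.4.5.13)."""
    alpha_p = np.linalg.inv(beta_p)
    omega_inv = np.linalg.inv(omega)
    positions = []
    for R_n in rotations:
        E_prime = alpha_p @ omega_inv @ R_n @ beta_h
        positions.append(E_prime @ x + (s_p1 - E_prime @ s_h))
    return np.array(positions)

def average_into_h_cell(x, density_at, **geometry):
    """Mean p-cell density over the N NCS-related points belonging to x."""
    ys = p_cell_positions(x, **geometry)
    return float(np.mean([density_at(y) for y in ys]))

# Hypothetical geometry: 222 molecule, cubic h-cell, orthorhombic p-cell.
geometry = dict(
    rotations=[np.diag(d) for d in ([1.0, 1.0, 1.0], [1.0, -1.0, -1.0],
                                    [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0])],
    omega=np.eye(3),
    beta_h=np.eye(3) * 100.0,
    beta_p=np.diag([50.0, 60.0, 70.0]),
    s_h=np.full(3, 0.5),
    s_p1=np.array([0.3, 0.4, 0.2]),
)

def density_at(y):
    """Toy p-cell density: a Gaussian centred on the reference molecule."""
    r = np.linalg.norm(geometry["beta_p"] @ (y - geometry["s_p1"]))
    return np.exp(-r**2 / 200.0)

x = np.array([0.55, 0.48, 0.52])
ys = p_cell_positions(x, **geometry)
rho_h = average_into_h_cell(x, density_at, **geometry)
```

Because the toy density depends only on the distance from the molecular centre, the averaged value equals the Gaussian evaluated at the distance of x from the h-cell centre.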

13.4.6. Determining the molecular envelope

Various techniques are available for determining the molecular envelope within which density can be averaged and outside of which the solvent can be flattened.

(1) By assumption of a simple geometric shape, such as a sphere. This is frequently used for icosahedral viruses.
(2) By manual inspection of a poor electron-density map which, nevertheless, gives some guidance as to the molecular boundaries. A variety of interactive graphical programs are available to help define the molecular boundary.
(3) By use of a homologous structure or other information, such as a cryo-electron-microscopy (cryo-EM) reconstruction at low resolution. The information about a homologous structure may be either in the form of an electron-density grid or, often more conveniently, as an atomic model.
(4) By inspection of an averaged map which should have weaker density beyond the limits of the molecular boundary where the NCS is no longer true.

Procedures (2) and (3) are advisable when the NCS redundancy is low. Procedure (4) works well when the NCS redundancy is four or higher. The crystallographic asymmetric unit is likely to contain bits and pieces of molecules centred at various positions in the unit cell and neighbouring unit cells. Therefore, it is necessary to associate each grid point within the p-cell crystallographic asymmetric unit to a specific molecular centre or to solvent.

If the molecular-boundary assignments are to be made automatically, then the following procedure can be used. The number, M, of such molecules can be estimated by generating all centres, derived from the given position of the centre for the reference molecule, $[{\bf s}_{p,\, m=1}]$, and then determining whether a molecule of radius $[{R}_{\rm out}]$ would impinge on the crystallographic asymmetric unit within the defined boundaries. Here, $[{R}_{\rm out}]$ is a liberal estimate of the molecular radius. The corresponding rotation matrices $[[\hbox{E}_{m,\, n}]]$ and translation vectors $[{\bf e}_{m,\, n}]$ can then be computed from (13.4.5.9) and (13.4.5.11a).

Any grid point whose distance from all M centres is greater than $[{R}_{\rm out}]$ can immediately be designated as being in the solvent region. For other grid points, it is necessary to examine the corresponding h-cell density. From (13.4.5.12), it follows that (setting $[n = 1]$) $[{\bf x} = [\hbox{E}_{m,\, n=1}'']{\bf y}_{m} + ({\bf s}_{h} - [\hbox{E}_{m,\, n=1}'']{\bf s}_{p,\, m}),]$ where $[[\hbox{E}_{m,\, n=1}''] = [\alpha_{h}][\hbox{R}_{n}^{-1}][\omega][\beta_{p}][\hbox{T}_{m}^{-1}] \eqno(13.4.6.1)]$ (n can be set to 1, since the h-cell presumably contains an averaged molecular electron density, in which case it does not matter which molecular asymmetric unit is referenced). Thus, (13.4.6.1) can be used to determine the electron density at $[{\bf y}_{m}]$ by inspecting the corresponding interpolated density, $[\rho ({\bf x})]$, at x in the h-cell. Transfer of the electron density, $[\rho ({\bf x})]$, from the h-cell to the p-cell using (13.4.6.1) is often useful to obtain an initial structure. However, to determine a suitable mask, it is useful to evaluate a modified electron density, $[\langle\rho ({\bf x})\rangle]$, (see below) for the grid points immediately around x in the h-cell.

A variable parameter `CRIT' can be specified to establish the distribution of grid points that are within the molecular envelope. When the modified electron density, $[\langle\rho ({\bf x})\rangle]$ , is less than CRIT, the corresponding grid point at y is assumed to be in solvent. Otherwise, when $[\langle\rho ({\bf x})\rangle]$ exceeds CRIT, the grid point at y is assigned to that molecule which has the largest $[\langle\rho ({\bf x})\rangle]$ . If the percentage of grid points which might be assigned to more than one molecule is large (say, greater than 1% of the total number of grid points), it probably signifies that the value of CRIT is too low, that the molecular boundary is far from clear, or that the function used to define $[\langle\rho ({\bf x})\rangle]$ was badly chosen (Fig. 13.4.6.1 ). Grid points outside the molecular envelope can be set to the average solvent density.
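The assignment rule can be sketched as follows, assuming the modified density has already been evaluated for each of the M candidate molecules at every p-cell grid point. The array layout, the function name and the treatment of values exactly equal to CRIT (taken here as solvent) are assumptions.

```python
import numpy as np

def assign_grid_points(rho_mod, crit):
    """Assign grid points to molecules or to solvent.

    rho_mod has shape (M, n_points): modified density seen from each of the
    M molecular centres. Returns the owning molecule index per grid point
    (-1 for solvent) and the fraction of points that exceed CRIT for more
    than one molecule.
    """
    rho_mod = np.asarray(rho_mod, dtype=float)
    above = rho_mod > crit
    owner = np.where(above.any(axis=0), rho_mod.argmax(axis=0), -1)
    ambiguous_fraction = float((above.sum(axis=0) > 1).mean())
    return owner, ambiguous_fraction

# Hypothetical values for two molecules and four grid points.
rho_mod = [[5.0, 1.0, 3.0, 0.0],
           [2.0, 1.0, 4.0, 6.0]]
owner, ambiguous_fraction = assign_grid_points(rho_mod, crit=2.5)
```

A large ambiguous fraction (the text suggests more than about 1% of all grid points) would indicate that CRIT should be raised or the smoothing function reconsidered.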

Figure 13.4.6.1

The volume of the molecular mask expressed as a percentage of the volume of the p-cell asymmetric unit, as determined by the density cutoff in the h-cell. When the modulus of the density cutoff is decreased to less than the mean smeared electron density within the protein, the mask volume increases rapidly. Intersection of the tangents suggests the most appropriate density cutoff value for mask generation. [Reproduced with permission from McKenna, Xia, Willingmann, Ilag & Rossmann (1992).]

An essential criterion for the molecular envelope is that it obeys the noncrystallographic point-group symmetry. If the original h-cell electron density already possesses the molecular symmetry (e.g. icosahedral 532, 222 etc.), then the p-cell mask should also have that symmetry. However, if the mask boundaries were chosen manually, masks from different molecular centres might be in conflict and have local errors in the correct molecular symmetry. Such errors can be corrected by reimposing the noncrystallographic point-group symmetry on the p-cell mask. This can be conveniently achieved by setting the density at each grid point that was considered within the molecular envelope to a value of 100, and all other grid points to a density of zero. If the resultant density is averaged using the same routine as is used for averaging the actual electron density of the molecule, then the average density will remain 100 if the interpolated density is 100 at all noncrystallographically related points. However, if the original grid point is near the edge of the mask, finding the density at symmetry-related points may involve interpolation between density at level 100 and at level 0, giving an averaged density of less than 100. Hence, any grid point whose averaged density is below some criterion should be attributed to solvent.
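As a toy illustration of this 0/100 trick, the sketch below symmetrizes a two-dimensional mask about a fourfold axis. It is deliberately simplified: the operators are exact 90° grid rotations (`np.rot90`), so no interpolation is needed, whereas a real averaging routine would interpolate at non-integral positions. The function name `symmetrize_mask` and the `cutoff` parameter are assumptions for the example.

```python
import numpy as np


def symmetrize_mask(mask, ncs_ops, cutoff=50.0):
    """Re-impose point-group symmetry on a boolean mask.

    mask:    boolean array, True inside the envelope.
    ncs_ops: list of callables, each returning the symmetry-related copy
             of an array (the identity must be included).
    cutoff:  averaged value below which a point is returned to solvent.
    """
    density = np.where(mask, 100.0, 0.0)   # inside = 100, outside = 0
    averaged = sum(op(density) for op in ncs_ops) / len(ncs_ops)
    return averaged >= cutoff


# Fourfold axis normal to a square 2D grid: four exact grid rotations.
fourfold = [lambda a, k=k: np.rot90(a, k) for k in range(4)]

mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True    # symmetric core
mask[0, 2] = True        # a bump that breaks the fourfold symmetry

# Keep only points that are inside the mask in all four copies.
strict = symmetrize_mask(mask, fourfold, cutoff=100.0)
```

With `cutoff=100.0` the asymmetric bump is returned to solvent; a lower cutoff would instead extend the mask to all four symmetry-related bumps.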

Other improvements to mask generation were discussed by Rossmann et al. (1992 ). In any event, the molecular-envelope definition should be periodically re-examined after a suitable number of electron-density-averaging cycles.

13.4.7. Finding the averaged density

Electron density can be averaged (1) among the N NCS-related molecules in the p-cell (the real crystal unit cell), thus creating a new and improved map of the p-cell; (2) among the N NCS-related molecules in the p-cell and placing the results into a standard orientation in the h-cell; or (3) among the N NCS-related molecules in different unit cells and placing the results back into the original different unit cells or into a standard h-cell. Before averaging commences, the $[M \times N]$ matrices $[[\hbox{E}_{m,\, n}]]$ and translation vectors $[{\bf e}_{m,\, n}]$ must be evaluated [see (13.4.5.9 ) and (13.4.5.11a )]. Here, N is the noncrystallographic redundancy and M is the number of molecules that impinge on the crystallographic asymmetric unit of the p-cell. Associated with each grid point in the p-cell asymmetric unit will be (1) the value of m designating which molecular centre is to be associated with that grid point (a special value of m is for solvent) and (2) the p-cell electron density at that point.

The grid points within the asymmetric unit are then examined one at a time. If the grid point is within the mask, it is averaged among the N noncrystallographically related equivalent positions belonging to molecule m. If the grid point is solvent, the density can be set to the average solvent density.

The N noncrystallographically equivalent non-integral grid points can be computed from (13.4.5.11a ). Some of these will lie outside the crystallographic asymmetric unit. These will, therefore, have to be operated on by unit-cell translations and crystallographic symmetry operations to bring them back into the asymmetric unit before the corresponding interpolated density can be calculated.
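The loop over grid points described above might look as follows. This is a sketch under strong simplifying assumptions: the map covers one full unit cell in space group P1 (so returning a point to the stored map needs only unit-cell translations, done here by index wrapping), all masked points belong to a single molecule, and the NCS operators are supplied directly in grid-index coordinates. The function names and argument layout are hypothetical.

```python
import itertools
import numpy as np


def interpolate_periodic(rho, pos):
    """Eight-point interpolation at a non-integral grid position, with
    unit-cell translations applied by wrapping the indices."""
    pos = np.asarray(pos, dtype=float)
    base = np.floor(pos).astype(int)
    frac = pos - base
    value = 0.0
    for corner in itertools.product((0, 1), repeat=3):
        weight = np.prod([f if c else 1.0 - f for c, f in zip(corner, frac)])
        idx = tuple((base + corner) % rho.shape)
        value += weight * rho[idx]
    return value


def average_in_cell(rho, mask, ops, solvent_level=None):
    """Average rho over the NCS-related positions of each masked grid point.

    rho:  density on a grid spanning one full P1 unit cell (assumption).
    mask: boolean array, True inside the molecular envelope.
    ops:  list of (R, t) pairs acting on grid-index coordinates,
          g -> R @ g + t; the identity must be included.
    """
    if solvent_level is None:
        solvent_level = rho[~mask].mean() if (~mask).any() else 0.0
    out = np.full(rho.shape, solvent_level, dtype=float)
    for g in np.argwhere(mask):
        samples = [interpolate_periodic(rho, R @ g + t) for R, t in ops]
        out[tuple(g)] = np.mean(samples)
    return out
```

For a space group other than P1, the wrapping step would have to be replaced by the crystallographic symmetry operations that return each point to the asymmetric unit.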

Averaging into the h-cell can be done by a procedure similar to averaging in the p-cell, except that the rotation and translation matrices are given by (13.4.5.13 ). Furthermore, no mask is required as all the averaging into the h-cell (from p-cell electron density) can be done with respect to the reference molecule centred at $[{\bf s}_{p,\, m = 1}]$ in the p-cell. Each grid point is taken in turn in the h-cell. The electron density at any grid point that is further away from $[{\bf s}_{h}]$ than $[R_{\rm out}]$ is set to zero. Other grid-point positions are expanded into the N equivalent positions in the p-cell surrounding $[{\bf s}_{p,\, m = 1}]$ . The interpolated density is then found, averaged over the N equivalent positions, and stored at the original h-cell grid point in successive sections, in the same way as in the p-cell averaging. As in averaging within the p-cell, a record is kept of $[\langle\sigma(\rho)\rangle]$ as a function of $[\langle\rho({\bf x})\rangle]$ (Table 13.4.7.1 ). In general, the local NCS is valid only within the molecule. Hence, the h-cell density will show the molecular envelope and can be used to recompute an improved p-cell density mask. The rate of build up of signal within the molecule should be roughly proportional to N, while the rate outside the molecule should be proportional to about $[N^{1/2}]$ .

Table 13.4.7.1
Mean root-mean-square scatter between noncrystallographically related points

Example taken from φX174 structure determination. $[\langle\rho_{8}\rangle]$ is proportional to the mean density (e Å⁻³) based on eight-point interpolation; n is number of grid points with $[\langle\rho_{8}\rangle]$ in a given range; $[\langle\sigma (\rho_{8})\rangle]$ is the root-mean-square deviation from $[\rho_{8}]$ among noncrystallographic asymmetric points averaged over all points in the mask.

$[\langle\rho_{8}\rangle]$	Density derived from an electron microscopy image at 25 Å resolution		Density derived from a 3.3 Å crystal structure
$[\langle\rho_{8}\rangle]$	n	$[\langle\sigma (\rho_{8})\rangle]$	n	$[\langle\sigma (\rho_{8})\rangle]$
−375 to −325	1	44.7	0	0.0
−325 to −275	16	44.4	0	0.0
−275 to −225	22	39.5	41	31.4
−225 to −175	81	34.9	3493	25.5
−175 to −125	299	34.7	65049	20.5
−125 to −75	1119	33.1	290025	17.7
−75 to −25	16617	34.7	661386	15.0
−25 to 25	33818	46.9	1016274	12.8
25 to 75	6008	31.9	344620	16.3
75 to 125	4512	32.0	215036	18.9
125 to 175	3050	32.1	146690	22.1
175 to 225	1562	32.6	58155	26.3
225 to 275	542	33.4	6032	32.2
275 to 325	213	35.6	227	40.6
325 to 375	33	34.7	9	46.8

13.4.8. Interpolation

Some thought must go into defining the size of the grid interval. Shannon's sampling theorem shows that the grid interval must never be greater than half the limiting resolution of the data. Thus, for instance, if the limiting resolution is 3 Å, the grid intervals must be smaller than 1.5 Å. Clearly, the finer the grid interval, the more accurate the interpolated density, but the computing time will increase with the inverse cube of the size of the grid step. Similarly, if the grid interval is fine, less care and fewer points can be used for interpolation, thus balancing the effect of the finer grid in terms of computing time. In practice, it has been found that an eight-point interpolation (as described below) can be used, provided the grid interval is less than 1/2.5 of the resolution (Rossmann et al., 1992 ). Other interpolation schemes have also been used (e.g. Bricogne, 1976 ; Nordman, 1980 ; Hogle et al., 1985 ; Bolin et al., 1993 ).
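As a small worked example of these rules, the helper below returns the smallest number of grid intervals along a cell edge for which the grid interval does not exceed the resolution divided by a chosen ratio (2 for the Shannon limit, 2.5 for the eight-point interpolation guideline). The function name is hypothetical, and practical programs would normally round the result up further to a size that suits the FFT routine and the space-group symmetry.

```python
import math


def grid_points_along_edge(cell_edge, resolution, ratio=2.5):
    """Smallest number of grid intervals along a cell edge (in Å) such that
    the grid interval is no larger than resolution / ratio."""
    return math.ceil(cell_edge * ratio / resolution)


# A 300 Å cell edge at 3 Å resolution:
n_fine = grid_points_along_edge(300.0, 3.0)                # 1.2 Å intervals
n_shannon = grid_points_along_edge(300.0, 3.0, ratio=2.0)  # 1.5 Å intervals
```

Because the number of grid points scales with the inverse cube of the grid interval, moving from the Shannon limit to the finer grid in this example increases the map size by a factor of about (2.5/2)³, or roughly 2.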

A straightforward `linear' interpolation can be discussed with reference to Fig. 13.4.8.1 (in mathematical literature, this is called a trilinear approximation or a tensor product of three one-dimensional linear interpolants). Let G be the position at which the density is to be interpolated, and let this point have the fractional grid coordinates Δx, Δy, Δz within the box of surrounding grid points. Let 000 be the point at $[\Delta x = 0,]$ $[\Delta y = 0]$ , $[\Delta z = 0]$ . Other grid points will then be at 100, 010, 001 etc., with the point diagonally opposite the origin at 111.

Figure 13.4.8.1

Interpolation box for finding the approximate electron density at G(Δx, Δy, Δz), given the eight densities at the corners of the box. The interpolated value can be built up by first using interpolations to determine the densities at A, B, C and D. A second linear interpolation then determines the density at E (from densities at A and B) and at F (from densities at C and D). The third linear interpolation determines the density at G from the densities at E and F. [Reproduced with permission from Rossmann et al. (1992 ).]

The density at A (between 000 and 100) can then be approximated as the value of the linear interpolant of $[\rho_{000}]$ and $[\rho_{100}]$ : $[\rho (A) \cong \rho_{A} = \rho_{000} + (\rho_{100} - \rho_{000})\Delta x.]$ Similar expressions for $[\rho(B)]$ , $[\rho(C)]$ and $[\rho(D)]$ can also be written. Then, it is possible to calculate an approximate density at E from $[\rho (E) \cong \rho_{E} = \rho_{A} + (\rho_{B} - \rho_{A})\Delta y,]$ with a similar expression for $[\rho (F)]$ . Finally, the interpolated density at G between E and F is given by $[\rho(G) \cong \rho_{G} = \rho_{E} + (\rho_{F} - \rho_{E})\Delta z.]$ Putting all these together, it is easy to show that $[\eqalign{\rho_{G} &= \rho_{000} + \Delta x(\rho_{100} - \rho_{000}) + \Delta y(\rho_{010} - \rho_{000}) + \Delta z(\rho_{001} - \rho_{000})\cr &\quad + \Delta x \Delta y(\rho_{000} + \rho_{110} - \rho_{100} - \rho_{010})\cr &\quad + \Delta y \Delta z(\rho_{000} + \rho_{011} - \rho_{010} - \rho_{001})\cr &\quad + \Delta z \Delta x(\rho_{000} + \rho_{101} - \rho_{001} - \rho_{100})\cr &\quad + \Delta x \Delta y \Delta z(\rho_{100} + \rho_{010} + \rho_{001} + \rho_{111} - \rho_{000} - \rho_{101}\cr&\quad - \rho_{011} - \rho_{110}).}]$
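The stepwise construction and the expanded eight-term expression can be checked against each other numerically. In the sketch below, `rho[i][j][k]` is the density at corner ijk of the box; the assignment of B, C and D to particular box edges is an assumption, since it depends on the figure, but the final result does not depend on that labelling.

```python
def interpolate_nested(rho, dx, dy, dz):
    """Three successive linear interpolations (A-D, then E and F, then G)."""
    a = rho[0][0][0] + (rho[1][0][0] - rho[0][0][0]) * dx  # edge 000-100
    b = rho[0][1][0] + (rho[1][1][0] - rho[0][1][0]) * dx  # edge 010-110
    c = rho[0][0][1] + (rho[1][0][1] - rho[0][0][1]) * dx  # edge 001-101
    d = rho[0][1][1] + (rho[1][1][1] - rho[0][1][1]) * dx  # edge 011-111
    e = a + (b - a) * dy
    f = c + (d - c) * dy
    return e + (f - e) * dz


def interpolate_expanded(rho, dx, dy, dz):
    """The same value from the expanded eight-term expression."""
    r = rho
    return (
        r[0][0][0]
        + dx * (r[1][0][0] - r[0][0][0])
        + dy * (r[0][1][0] - r[0][0][0])
        + dz * (r[0][0][1] - r[0][0][0])
        + dx * dy * (r[0][0][0] + r[1][1][0] - r[1][0][0] - r[0][1][0])
        + dy * dz * (r[0][0][0] + r[0][1][1] - r[0][1][0] - r[0][0][1])
        + dz * dx * (r[0][0][0] + r[1][0][1] - r[0][0][1] - r[1][0][0])
        + dx * dy * dz * (r[1][0][0] + r[0][1][0] + r[0][0][1] + r[1][1][1]
                          - r[0][0][0] - r[1][0][1] - r[0][1][1] - r[1][1][0])
    )


corners = [[[1.0, 2.0], [4.0, 8.0]], [[3.0, 5.0], [7.0, 11.0]]]
nested = interpolate_nested(corners, 0.25, 0.5, 0.75)
expanded = interpolate_expanded(corners, 0.25, 0.5, 0.75)
```

The two values agree to rounding error, and both forms reproduce the corner densities exactly when Δx, Δy and Δz are each 0 or 1.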

13.4.9. Combining different crystal forms

Frequently, a molecule crystallizes in a variety of different crystal forms [e.g. hexokinase (Fletterick & Steitz, 1976 ), the influenza virus neuraminidase spike (Varghese et al., 1983 ), the histocompatibility antigen HLA (Bjorkman et al., 1987 ) and the CD4 receptor (Wang et al., 1990 )]. It is then advantageous to average between the different crystal forms. This can be achieved by averaging each crystal form independently into a standard orientation in the h-cell (if the redundancy is [N = 1] for a given crystal form, then this simply amounts to producing a skewed representation of the p-cell in the h-cell environment). The different results, now all in the same h-cell orientation, can be averaged. However, care must be taken to put equal weight on each molecular copy. If the ith cell contains $[N_{i}]$ noncrystallographic copies, then the average of the densities, $[\rho_{i}({\bf x})\ (i = 1, 2, \ldots, I)]$ , is $[{\textstyle\sum\limits_{i}} N_{i} \rho_{i} ({\bf x})\Big/{\textstyle\sum\limits_{i}}N_{i}]$ at each grid point, x, in the h-cell. Additional weights can be added to account for the subjective assessment of the quality of the electron densities in the different crystal cells.
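The copy-weighted average can be written compactly once all maps are on the same h-cell grid. In this sketch the function name and the optional `quality` argument (standing in for the subjective weights mentioned above) are assumptions.

```python
import numpy as np


def combine_crystal_forms(maps, copies, quality=None):
    """Weighted average of h-cell maps from I crystal forms.

    maps:    array of shape (I, nx, ny, nz), all in the same h-cell frame.
    copies:  N_i, the number of NCS copies averaged in each crystal form.
    quality: optional extra per-form weights (subjective map quality).
    """
    maps = np.asarray(maps, dtype=float)
    weights = np.asarray(copies, dtype=float)
    if quality is not None:
        weights = weights * np.asarray(quality, dtype=float)
    # sum_i w_i rho_i(x) / sum_i w_i at every grid point x
    return np.tensordot(weights, maps, axes=1) / weights.sum()
```

With two forms holding one and two copies, for example, a grid point with densities 2 and 8 averages to (1 × 2 + 2 × 8)/3 = 6.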

With the h-cell density improved by averaging among different crystal forms, it can now be replaced into the different p-cells. These p-cells can then be back-transformed in the usual manner to obtain a better set of phases. These, in turn, can be associated with the observed structure amplitudes for each p-cell structure, and the cycle can be repeated.

13.4.10. Phase extension and refinement of the NCS parameters

Fourier back-transformation of the modified (averaged and solvent-flattened) map leads to poor phase information immediately outside the previously used resolution limit. If no density modification had been made, the Fourier transform would have yielded exactly the same structure factors as had been used for the original map. However, the modifications result in small structure amplitudes just beyond the previous resolution limit. The resultant phases can then be used in combination with the observed amplitudes in the next map calculation, thus extending the limit of resolution.

If the cell edge of an approximately cubic unit cell is a, and the approximate radius of the molecule is $[{\cal R}]$ (therefore, $[{\cal R} \lt a]$ ), then the first node of a spherical diffraction function will occur when $[H{\cal R} = 0.7]$ , where H is the length of the reciprocal-lattice vector between the closest previously known structure factor and the structure factor just outside the resolution limit. Let [H = n(1/a)] , and let it be assumed that the diffraction-function amplitude is negligible when $[H{\cal R} \gt 0.7]$ . Thus, for successful extension, $[n{\cal R}/a \lt 0.7]$ , i.e. n should be somewhat less than $[a/{\cal R}]$ . In general, that means that phase extension should be less than two reciprocal-lattice units in one step.
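The value 0.7 can be recovered numerically from the transform of a uniform sphere, which is proportional to 3(sin x − x cos x)/x³ with x = 2πH𝓡. The sketch below locates the first node by bisection and converts it into a largest extension step for a given cell edge and particle radius. The function names, the bracketing interval and the example dimensions are assumptions chosen for illustration.

```python
import math


def sphere_transform(x):
    """Normalized transform of a uniform sphere, with x = 2*pi*H*R."""
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x**3


def first_node(lo=3.0, hi=6.0, tol=1e-12):
    """First zero of sphere_transform, found by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sphere_transform(lo) * sphere_transform(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def max_extension_step(cell_edge, radius):
    """Largest n (in reciprocal-lattice units 1/a) with H*R at the first node."""
    hr_node = first_node() / (2.0 * math.pi)   # about 0.715
    return hr_node * cell_edge / radius


# Hypothetical case: 300 Å cubic cell, particle radius 150 Å.
n_max = max_extension_step(300.0, 150.0)
```

For this hypothetical case the limit is a little over 1.4 reciprocal-lattice units, consistent with the guideline of fewer than two units per step.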

As phase extension proceeds, the accuracy of the NCS elements and the boundaries of the envelope must be constantly improved and updated to match the improved resolution. Arnold & Rossmann (1986 , 1988 ) discussed phase error as a function of error in the NCS definition and applied rigid-body least-squares refinement for refining particle position and orientation of human rhinovirus 14. The `climb' procedure has been found especially useful (Muckelbauer et al., 1995 ). This depends upon searching one at a time for the parameters (rotational and translational) that minimize the mean r.m.s. deviation of the individual densities to the resultant averaged densities.

Improvement of the NCS parameters is dependent upon an accurate knowledge of the cell dimensions. In the absence of such knowledge, the rotational NCS relationship cannot be accurate, since elastic distortion will result, leading to very poor averaged density. This was the case in the early determination of southern bean mosaic virus (Abad-Zapatero et al., 1980 ), where the structure solution was probably delayed at least one year due to a lack of accurate cell dimensions.

Another aspect to phase extension is the progressive decrease in the number or quality of observed structure amplitudes. The observed amplitudes can be augmented with the calculated values obtained by Fourier back-transformation of the averaged map. However, clearly, as the number of calculated values increases relative to the number of observed values, the rate of convergence decreases. In the limit, when there are no available $[F_{\rm obs}]$ values, averaging a map based on $[F_{\rm calc}]$ values will not alter it, and, thus, convergence stops entirely.

13.4.11. Convergence

Iterations consist of averaging, Fourier inversion of the average map, recombination of observed structure-factor amplitudes with calculated phases, and recalculation of a new electron-density map. Presumably, each new map is an improvement of the previous map as a consequence of using the improved phases resulting from the map-averaging procedure. However, after five or ten cycles, the procedure has usually converged so that each new map is essentially the same as the previous map. Convergence can be usefully measured by computing the correlation coefficient (CC) and R factor (R) between calculated ( $[F_{\rm calc}]$ ) and observed ( $[F_{\rm obs}]$ ) structure-factor amplitudes as a function of resolution (Fig. 13.4.11.1 ). These factors are defined as $[\eqalign{CC &= {{\textstyle\sum_{h}} \left(\langle F_{\rm obs} \rangle - F_{\rm obs}\right) \left(\langle F_{\rm calc} \rangle - F_{\rm calc}\right) \over \left[{\textstyle\sum_{h}} \left(\langle F_{\rm obs} \rangle - F_{\rm obs}\right)^{2} {\textstyle\sum_{h}} \left(\langle F_{\rm calc} \rangle - F_{\rm calc}\right)^{2}\right]^{1/2}},\cr R &= 100 \times {\textstyle\sum} \left|\left(\left|F_{\rm obs}\right| - \left|F_{\rm calc}\right|\right)\right| \Big/ {\textstyle\sum} \left|F_{\rm obs}\right|.}]$ Because of the lack of information immediately outside the resolution limit, these factors must necessarily be poor in the outermost resolution shell. Nevertheless, the outermost resolution shell will be the most sensitive to phase improvement as these structure factors will be the furthest from their correct values at the start of a set of iterations after a resolution extension.
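Both statistics are straightforward to compute for the reflections in one resolution shell. The sketch below uses the standard (Pearson) form of the correlation coefficient, in which the two sums of squares in the denominator are evaluated separately and then multiplied; the function names and the sample amplitudes are assumptions for illustration.

```python
import numpy as np


def correlation_coefficient(f_obs, f_calc):
    """Linear correlation between observed and calculated amplitudes."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    d_obs = f_obs - f_obs.mean()
    d_calc = f_calc - f_calc.mean()
    return (d_obs * d_calc).sum() / np.sqrt((d_obs**2).sum() * (d_calc**2).sum())


def r_factor(f_obs, f_calc):
    """R factor in per cent between observed and calculated amplitudes."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    return 100.0 * np.abs(f_obs - f_calc).sum() / f_obs.sum()


# Made-up amplitudes for one resolution shell.
shell_obs = [10.0, 20.0, 30.0, 40.0]
shell_calc = [12.0, 18.0, 33.0, 37.0]
cc = correlation_coefficient(shell_obs, shell_calc)
r = r_factor(shell_obs, shell_calc)   # 100 * (2 + 2 + 3 + 3) / 100 = 10
```

Applying the two functions shell by shell after each cycle gives curves of the kind plotted against resolution in convergence monitoring.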

Figure 13.4.11.1

Plot of a correlation coefficient as the phases were extended from 8 to 3 Å resolution in the structure determination of Mengo virus. [Reproduced with permission from Luo et al. (1989 ).]

Convergence of CC and R does not, however, necessarily mean that phases are no longer changing from cycle to cycle. Usually, the small-amplitude structure factors keep changing long after convergence appears to have been reached (unpublished results). However, the small-amplitude structure factors make very little difference to the electron-density maps.

The rate of convergence can be improved by suitably weighting coefficients in the computation of the next electron-density map. It can be useful to reduce the weight of those structure factors where the difference between observed and calculated amplitudes is larger than the average difference, as, presumably, error in amplitude can also imply error in phase. Various weighting schemes are generally used (Sim, 1959 ; Rayment, 1983 ; Arnold et al., 1987 ; Arnold & Rossmann, 1988 ).

As mentioned above, the rate of convergence can also be improved by inclusion of $[F_{\rm calc}]$ values when no $[F_{\rm obs}]$ values have been measured. However, care must be taken to use suitable weights to ensure that the $[F_{\rm calc}]$ 's are not systematically larger or smaller than the $[F_{\rm obs}]$ values in the same resolution range.

Monitoring the CC or R factor for different classes of reflections (e.g. [h + k + l = 2n] and [h + k + l = 2n + 1] ) can be a good indicator of problems (Muckelbauer et al., 1995 ), particularly in the presence of pseudo-symmetries. All classes of reflections should behave similarly.

The power (P) of the phase determination and, hence, the rate of convergence and error in the final phasing has been shown (Arnold & Rossmann, 1986 ) to follow $[P \propto (Nf)^{1/2} / \left[R \left(U/V\right)\right],]$ where N is the NCS redundancy, f is the fraction of observed reflections to those theoretically possible, R is a measure of error on the measured amplitudes (e.g. $[R_{\rm merge}]$ ) and [U/V] is the ratio of the volume of the density being averaged to the volume of the unit cell. Important implications of this relationship include that the phasing power is proportional to the square root of the NCS redundancy and that it is also dependent upon solvent content and diffraction-data quality and completeness.
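Because the relationship is a proportionality, only ratios of P between two situations are meaningful. The short sketch below evaluates the right-hand side for two hypothetical data sets that differ only in NCS redundancy; the function name and all numerical values are assumptions for illustration.

```python
import math


def relative_phasing_power(n_ncs, completeness, r_merge, volume_fraction):
    """Right-hand side of P ∝ (N f)^(1/2) / [R (U/V)], in arbitrary units."""
    return math.sqrt(n_ncs * completeness) / (r_merge * volume_fraction)


# Same completeness, data quality and U/V; 60-fold versus 15-fold NCS.
p60 = relative_phasing_power(60, 0.9, 0.10, 0.5)
p15 = relative_phasing_power(15, 0.9, 0.10, 0.5)
ratio = p60 / p15   # sqrt(60 / 15) = 2
```

Quadrupling the redundancy therefore doubles the phasing power, whereas halving the error on the amplitudes or halving U/V (a higher solvent content) has the same doubling effect directly.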

13.4.12. Ab initio phasing starts

Some initial low-resolution model is required to initiate phasing at very low resolution. The use of cryo-EM reconstructions or available homologous structures is now quite usual. However, a phase determination using a sphere or hollow shell is also possible. In the case of a spherical virus, such an approximation is often very reasonable, as is evident when plotting the mean intensities at low resolution. These often show the anticipated distribution of a Fourier transform of a uniform sphere (Fig. 13.4.12.1 ). Thus, initiating phasing using a spherical model does require the prior determination of the average radius of the spherical virus. This can be done either by using an R-factor search (Tsao, Chapman & Rossmann, 1992 ) or by using low-angle X-ray scattering data (Chapman et al., 1992 ). A minimal model would be to estimate the value of F(000) on the same relative scale as the observed amplitudes. This structure factor must always have a positive value. Such a limited initial start was first explored by Rossmann & Blow (1963).
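A radius search of this kind can be sketched generically: for each trial radius, the amplitude of a uniform-sphere transform is scaled to the observed shell-averaged amplitudes and a linear R factor is computed. This is not the procedure of the cited papers, whose details are not reproduced here; the function names, the overall scaling by a ratio of sums and the synthetic test data are all assumptions.

```python
import numpy as np


def sphere_amplitude(s, radius):
    """|F| of a uniform sphere, normalized to 1 at s = 0 (s in 1/Å, radius in Å)."""
    x = 2.0 * np.pi * np.asarray(s, dtype=float) * radius
    return np.abs(3.0 * (np.sin(x) - x * np.cos(x)) / x**3)


def radius_search(s, f_obs, radii):
    """Return (best_radius, best_r) over the trial radii."""
    f_obs = np.asarray(f_obs, dtype=float)
    best_radius, best_r = None, np.inf
    for radius in radii:
        f_model = sphere_amplitude(s, radius)
        scale = f_obs.sum() / f_model.sum()     # simple overall scale (assumption)
        r = np.abs(f_obs - scale * f_model).sum() / f_obs.sum()
        if r < best_r:
            best_radius, best_r = float(radius), r
    return best_radius, best_r


# Synthetic check: shell means generated from a 142 Å radius sphere
# between 500 and 50 Å resolution, then searched in 1 Å steps.
s = np.linspace(0.002, 0.02, 40)
f_obs = 1000.0 * sphere_amplitude(s, 142.0)
best_radius, best_r = radius_search(s, f_obs, np.arange(120.0, 161.0, 1.0))
```

With real data the minimum is shallower, and a hollow-shell model would add an inner radius as a second search parameter.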

Figure 13.4.12.1

Structure amplitudes of the type II crystals of southern bean mosaic virus, averaged within shells of reciprocal space, shown in relation to the Fourier transform of a 284 Å diameter sphere. The inset shows the complete spherical transform from infinity to 30 Å resolution. [Reproduced with permission from Johnson et al. (1976 ). Copyright (1976) Academic Press.]

In surprisingly many cases (Valegård et al., 1990 ; Chapman et al., 1992 ; McKenna, Xia, Willingmann, Ilag, Krishnaswamy et al., 1992 ; McKenna, Xia, Willingmann, Ilag & Rossmann, 1992 ; Tsao, Chapman & Rossmann, 1992 ; Tsao, Chapman, Wu et al., 1992 ), it has been found that initiating phasing by using a very low resolution model results in a phase solution of the Babinet inverted structure ( $[\alpha \rightarrow \alpha + \pi]$ ), where the desired density is negative instead of positive. Presumably, this is the result of phase convergence in a region where the assumed spherical transform is π out of step with reality. As long as this possibility is kept in mind with a watchful eye, such an inversion does not hamper good phase determination. In the case of phase extension, stepping too far in resolution can also lead to analogous problems (Arnold et al., 1987 ).

Similar errors can occur due to lack of information on the correct enantiomorph in the initial phasing model. In some cases, where spherical envelopes are used and the distribution of NCS elements is also centric, there will be no decision on hand, and the phases will remain centric (Johnson et al., 1975 ). However, in general, the enantiomorphic ambiguity (hand assignment) can be resolved by providing a model that has some asymmetry or by arbitrarily selecting the phase of a large-amplitude structure factor away from its centric value.

The progress of phase refinement away from false solutions has been the subject of `post mortem' examinations (Valegård et al., 1990 ; Chapman et al., 1992 ; McKenna, Xia, Willingmann, Ilag, Krishnaswamy et al., 1992 ; McKenna, Xia, Willingmann, Ilag & Rossmann, 1992 ; Tsao, Chapman & Rossmann, 1992 ; Tsao, Chapman, Wu et al., 1992 ; Dokland et al., 1998 ). The main lesson learned from these observations is that phase determination using NCS is amazingly powerful. Most initial errors in phasing gradually work themselves out with subsequent iterations and phase extension.

Perhaps the power of NCS phase determination should not be overly surprising. When phases are determined by multiple isomorphous replacement, the amount of data collected for the given molecular weight is [(N + 1)] , where N is the number of derivatives and is usually 3 or 4. Similarly, for multiwavelength anomalous-dispersion data collection, there might be measurements at four different wavelengths, essentially giving [N = 8] data points for each reflection. However, icosahedral virus determination frequently provides [N = 60] data points for the equivalent resolution.

13.4.13. Recent salient examples in low-symmetry cases: multidomain averaging and systematic applications of multiple-crystal-form averaging

When averaging molecules that have segmental flexibility, it is essential to be able to define the extents of and noncrystallographic relationships among multiple segments which can flexibly reorient. No general protocol has been described for determining the minimum size or optimal number of segments to use in such cases. If the number of segments used for averaging is too small, then the NCS parameters cannot accurately superpose the entirety of the related segments. If too many segments are used for averaging, the segments may become too small for accurate determination of the NCS parameters. The use of too many segments may also become awkward and somewhat inefficient, since in some program systems the total number of maps that must be stored in a given cycle of averaging is proportional to the number of segments used for averaging. Comparison of atomic models for related segments that have been built or refined independently may provide convenient definitions of envelopes for averaging. In practice, a radius of 2 Å or more (depending upon the stage of structure solution and completeness and expected reliability of the model) may be added around the atoms used to define a molecular mask or envelope used in averaging. As with other averaging procedures, multidomain and multiple-crystal-form averaging approaches generally benefit from updating the molecular masks as structure determination progresses.
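Building an envelope from an atomic model with an added radius can be sketched as below. The example assumes an orthogonal grid and Cartesian coordinates in Å, and it loops over atoms without any spatial indexing, which is adequate only for small models; the function name and arguments are hypothetical.

```python
import numpy as np


def mask_from_atoms(atoms, grid_origin, spacing, shape, radius=2.0):
    """Boolean mask, True at grid points within `radius` Å of any atom.

    atoms:       (n_atoms, 3) Cartesian coordinates in Å.
    grid_origin: Cartesian position of grid point (0, 0, 0).
    spacing:     grid interval in Å (orthogonal grid assumed).
    shape:       number of grid points along each axis.
    """
    axes = [grid_origin[i] + spacing * np.arange(shape[i]) for i in range(3)]
    points = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    mask = np.zeros(shape, dtype=bool)
    for atom in np.asarray(atoms, dtype=float):
        mask |= np.sum((points - atom) ** 2, axis=-1) <= radius**2
    return mask


# One atom at the centre of a 5 x 5 x 5 grid with 1 Å spacing.
demo = mask_from_atoms([[2.0, 2.0, 2.0]], (0.0, 0.0, 0.0), 1.0, (5, 5, 5), radius=1.0)
```

One such mask would be generated per segment, with a larger radius early in structure solution when the model is incomplete or less reliable.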

Often, a macromolecule can be crystallized in multiple crystal forms. Advances in crystallization technology leading to the frequent occurrence of multiple crystal forms, coupled with the availability of convenient programs, have led to increasing frequency of application of multiple-crystal-form averaging for structure solution.

Proteins, especially those containing more than one folded domain, often contain flexible hinges. As long as the boundaries of and noncrystallographic relationships among the related domains in multiple copies can be determined, then density averaging can be used to improve phasing. Programs such as O can be conveniently used to obtain the initial transformations necessary for correct superposition of related segments. NCS parameters can be refined using routines that either minimize the density differences among related copies or that perform rigid-body refinements of atomic models.

A number of experimental techniques have been described that may permit more widespread application of multiple-domain and multiple-crystal-form averaging. Freezing of macromolecular crystals to liquid-nitrogen temperatures has become a routine approach for enhancing the resolution and quality of macromolecular X-ray diffraction data. With most macromolecular crystals, there is a shrinkage of the `frozen' unit cell relative to the lattice of the `unfrozen' crystals. In many cases, significantly different cell dimensions can also be obtained by using different cryo-protective buffer and salt conditions. These variations can be exploited in a systematic fashion for phasing by electron-density averaging, so long as (1) the shrinkage relationships among the different crystals are not merely isotropic and (2) the boundaries and NCS parameters among related segments can be determined. Perutz (Perutz, 1946 ; Bragg & Perutz, 1952 ) recognized the potential utility of such shrinkage stages for crystallographic phasing in studies of haemoglobin crystals with varying degrees of hydration.

Recent examples of structure solutions involving multidomain and multiple-crystal-form averaging include studies of HIV reverse transcriptase (RT) (Ren et al., 1995 ; Ding et al., 1995 ). Studies of HIV RT by Stuart and coworkers involved multidomain and multiple-crystal-form averaging using different soaking solutions (Esnouf et al., 1995 ; Ren et al., 1995 ), in some cases with dramatically improved diffraction resolution. Arnold and coworkers have applied multidomain and multiple-crystal-form averaging to studies of HIV RT, including a systematic application of averaging electron density between `frozen' and `unfrozen' crystal forms (Ding et al., 1995 ; Das et al., 1996 ). Tong et al. (1997 ) recently described electron-density averaging among multiple closely related crystal forms of the human cytomegalovirus protease that were obtained by treatment of the crystals with different soaking buffers containing differing levels of precipitants, such as salt and polyethylene glycol.

13.4.14. Programs

This review hopefully covers most aspects encountered when employing electron-density averaging, yet the authors have drawn liberally from their own experience. There are now a large number of averaging programs and procedures available, some more suitable for structure determinations of proteins with low NCS redundancy and improper relationships (Jones, 1992 ) and others particularly suitable for high NCS redundancy, such as is encountered in the study of icosahedral viruses. For large structures, phase determination can be a very time-consuming computer operation. Therefore, attempts have been made to parallelize some programs (Cornea-Hasegan et al., 1995 ), although this may lead to difficulties in exporting the programs to new and different computers.

Recently described program packages for symmetry averaging have been successfully applied to a number of cases. General program systems for averaging that are well suited to cases with high NCS include ENVelope (Rossmann et al., 1992 ) and GAP (Jonathan Grimes and David Stuart, unpublished results); these same packages have also been used for multiple-crystal-form averaging and problems with low symmetry. A number of the program packages have been conveniently integrated with interactive computer-graphics programs such as O (Jones et al., 1991 ) and most permit molecular-envelope definition by a number of possible approaches. RAVE and MAVE (Kleywegt & Jones, 1994 ), programs for graphics-assisted averaging within and between crystal forms, also come with an array of tools for flexible map handling and envelope definition (Kleywegt & Jones, 1996 ). The program systems DMMULTI (Cowtan & Main, 1993 ) and MAGICSQUASH (Schuller, 1996 ), which both derive from the program SQUASH (Zhang, 1993 ), can simultaneously apply real-space (symmetry averaging and solvent levelling with or without histogram matching) and reciprocal-space (phase refinement by the Sayre equation) constraints for phase improvement and extension. The advantage of adding phasing by the Sayre equation is greater at higher resolution, but appears to be significant in some cases, even at relatively low resolution (Cowtan & Main, 1993 ). MAGICSQUASH has been used to determine a number of structures which required multiple-domain and multiple-crystal-form averaging (Schuller, 1996 ). The DEMON/ANGEL package allows noncrystallographic averaging among multiple crystal forms together with solvent flattening and histogram matching (Vellieux et al., 1995 ). Other versatile programs for electron-density averaging include AVGSYS (Bolin et al., 1993 ) and PHASES (Furey & Swaminathan, 1990, 1997 ), both of which have features for facilitating definition and refinement of NCS parameters.

Acknowledgements

We are most grateful to Sharon Wilder and Cheryl Towell for extensive help in creating this manuscript. We are also grateful for decades of financial support by the National Science Foundation and the National Institutes of Health during the development of the techniques reported here.

References

Abad-Zapatero, C., Abdel-Meguid, S. S., Johnson, J. E., Leslie, A. G. W., Rayment, I., Rossmann, M. G., Suck, D. & Tsukihara, T. (1980). Structure of southern bean mosaic virus at 2.8 Å resolution. Nature (London), 286, 33–39.Google Scholar

Argos, P., Ford, G. C. & Rossmann, M. G. (1975). An application of the molecular replacement technique in direct space to a known protein structure. Acta Cryst. A31, 499–506.Google Scholar

Argos, P. & Rossmann, M. G. (1980). Molecular replacement method. In Theory and practice of direct methods in crystallography, edited by M. F. C. Ladd & R. A. Palmer, pp. 361–417. New York: Plenum.Google Scholar

Arnold, E. & Rossmann, M. G. (1986). Effect of errors, redundancy, and solvent content in the molecular replacement procedure for the structure determination of biological macromolecules. Proc. Natl Acad. Sci. USA, 83, 5489–5493.Google Scholar

Arnold, E. & Rossmann, M. G. (1988). The use of molecular-replacement phases for the refinement of the human rhinovirus 14 structure. Acta Cryst. A44, 270–282.Google Scholar

Arnold, E., Vriend, G., Luo, M., Griffith, J. P., Kamer, G., Erickson, J. W., Johnson, J. E. & Rossmann, M. G. (1987). The structure determination of a common cold virus, human rhinovirus 14. Acta Cryst. A43, 346–361.Google Scholar

Bjorkman, P. J., Saper, M. A., Samraoui, B., Bennett, W. S., Strominger, J. L. & Wiley, D. C. (1987). Structure of the human class I histocompatibility antigen, HLA-A2. Nature (London), 329, 506–512.Google Scholar

Bloomer, A. C., Champness, J. N., Bricogne, G., Staden, R. & Klug, A. (1978). Protein disk of tobacco mosaic virus at 2.8 Å resolution showing the interactions within and between subunits. Nature (London), 276, 362–368.Google Scholar

Bolin, J. T., Smith, J. L. & Muchmore, S. W. (1993). Considerations in phase refinement and extension: experiments with a rapid and automatic procedure. American Crystallographic Association Annual Meeting, May 23–28, Albuquerque, New Mexico, Vol. 21, p. 51.

Bragg, L. & Perutz, M. F. (1952). The structure of haemoglobin. Proc. R. Soc. London Ser. A, 213, 425–435.

Bricogne, G. (1974). Geometric sources of redundancy in intensity data and their use for phase determination. Acta Cryst. A30, 395–405.

Bricogne, G. (1976). Methods and programs for direct-space exploitation of geometric redundancies. Acta Cryst. A32, 832–847.

Buehner, M., Ford, G. C., Moras, D., Olsen, K. W. & Rossmann, M. G. (1974). Structure determination of crystalline lobster D-glyceraldehyde-3-phosphate dehydrogenase. J. Mol. Biol. 82, 563–585.

Champness, J. N., Bloomer, A. C., Bricogne, G., Butler, P. J. G. & Klug, A. (1976). The structure of the protein disk of tobacco mosaic virus at 5 Å resolution. Nature (London), 259, 20–24.

Chapman, M. S., Tsao, J. & Rossmann, M. G. (1992). Ab initio phase determination for spherical viruses: parameter determination for spherical-shell models. Acta Cryst. A48, 301–312.

Cornea-Hasegan, M. A., Zhang, Z., Lynch, R. E., Marinescu, D. C., Hadfield, A., Muckelbauer, J. K., Munshi, S., Tong, L. & Rossmann, M. G. (1995). Phase refinement and extension by means of non-crystallographic symmetry averaging using parallel computers. Acta Cryst. D51, 749–759.

Cowtan, K. D. & Main, P. (1993). Improvement of macromolecular electron-density maps by the simultaneous application of real and reciprocal space constraints. Acta Cryst. D49, 148–157.

Crowther, R. A. (1969). The use of non-crystallographic symmetry for phase determination. Acta Cryst. B25, 2571–2580.

Das, K., Ding, J., Hsiou, Y., Clark, A. D. Jr, Moereels, H., Koymans, L., Andries, K., Pauwels, R., Janssen, P. A. J., Boyer, P. L., Clark, P., Smith, R. H. Jr, Kroeger Smith, M. B., Michejda, C. J., Hughes, S. H. & Arnold, E. (1996). Crystal structure of 8-Cl and 9-Cl TIBO complexed with wild-type HIV-1 RT and 8-Cl TIBO complexed with the Tyr181Cys HIV-1 RT drug-resistant mutant. J. Mol. Biol. 264, 1085–1100.

Ding, J., Das, K., Tantillo, C., Zhang, W., Clark, A. D. Jr, Jessen, S., Lu, X., Hsiou, Y., Jacobo-Molina, A., Andries, K., Pauwels, R., Moereels, H., Koymans, L., Janssen, P. A. J., Smith, R. H. Jr, Kroeger Koepke, M., Michejda, C. J., Hughes, S. H. & Arnold, E. (1995). Structure of HIV-1 reverse transcriptase in a complex with the non-nucleoside inhibitor α-APA R 95845 at 2.8 Å resolution. Structure, 3, 365–379.

Dodson, E. J., Gover, S. & Wolf, W. (1992). Editors. Proceedings of the CCP4 study weekend. Molecular replacement. Warrington: Daresbury Laboratory.

Dokland, T., McKenna, R., Sherman, D. M., Bowman, B. R., Bean, W. F. & Rossmann, M. G. (1998). Structure determination of the φX174 closed procapsid. Acta Cryst. D54, 878–890.

Esnouf, R., Ren, J., Ross, C., Jones, Y., Stammers, D. & Stuart, D. (1995). Mechanism of inhibition of HIV-1 reverse transcriptase by non-nucleoside inhibitors. Nature Struct. Biol. 2, 303–308.

Fletterick, R. J. & Steitz, T. A. (1976). The combination of independent phase information obtained from separate protein structure determinations of yeast hexokinase. Acta Cryst. A32, 125–132.

Furey, W. & Swaminathan, S. (1990). PHASES: a program package for the processing and analysis of diffraction data from macromolecules. Am. Crystallogr. Assoc. Meeting Abstracts, 18, PA33, p. 73.

Furey, W. & Swaminathan, S. (1997). PHASES-95: a program package for the processing and analysis of diffraction data from macromolecules. Methods Enzymol. 277, 590–620.

Gaykema, W. P. J., Hol, W. G. J., Vereijken, J. M., Soeter, N. M., Bak, H. J. & Beintema, J. J. (1984). 3.2 Å structure of the copper-containing, oxygen-carrying protein Panulirus interruptus haemocyanin. Nature (London), 309, 23–29.

Harrison, S. C., Olson, A. J., Schutt, C. E., Winkler, F. K. & Bricogne, G. (1978). Tomato bushy stunt virus at 2.9 Å resolution. Nature (London), 276, 368–373.

Hogle, J. M., Chow, M. & Filman, D. J. (1985). Three-dimensional structure of poliovirus at 2.9 Å resolution. Science, 229, 1358–1365.

Johnson, J. E. (1978). Appendix II. Averaging of electron density maps. Acta Cryst. B34, 576–577.

Johnson, J. E., Akimoto, T., Suck, D., Rayment, I. & Rossmann, M. G. (1976). The structure of southern bean mosaic virus at 22.5 Å resolution. Virology, 75, 394–400.

Johnson, J. E., Argos, P. & Rossmann, M. G. (1975). Rotation function studies of southern bean mosaic virus at 22 Å resolution. Acta Cryst. B31, 2577–2583.

Jones, T. A. (1992). a, yaap, asap, @#*? A set of averaging programs. In Proceedings of the CCP4 study weekend. Molecular replacement, edited by E. Dodson, S. Gover & W. Wolf, pp. 91–105. Warrington: Daresbury Laboratory.

Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Cryst. A47, 110–119.

Kleywegt, G. J. & Jones, T. A. (1994). Halloween, masks and bones. In From first map to final model, edited by S. Bailey, R. Hubbard & D. Waller, pp. 59–66. Warrington: Daresbury Laboratory.

Kleywegt, G. J. & Jones, T. A. (1996). xdlMAPMAN and xdlDATAMAN – programs for reformatting, analysis and manipulation of biomacromolecular electron-density maps and reflection data sets. Acta Cryst. D52, 826–828.

Lin, Z., Konno, M., Abad-Zapatero, C., Wierenga, R., Murthy, M. R. N., Ray, W. J. Jr & Rossmann, M. G. (1986). The structure of rabbit muscle phosphoglucomutase at intermediate resolution. J. Biol. Chem. 261, 264–274.

Luo, M., Vriend, G., Kamer, G. & Rossmann, M. G. (1989). Structure determination of Mengo virus. Acta Cryst. B45, 85–92.

McKenna, R., Xia, D., Willingmann, P., Ilag, L. L., Krishnaswamy, S., Rossmann, M. G., Olson, N. H., Baker, T. S. & Incardona, N. L. (1992). Atomic structure of single-stranded DNA bacteriophage φX174 and its functional implications. Nature (London), 355, 137–143.

McKenna, R., Xia, D., Willingmann, P., Ilag, L. L. & Rossmann, M. G. (1992). Structure determination of the bacteriophage φX174. Acta Cryst. B48, 499–511.

Main, P. (1967). Phase determination using non-crystallographic symmetry. Acta Cryst. 23, 50–54.

Matthews, B. W., Sigler, P. B., Henderson, R. & Blow, D. M. (1967). Three-dimensional structure of tosyl-α-chymotrypsin. Nature (London), 214, 652–656.

Muckelbauer, J. K., Kremer, M., Minor, I., Tong, L., Zlotnick, A., Johnson, J. E. & Rossmann, M. G. (1995). Structure determination of coxsackievirus B3 to 3.5 Å resolution. Acta Cryst. D51, 871–887.

Muirhead, H., Cox, J. M., Mazzarella, L. & Perutz, M. F. (1967). Structure and function of haemoglobin. III. A three-dimensional Fourier synthesis of human deoxyhaemoglobin at 5.5 Å resolution. J. Mol. Biol. 28, 117–156.

Nordman, C. E. (1980). Procedures for detection and idealization of non-crystallographic symmetry with application to phase refinement of the satellite tobacco necrosis virus structure. Acta Cryst. A36, 747–754.

Perutz, M. F. (1946). Trans. Faraday Soc. 42B, 187.

Rayment, I. (1983). Molecular replacement method at low resolution: optimum strategy and intrinsic limitations as determined by calculations on icosahedral virus models. Acta Cryst. A39, 102–116.

Ren, J., Esnouf, R., Garman, E., Somers, D., Ross, C., Kirby, I., Keeling, J., Darby, G., Jones, Y., Stuart, D. & Stammers, D. (1995). High resolution structures of HIV-1 RT from four RT-inhibitor complexes. Nature Struct. Biol. 2, 293–302.

Rossmann, M. G. (1972). Editor. The molecular replacement method. New York: Gordon & Breach.

Rossmann, M. G. (1990). The molecular replacement method. Acta Cryst. A46, 73–82.

Rossmann, M. G. & Arnold, E. (2001). Patterson and molecular-replacement techniques. In International tables for crystallography, Vol. B. Reciprocal space, edited by U. Shmueli, ch. 2.3. Dordrecht: Kluwer Academic Publishers.

Rossmann, M. G., Arnold, E., Erickson, J. W., Frankenberger, E. A., Griffith, J. P., Hecht, H. J., Johnson, J. E., Kamer, G., Luo, M., Mosser, A. G., Rueckert, R. R., Sherry, B. & Vriend, G. (1985). Structure of a human common cold virus and functional relationship to other picornaviruses. Nature (London), 317, 145–153.

Rossmann, M. G. & Blow, D. M. (1962). The detection of sub-units within the crystallographic asymmetric unit. Acta Cryst. 15, 24–31.

Rossmann, M. G. & Blow, D. M. (1963). Determination of phases by the conditions of non-crystallographic symmetry. Acta Cryst. 16, 39–45.

Rossmann, M. G., Blow, D. M., Harding, M. M. & Coller, E. (1964). The relative positions of independent molecules within the same asymmetric unit. Acta Cryst. 17, 338–342.

Rossmann, M. G., McKenna, R., Tong, L., Xia, D., Dai, J.-B., Wu, H., Choi, H.-K. & Lynch, R. E. (1992). Molecular replacement real-space averaging. J. Appl. Cryst. 25, 166–180.

Schuller, D. J. (1996). MAGICSQUASH: more versatile non-crystallographic averaging with multiple constraints. Acta Cryst. D52, 425–434.

Sim, G. A. (1959). The distribution of phase angles for structures containing heavy atoms. II. A modification of the normal heavy-atom method for non-centrosymmetrical structures. Acta Cryst. 12, 813–815.

Tong, L., Qian, C., Davidson, W., Massariol, M.-J., Bonneau, P. R., Cordingley, M. G. & Lagacé, L. (1997). Experiences from the structure determination of human cytomegalovirus protease. Acta Cryst. D53, 682–690.

Tong, L. & Rossmann, M. G. (1995). Reciprocal-space molecular-replacement averaging. Acta Cryst. D51, 347–353.

Tsao, J., Chapman, M. S. & Rossmann, M. G. (1992). Ab initio phase determination for viruses with high symmetry: a feasibility study. Acta Cryst. A48, 293–301.

Tsao, J., Chapman, M. S., Wu, H., Agbandje, M., Keller, W. & Rossmann, M. G. (1992). Structure determination of monoclinic canine parvovirus. Acta Cryst. B48, 75–88.

Valegård, K., Liljas, L., Fridborg, K. & Unge, T. (1990). The three-dimensional structure of the bacterial virus MS2. Nature (London), 345, 36–41.

Varghese, J. N., Laver, W. G. & Colman, P. M. (1983). Structure of the influenza virus glycoprotein antigen neuraminidase at 2.9 Å resolution. Nature (London), 303, 35–40.

Vellieux, F. M. D. A. P., Hunt, J. F., Roy, S. & Read, R. J. (1995). DEMON/ANGEL: a suite of programs to carry out density modification. J. Appl. Cryst. 28, 347–351.

Wang, J., Yan, Y., Garrett, T. P. J., Liu, J., Rodgers, D. W., Garlick, R. L., Tarr, G. E., Husain, Y., Reinherz, E. L. & Harrison, S. C. (1990). Atomic structure of a fragment of human CD4 containing two immunoglobulin-like domains. Nature (London), 348, 411–418.

Wilson, I. A., Skehel, J. J. & Wiley, D. C. (1981). Structure of the haemagglutinin membrane glycoprotein of influenza virus at 3 Å resolution. Nature (London), 289, 366–373.

Zhang, K. Y. J. (1993). SQUASH – combining constraints for macromolecular phase refinement and extension. Acta Cryst. D49, 213–222.