International Tables for Crystallography, Volume F: Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F, ch. 13.4, pp. 279-292
https://doi.org/10.1107/97809553602060000684
Chapter 13.4. Noncrystallographic symmetry averaging of electron density for molecular-replacement phase refinement and extension
(a) Department of Biological Sciences, Purdue University, West Lafayette, IN 47907-1392, USA, and (b) Biomolecular Crystallography Laboratory, CABM & Rutgers University, 679 Hoes Lane, Piscataway, NJ 08854-5638, USA
Noncrystallographic symmetry (NCS) occurs when symmetry operations are true only within a confined envelope, as opposed to being valid throughout the essentially infinite crystal lattice. Computationally, it is useful to define the molecular symmetry with reference to an arbitrary cell (the `h-cell') with the relationship Xn = [Rn]X1 (n = 1, ..., N). Then the assembly of N NCS-equivalent objects can be moved into the actual crystal (the `p-cell') using the relationship Y = [E]X. Hence each of the N units can be referred to the reference unit by Yn = [E][Rn]X1. In turn, the N units in the p-cell asymmetric unit can be multiplied by the crystal symmetry to produce the whole unit cell from the reference subunit in the h-cell. Procedures for averaging electron density require a definition of the envelope, either for the reference subunit or, if the NCS represents a closed point group (`proper' NCS), for the whole of the molecular assembly. Averaging beyond the range of the NCS operators means that averaging is between non-equivalent densities. This causes the mean height of the averaged density to diminish and thus accurately indicates the limits of the NCS envelope. Various symmetry situations are examined, such as averaging subunits within the same crystal lattice (possibly proper symmetry) or between different crystal forms (necessarily improper symmetry). Phase extension is shown to be possible only by small defined increments of resolution after each cycle of averaging and solvent flattening.
Keywords: ab initio phasing; computer programs; electron-density averaging; h-cell; molecular envelopes; molecular replacement; multidomain averaging; multiple-crystal-form averaging; noncrystallographic symmetry; p-cell; phase extension; phase refinement; phasing.
Electron-density averaging for phasing crystal structures has become a widespread and nearly routine technique. High noncrystallographic symmetry (NCS) permits the solution of structures using relatively poor and even ab initio phasing starts, while lower NCS electron-density averaging can significantly improve initial phases obtained by techniques such as isomorphous replacement, anomalous scattering, or molecular replacement. Implicit in any averaging is solvent flattening in the regions not available for NCS averaging. Indeed, if all parts of the unit cell were consistent with the NCS, the symmetry would be crystallographic and valid throughout the crystal lattice.
A number of generalized averaging programs and software packages have been developed for macromolecular crystal structure analyses. Ease of use, coupled with relatively convenient definition of molecular envelopes, as well as enormous advances in computer technology, have facilitated the application of symmetry averaging to a diverse set of crystallographic problems. Averaging of separate domains in multidomain protein structures that can be divided into segments and averaging among multiple crystal forms is becoming increasingly common.
Extension of phases to higher resolution by symmetry averaging of electron density, coupled with solvent flattening, has been applied to numerous problems. The power of phase extension has been especially impressive in many cases involving high NCS, such as icosahedral virus structures. The overall power for phase improvement of averaging, combined with other density-modification techniques, such as solvent levelling, has been found to depend upon the degree of NCS, the solvent content of the crystals, and the quality and completeness of experimental data. Similar averaging methodology can be used for structure analysis by other imaging techniques, such as electron microscopy.
This chapter discusses the underlying principles of electron-density averaging for macromolecular crystallographic phase improvement and describes procedures for computer implementation of these techniques.
Crystallographic symmetry is valid for the infinite crystal lattice. Any crystallographic symmetry element relates all points within the crystal to equivalent points elsewhere. In contrast, an NCS operator is valid only locally within a finite volume (Fig. 13.4.2.1); if a periodic structure is superimposed on itself after operation with an NCS operator, it will superimpose only within the envelope defining the limits of the local symmetry.
A product of superimposed periodic structures will be non-periodic, containing only the point symmetry of the noncrystallographic operators (Fig. 13.4.2.2). This fact can frequently be used to select a molecular envelope that was not obvious prior to noncrystallographic averaging [see e.g. Buehner et al. (1974) or Lin et al. (1986)]. Although no knowledge of the crystallographic envelope is needed for this first averaging, it is necessary to determine it for the averaged molecular structure within the crystallographic cell to permit Fourier back-transformation.
There must be space between the confining envelopes governed by the local symmetry. Only the crystallographic symmetry is valid within this space. In the limit, when this space has diminished to zero, the local symmetry will have become a true crystallographic operator.
The definition of NCS can be extended to symmetry that relates similar objects in different crystal lattices. An operation that relates an object in one lattice to an equivalent object in another lattice will apply only to the chosen objects in each lattice. Beyond the confines of the chosen objects, there will be no coincidence of pattern.
Two kinds of NCS elements may be defined: proper and improper. The former satisfies a closed point group [e.g. a 17-fold rotation as occurs in tobacco mosaic virus disk protein (Champness et al., 1976)]. Here, it does not matter whether a rotation axis is applied right- or left-handedly; the result is indistinguishable. On the other hand, the relationship between different molecules in a crystallographic asymmetric unit is unlikely to be a closed point group. Thus, a rotation in one direction (followed by a translation) might achieve superposition of the two molecules, while a rotation in the opposite direction would not. This is called an improper NCS operator. An operation which takes a molecule in one unit cell to that in another unit cell (initially, the cells are lined up with, say, their orthogonalized a, b and c axes parallel) must equally be an improper rotation.
The position in space of a noncrystallographic rotation symmetry operator can be arbitrarily assigned. The rotation operation will orient the two molecules similarly. A subsequent translation, whose magnitude depends upon the location of the NCS operator, will always be able to superimpose the molecules (Fig. 13.4.2.3). Nevertheless, it is possible to select the position of the NCS axis such that the translation is a minimum, and that will occur when the translation is entirely parallel to the noncrystallographic rotation axis.
The position of an NCS axis, like everything else in the unit cell, must be defined with respect to a selected origin. Consider the noncrystallographic rotation defined by the matrix [C]. Then, if the point x is rotated to x′ (both defined with respect to the selected origin and axial system),

$$\mathbf{x}' = [\mathrm{C}]\mathbf{x} + \mathbf{d},$$

where d is a three-dimensional vector which expresses the translational component of the NCS operation. The magnitude of the components of d is quite arbitrary unless the position of the rotation axis is defined. If the rotation axis represents a proper NCS element, there will exist a point x on the rotation axis, when positioned to eliminate translation, such that it is rotated onto itself (x′ = x). It follows that for such a point

$$\mathbf{x} = [\mathrm{C}]\mathbf{x} + \mathbf{d},$$

from which d can be determined if the position of the molecular centre is known. Note that

$$\mathbf{d} = 0$$

if, and only if, the noncrystallographic rotation axis passes through the crystallographic origin.
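The dependence of d on the position of the axis can be illustrated with a short sketch. It assumes the operation has the form x′ = [C]x + d and that a point on a proper rotation axis is mapped onto itself, so that d = x − [C]x. The function names and the numerical values are hypothetical and serve only as an illustration.

```python
import math

def rotation_about_z(angle_deg):
    """Return the 3x3 matrix [C] for a rotation of angle_deg about the z axis."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def translation_for_axis_through(c_matrix, point_on_axis):
    """Translation d that leaves point_on_axis fixed: d = x - [C]x."""
    cx = mat_vec(c_matrix, point_on_axis)
    return [p - q for p, q in zip(point_on_axis, cx)]

def apply_ncs(c_matrix, d, x):
    """Apply the NCS operation x' = [C]x + d."""
    cx = mat_vec(c_matrix, x)
    return [a + b for a, b in zip(cx, d)]

# Hypothetical twofold axis parallel to z passing through (10, 5, 0).
C = rotation_about_z(180.0)
centre = [10.0, 5.0, 0.0]
d = translation_for_axis_through(C, centre)
image_of_centre = apply_ncs(C, d, centre)

# The same rotation with its axis through the origin needs no translation.
d_origin = translation_for_axis_through(C, [0.0, 0.0, 7.0])
```

Moving the axis away from the origin changes d but not [C], which is why the translation is meaningful only once the axis position has been fixed.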
The presence of proper NCS in a crystal can help phase determination considerably. Consider, for example, a tetramer with 222 symmetry. It is not necessary to define the chemical limits of any one polypeptide chain as the NCS is true everywhere within the molecular envelope and the boundaries of the polypeptide chain are irrelevant to the geometrical considerations. The electron density at every point within the molecular envelope (which itself must have 222 symmetry) can be averaged among all four 222-related points without any chemical knowledge of the configuration of the monomer polypeptide. On the other hand, if there is only improper NCS, then the envelope must define the limits of one noncrystallographic asymmetric unit, although the crystallographic asymmetric unit contains two or more such units.
The molecular replacement method [cf. Rossmann & Blow (1962); Rossmann (1972, 1990); Argos & Rossmann (1980); Rossmann & Arnold (2001)] is dependent upon the presence of NCS, whether it relates objects within one crystal lattice or between crystal lattices. The NCS rotational relationship in real space is exactly mimicked in reciprocal space. Local symmetry in real space has the equivalent effect of rotating a reciprocal lattice onto itself or another (with origins coincident), such that the integral reciprocal-lattice points of one reciprocal space coincide with non-integral reciprocal-lattice positions in the other. As the reciprocal lattice samples the Fourier transform of a molecule only at finite and integral reciprocal-lattice points, the effect of an NCS operation is to permit sampling of the molecular transform at intermediate non-integral reciprocal-lattice positions. If such sampling occurs frequently enough, it will constitute a plot of the continuous transform of the molecule and, hence, amount to a structure determination.
Whenever a molecule exists more than once either in the same unit cell or in different unit cells, then error in the molecular electron-density distribution due to error in phasing can be reduced by averaging the various molecular copies. The number of such copies, N, is referred to as the noncrystallographic redundancy. As the NCS is, by definition, only local (often pertaining to a particular molecular centre), there are holes and gaps between the averaged density, which presumably are solvent space between molecules. Thus, the electron density can be improved both by averaging electron density and by setting the density between molecules to a low, constant value (`solvent flattening'). Phases calculated by Fourier back-transforming the improved density should be more accurate than the original phases. Hence, the observed structure amplitudes (suitably weighted) can be associated with the improved phases, and a new and improved map can be calculated. This, in turn, can again be averaged until convergence has been reached and the phases no longer change. In addition, the back-transformed map can be used to compute phases just beyond the extremity of the resolution of the terms used in the original map. The resultant amplitudes will not be zero because the map had been modified by averaging and solvent flattening. Thus, phases can be gradually extended and improved, starting from a very low resolution approximation to the molecular structure. This procedure was first implemented in reciprocal space (Rossmann & Blow, 1963; Main, 1967; Crowther, 1969) and then, more recently, in real space (Bricogne, 1974, 1976; Johnson, 1978; Jones, 1992; Rossmann et al., 1992). More recently still, there has been an attempt to reproduce the very successful real-space procedure in reciprocal space (Tong & Rossmann, 1995).
Early examples of such a procedure for phase improvement are the structure determinations of deoxyhaemoglobin (Muirhead et al., 1967), α-chymotrypsin (Matthews et al., 1967), lobster glyceraldehyde-3-phosphate dehydrogenase (Buehner et al., 1974), hexokinase (Fletterick & Steitz, 1976), tobacco mosaic virus disk protein (Champness et al., 1976; Bloomer et al., 1978), the influenza virus haemagglutinin spike (Wilson et al., 1981), tomato bushy stunt virus (Harrison et al., 1978) and southern bean mosaic virus (Abad-Zapatero et al., 1980). Early examples of phase extension, using real-space electron-density averaging, were the studies of glyceraldehyde-3-phosphate dehydrogenase (Argos et al., 1975), satellite tobacco necrosis virus (Nordman, 1980), haemocyanin (Gaykema et al., 1984), human rhinovirus 14 (Rossmann et al., 1985) and poliovirus (Hogle et al., 1985). Since then, this method has been used in numerous virus structure determinations, with the phase extension being initiated from ever lower resolution.
A once-popular computer program for real-space averaging was written by Gerard Bricogne (1976). Another program has been described by Johnson (1978). Both programs were based on a double-sorting procedure. Bricogne (1976) had suggested that, with interpolation between grid points using linear polynomials, it was necessary to sample electron density at grid intervals finer than one-sixth of the resolution limit of the Fourier terms that were used in calculating the map. With the availability of more computer memory, it was possible to store much of the electron density, thus avoiding time-consuming sorting operations (Hogle et al., 1985; Luo et al., 1989). Simultaneously, the storage requirements could be drastically reduced by using interpolation with quadratic polynomials. While the latter required a little extra computation time, this was far less than what would have been needed for sorting. Furthermore, it was found that Bricogne's estimate for the fineness of the map storage grid was too pessimistic, even for linear interpolation, which works well to about 1/2.5 of the resolution limit of the map.
In addition to changes in strategy brought about by computers with much larger memories, experience has been gained in program requirements for real-space averaging for phase determination (Dodson et al., 1992). Here we give a general procedure for electron-density averaging.
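It is useful to define two types of unit cells: the p-cell, which is the real crystal unit cell, and the h-cell, an arbitrary cell in which the molecule is held in a standard orientation.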
Since the averaged molecule is to be placed into all crystallographically related positions in the p-cell, it is essential to know the envelope that encloses a single molecule. Care must be taken that the envelopes from neighbouring molecules in the p-cell do not overlap. The remaining space between the limits of the envelopes of the variously placed molecules in the p-cell can be taken to be solvent and, hence, flattened, a useful physical assumption for helping phase determination.
The h-cell must be chosen to be at least as large as the largest dimension of the molecule. In general, it is convenient to define the h-cell with $a = b = c$ and $\alpha = \beta = \gamma = 90^\circ$, while placing the molecular centre at $(\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2})$. For example, if the molecule is a viral particle with icosahedral symmetry, the standard orientation can be defined by placing the twofold axes to correspond to the h-cell unit-cell axes, a procedure which can be done in one of two ways (Fig. 13.4.4.1). It will be necessary to know how the molecule (or particle) in the h-cell is related to the `reference' molecule in the p-cell. The known p-cell crystallographic symmetry then permits the complete construction of the p-cell structure from whatever is the current h-cell electron-density representation of the molecule.
The h-cell is used to represent the density of a molecule in the standard orientation obtained by averaging all the noncrystallographic units in the p-cell. While density within a specific molecule will tend to be reinforced by the averaging procedure, the density outside the molecular boundaries will tend to be diminished. Thus, by averaging into the h-cell, the molecular envelope is revealed automatically. Indeed, the greater the NCS, the greater the clarity of the molecular boundary. Hence, the averaged molecule in the h-cell can be used to define a molecular mask in the p-cell automatically.
Averaging into the h-cell is also useful for displaying the molecule in a standard orientation (i.e. obtaining the electron-density distribution on skew planes). Thus, it is possible to display the molecule, for instance, with sections perpendicular to a molecular twofold axis, and to position the molecular symmetry axes accurately. From this, it is then easy to define the limits of the molecular asymmetric unit (Fig. 13.4.4.1). Hence, it is possible to save a great deal of computing time by evaluating the electron density in the h-cell only at those grid points within and immediately surrounding the noncrystallographic asymmetric unit.
Transformations will now be described which relate noncrystallographically related positions distributed among several fragmented copies of the molecule in the asymmetric unit of the p-cell and between the p-cell and the h-cell.
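Let Y and X be position vectors in a Cartesian coordinate system whose components have dimensions of length, in the p- and h-cells, which utilize the same origin as the fractional coordinates, y and x, respectively. Let $[\alpha]$ and $[\beta]$ be `orthogonalization' and `de-orthogonalization' matrices, with subscripts p and h denoting the p- and h-cells, respectively (Rossmann & Blow, 1962). Then

$$\mathbf{Y} = [\alpha_p]\mathbf{y}, \quad \mathbf{X} = [\alpha_h]\mathbf{x}, \quad \mathbf{y} = [\beta_p]\mathbf{Y}, \quad \mathbf{x} = [\beta_h]\mathbf{X}. \tag{13.4.5.1}$$

Thus, for instance, $[\beta_p]$ denotes a matrix that transforms a Cartesian set of unit vectors to fractional distances along the unit-cell vectors $\mathbf{a}_p, \mathbf{b}_p, \mathbf{c}_p$.

Let the Cartesian coordinates Y and X be related by the rotation matrix [ω] and the translation vector D such that

$$\mathbf{X} = [\omega]\mathbf{Y} + \mathbf{D}. \tag{13.4.5.2}$$

If the molecules are to be averaged among different unit cells, then each p-cell must be related to the standard h-cell orientation by a different [ω] and D. Then, from (13.4.5.1) and (13.4.5.2),

$$\mathbf{x} = [\beta_h][\omega][\alpha_p]\mathbf{y} + [\beta_h]\mathbf{D}. \tag{13.4.5.3}$$

Now, if [ω] represents the rotational relationship between the `reference' molecule, $m = 1$, in the p-cell with respect to the h-cell, then from (13.4.5.3)

$$\mathbf{x} = [\beta_h][\omega][\alpha_p]\mathbf{y}_{m=1} + [\beta_h]\mathbf{D},$$

where $\mathbf{y}_m$ refers to the fractional coordinates of the mth molecule in the p-cell.

Assuming there is only one molecule per asymmetric unit in the p-cell, let the mth molecule in the p-cell be related to the reference molecule by the crystallographic rotation and translational operators $[\mathrm{T}_m]$ and $\mathbf{t}_m$, such that

$$\mathbf{y}_m = [\mathrm{T}_m]\mathbf{y}_{m=1} + \mathbf{t}_m. \tag{13.4.5.4}$$

For convenience, all translational components will initially be neglected in the further derivations below, but they will be reintroduced in the final stages. Hence, from (13.4.5.3) and (13.4.5.4),

$$\mathbf{x} = [\beta_h][\omega][\alpha_p][\mathrm{T}_m^{-1}]\mathbf{y}_m. \tag{13.4.5.5}$$

Further, if $\mathbf{x}_n$ refers to the nth subunit within the molecule in the h-cell, and similarly if $\mathbf{y}_{m,n}$ refers to the nth subunit within the mth molecule of the p-cell, then from (13.4.5.5)

$$\mathbf{x}_n = [\beta_h][\omega][\alpha_p][\mathrm{T}_m^{-1}]\mathbf{y}_{m,n}. \tag{13.4.5.6}$$

Finally, the rotation matrix $[\mathrm{R}_n]$ is used to define the relationship among the N ($N = 2$ for a dimer, 4 for a 222 tetramer, 60 for an icosahedral virus etc.) noncrystallographic asymmetric units of the molecule within the h-cell. Then

$$\mathbf{X}_n = [\mathrm{R}_n]\mathbf{X}_{n=1}. \tag{13.4.5.7}$$

Consider averaging the density at N noncrystallographically related points in the p-cell and replacing that density into the p-cell. By substituting for $\mathbf{X}_n$ and $\mathbf{X}_{n=1}$ in (13.4.5.7) and using (13.4.5.6),

$$[\omega][\alpha_p][\mathrm{T}_m^{-1}]\mathbf{y}_{m,n} = [\mathrm{R}_n][\omega][\alpha_p][\mathrm{T}_m^{-1}]\mathbf{y}_{m,n=1} \tag{13.4.5.8}$$

or

$$\mathbf{y}_{m,n} = [\mathrm{T}_m][\beta_p][\omega^{-1}][\mathrm{R}_n][\omega][\alpha_p][\mathrm{T}_m^{-1}]\mathbf{y}_{m,n=1}.$$

Now set

$$[\mathrm{E}_{m,n}] = [\mathrm{T}_m][\beta_p][\omega^{-1}][\mathrm{R}_n][\omega][\alpha_p][\mathrm{T}_m^{-1}], \tag{13.4.5.9}$$

giving

$$\mathbf{y}_{m,n} = [\mathrm{E}_{m,n}]\mathbf{y}_{m,n=1} + \mathbf{e}_{m,n}, \tag{13.4.5.10}$$

where $\mathbf{e}_{m,n}$ is the corresponding translational element. Note that multiplication by $[\mathrm{E}_{m,n}]$ thus corresponds to the following sequence of transformations: (1) placing all the crystallographically related subunits into the reference orientation with $[\mathrm{T}_m^{-1}]$; (2) `orthogonalizing' the coordinates with $[\alpha_p]$; (3) rotating the coordinates into the h-cell with [ω]; (4) rotating from the reference subunit of the molecule of the h-cell with $[\mathrm{R}_n]$; (5) rotating these back into the p-cell with $[\omega^{-1}]$; (6) `de-orthogonalizing' in the p-cell with $[\beta_p]$; and (7) placing these back into each of the M crystallographic asymmetric units of the p-cell with $[\mathrm{T}_m]$.

The translational elements, $\mathbf{e}_{m,n}$, can now be evaluated. Let $\mathbf{s}_{p,m}$ be the fractional coordinates of the centre (or some arbitrary position) of the mth molecule in the p-cell; hence, $\mathbf{s}_{p,m=1}$ denotes the molecular centre position of the reference molecule in the p-cell. If $\mathbf{s}_{p,m}$ is at the intersection of the molecular rotation axes, then it will be the same for all n molecular asymmetric units. Therefore, it follows from (13.4.5.10) that

$$\mathbf{s}_{p,m} = [\mathrm{E}_{m,n}]\mathbf{s}_{p,m} + \mathbf{e}_{m,n} \tag{13.4.5.11a}$$

or

$$\mathbf{e}_{m,n} = \mathbf{s}_{p,m} - [\mathrm{E}_{m,n}]\mathbf{s}_{p,m}. \tag{13.4.5.11b}$$

Equation (13.4.5.11b), substituted into (13.4.5.10), can be used to find all the N noncrystallographic asymmetric units within the crystallographic asymmetric unit of the p-cell. Thus, this is the essential equation for averaging the density in the p-cell and replacing it into the p-cell.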
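The seven-step product described above can be sketched numerically. The example assumes the operator has the form [E] = [T][β_p][ω⁻¹][R][ω][α_p][T⁻¹], with the translation chosen so that the molecular centre s maps onto itself (e = s − [E]s). The orthorhombic p-cell, the choice of rotations and all function names are hypothetical, and [ω⁻¹] is taken as the transpose of [ω], which holds only for a pure rotation in Cartesian coordinates.

```python
def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """Transpose of a 3x3 matrix (the inverse of an orthogonal rotation)."""
    return [[a[j][i] for j in range(3)] for i in range(3)]

def mat_vec(a, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a[i][j] * v[j] for j in range(3)) for i in range(3)]

def ncs_operator(t_m, t_m_inv, beta_p, omega, r_n, alpha_p, s_pm):
    """Build [E] by applying steps (1)-(7) in order, then e = s - [E]s."""
    e_matrix = t_m_inv                      # step (1)
    for step in (alpha_p, omega, r_n, transpose(omega), beta_p, t_m):
        e_matrix = matmul(step, e_matrix)   # steps (2)-(7), left-multiplied
    e_s = mat_vec(e_matrix, s_pm)
    e_vec = [s - es for s, es in zip(s_pm, e_s)]
    return e_matrix, e_vec

# Hypothetical orthorhombic p-cell with edges 50, 60, 70 (length units).
alpha_p = [[50.0, 0, 0], [0, 60.0, 0], [0, 0, 70.0]]
beta_p = [[1 / 50.0, 0, 0], [0, 1 / 60.0, 0], [0, 0, 1 / 70.0]]
omega = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]    # 90 degree rotation about z
r_n = [[1, 0, 0], [0, -1, 0], [0, 0, -1]]     # NCS twofold about h-cell x
t_m = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]     # crystallographic twofold about z
s_pm = [0.1, 0.2, 0.3]                        # molecular centre (fractional)

E, e = ncs_operator(t_m, t_m, beta_p, omega, r_n, alpha_p, s_pm)
centre_image = [a + b for a, b in zip(mat_vec(E, s_pm), e)]
```

In this example the NCS twofold, which lies along x in the h-cell, appears as a twofold along y in the p-cell, and the molecular centre is left unmoved by the combined operation.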
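Consider averaging the density at N noncrystallographically related points in the p-cell and placing that result into the h-cell. From (13.4.5.7), multiplying by $[\omega^{-1}]$,

$$[\omega^{-1}]\mathbf{X}_n = [\omega^{-1}][\mathrm{R}_n]\mathbf{X}_{n=1}.$$

From (13.4.5.1) and (13.4.5.6),

$$[\alpha_p][\mathrm{T}_m^{-1}]\mathbf{y}_{m,n} = [\omega^{-1}][\mathrm{R}_n][\alpha_h]\mathbf{x}_{n=1}. \tag{13.4.5.12}$$

Since it is only necessary to place the reference molecule of the p-cell into the h-cell, it is sufficient to consider the case when $m = 1$, in which case $[\mathrm{T}_{m=1}]$ is the identity matrix [I]. It then follows, by inversion, that

$$\mathbf{y}_{m=1,n} = [\beta_p][\omega^{-1}][\mathrm{R}_n][\alpha_h]\mathbf{x}_{n=1},$$

which corresponds to: (1) `orthogonalizing' the h-cell fractional coordinates with $[\alpha_h]$; (2) rotating into the nth noncrystallographic unit within the molecule using $[\mathrm{R}_n]$; (3) rotating into the p-cell with $[\omega^{-1}]$; and (4) `de-orthogonalizing' into fractional p-cell coordinates with $[\beta_p]$.

Now, if $\mathbf{s}_h$ is the molecular centre in the h-cell [usually $(\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2})$] and $\mathbf{s}_{p,m=1}$ is the centre of the reference molecule in the p-cell, then the translational element that carries $\mathbf{s}_h$ onto $\mathbf{s}_{p,m=1}$ is

$$\mathbf{s}_{p,m=1} - [\beta_p][\omega^{-1}][\mathrm{R}_n][\alpha_h]\mathbf{s}_h$$

and

$$\mathbf{y}_{m=1,n} = [\beta_p][\omega^{-1}][\mathrm{R}_n][\alpha_h](\mathbf{x} - \mathbf{s}_h) + \mathbf{s}_{p,m=1}. \tag{13.4.5.13}$$

Equation (13.4.5.13) determines the position of the N noncrystallographically related points $\mathbf{y}_{m=1,n}$ in the p-cell whose average value is to be placed at x in the h-cell.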
Various techniques are available for determining the molecular envelope within which density can be averaged and outside of which the solvent can be flattened.
Procedures (2) and (3) are advisable when the NCS redundancy is low. Procedure (4) works well when the NCS redundancy is four or higher. The crystallographic asymmetric unit is likely to contain bits and pieces of molecules centred at various positions in the unit cell and neighbouring unit cells. Therefore, it is necessary to associate each grid point within the p-cell crystallographic asymmetric unit to a specific molecular centre or to solvent.
If the molecular-boundary assignments are to be made automatically, then the following procedure can be used. The number, M, of such molecules can be estimated by generating all centres, derived from the given position of the centre for the reference molecule, $\mathbf{s}_{p,m=1}$, and then determining whether a molecule of radius $R_{\rm out}$ would impinge on the crystallographic asymmetric unit within the defined boundaries. Here, $R_{\rm out}$ is a liberal estimate of the molecular radius. The corresponding rotation matrices $[\mathrm{E}_{m,n}]$ and translation vectors $\mathbf{e}_{m,n}$ can then be computed from (13.4.5.9) and (13.4.5.11a).
Any grid point whose distance from all M centres is greater than $R_{\rm out}$, the liberal estimate of the molecular radius, can immediately be designated as being in the solvent region. For other grid points, it is necessary to examine the corresponding h-cell density. From (13.4.5.12), it follows that (setting $[\mathrm{E}'_{m,n}] = [\beta_h][\mathrm{R}_n^{-1}][\omega][\alpha_p][\mathrm{T}_m^{-1}]$)

$$\mathbf{x} = [\mathrm{E}'_{m,n}]\mathbf{y}_{m,n} + \mathbf{e}'_{m,n}, \tag{13.4.6.1}$$

where $\mathbf{e}'_{m,n} = \mathbf{s}_h - [\mathrm{E}'_{m,n}]\mathbf{s}_{p,m}$, with $\mathbf{s}_h$ and $\mathbf{s}_{p,m}$ the molecular centres in the h-cell and p-cell (n can be set to 1, since the h-cell presumably contains an averaged molecular electron density, in which case it does not matter which molecular asymmetric unit is referenced). Thus, (13.4.6.1) can be used to determine the electron density at $\mathbf{y}_{m,n}$ by inspecting the corresponding interpolated density, $\rho(\mathbf{x})$, at x in the h-cell. Transfer of the electron density, $\rho(\mathbf{x})$, from the h-cell to the p-cell using (13.4.6.1) is often useful to obtain an initial structure. However, to determine a suitable mask, it is useful to evaluate a modified electron density, $\langle\rho(\mathbf{x})\rangle$ (see below), for the grid points immediately around x in the h-cell.

A variable parameter `CRIT' can be specified to establish the distribution of grid points that are within the molecular envelope. When the modified electron density, $\langle\rho(\mathbf{x})\rangle$, is less than CRIT, the corresponding grid point at y is assumed to be in solvent. Otherwise, when $\langle\rho(\mathbf{x})\rangle$ exceeds CRIT, the grid point at y is assigned to that molecule which has the largest $\langle\rho(\mathbf{x})\rangle$. If the percentage of grid points which might be assigned to more than one molecule is large (say, greater than 1% of the total number of grid points), it probably signifies that the value of CRIT is too low, that the molecular boundary is far from clear, or that the function used to define $\langle\rho(\mathbf{x})\rangle$ was badly chosen (Fig. 13.4.6.1). Grid points outside the molecular envelope can be set to the average solvent density.
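The CRIT test lends itself to a compact sketch. The code below assumes that, for each p-cell grid point, the modified density has already been looked up in the h-cell for every candidate molecule m. The data layout, the label 0 for solvent, the treatment of a value exactly equal to CRIT as inside the envelope, and the function names are hypothetical choices made only for illustration.

```python
SOLVENT = 0  # hypothetical label reserved for solvent grid points

def assign_grid_point(modified_densities, crit):
    """Assign one grid point.

    modified_densities maps molecule index m (1-based) to the modified
    density seen from that molecule. Returns (label, ambiguous), where
    ambiguous is True if more than one molecule reaches CRIT.
    """
    above = {m: rho for m, rho in modified_densities.items() if rho >= crit}
    if not above:
        return SOLVENT, False
    best = max(above, key=above.get)  # molecule with the largest density
    return best, len(above) > 1

def build_mask(grid, crit):
    """Label every grid point and report the percentage of ambiguous points."""
    labels, n_ambiguous = [], 0
    for densities in grid:
        label, ambiguous = assign_grid_point(densities, crit)
        labels.append(label)
        n_ambiguous += ambiguous
    return labels, 100.0 * n_ambiguous / len(grid)

# Four hypothetical grid points, each seen from up to two molecules.
grid = [
    {1: 0.10, 2: 0.05},   # below CRIT for both: solvent
    {1: 0.80, 2: 0.20},   # clearly molecule 1
    {1: 0.60, 2: 0.90},   # both reach CRIT: goes to molecule 2, ambiguous
    {1: 0.00},            # solvent
]
labels, percent_ambiguous = build_mask(grid, crit=0.5)
```

A percentage of ambiguous points well above the 1% guideline in the text would suggest raising CRIT or revisiting the envelope before averaging.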
An essential criterion for the molecular envelope is that it obeys the noncrystallographic point-group symmetry. If the original h-cell electron density already possesses the molecular symmetry (e.g. icosahedral 532, 222 etc.), then the p-cell mask should also have that symmetry. However, if the mask boundaries were chosen manually, masks from different molecular centres might be in conflict and have local errors in the correct molecular symmetry. Such errors can be corrected by reimposing the noncrystallographic point-group symmetry on the p-cell mask. This can be conveniently achieved by setting the density at each grid point that was considered within the molecular envelope to a value of 100, and all other grid points to a density of zero. If the resultant density is averaged using the same routine as is used for averaging the actual electron density of the molecule, then the average density will remain 100 if the interpolated density is 100 at all noncrystallographically related points. However, if the original grid point is near the edge of the mask, finding the density at symmetry-related points may involve interpolation between density at level 100 and at level 0, giving an averaged density of less than 100. Hence, any grid point whose averaged density is below some criterion should be attributed to solvent.
Other improvements to mask generation were discussed by Rossmann et al. (1992). In any event, the molecular-envelope definition should be periodically re-examined after a suitable number of electron-density-averaging cycles.
Electron density can be averaged (1) among the N NCS-related molecules in the p-cell (the real crystal unit cell), thus creating a new and improved map of the p-cell; (2) among the N NCS-related molecules in the p-cell and placing the results into a standard orientation in the h-cell; or (3) among the N NCS-related molecules in different unit cells and placing the results back into the original different unit cells or into a standard h-cell. Before averaging commences, the matrices $[\mathrm{E}_{m,n}]$ and translation vectors $\mathbf{e}_{m,n}$ ($n = 1, \ldots, N$; $m = 1, \ldots, M$) must be evaluated [see (13.4.5.9) and (13.4.5.11a)]. Here, N is the noncrystallographic redundancy and M is the number of molecules that impinge on the crystallographic asymmetric unit of the p-cell. Associated with each grid point in the p-cell asymmetric unit will be (1) the value of m designating which molecular centre is to be associated with that grid point (a special value of m is for solvent) and (2) the p-cell electron density at that point.
The grid points within the asymmetric unit are then examined one at a time. If the grid point is within the mask, it is averaged among the N noncrystallographically related equivalent positions belonging to molecule m. If the grid point is solvent, the density can be set to the average solvent density.
The N noncrystallographically equivalent non-integral grid points can be computed from (13.4.5.11a). Some of these will lie outside the crystallographic asymmetric unit. These will, therefore, have to be operated on by unit-cell translations and crystallographic symmetry operations to bring them back into the asymmetric unit before the corresponding interpolated density can be calculated.
Averaging into the h-cell can be done by a procedure similar to averaging in the p-cell, except that the rotation and translation matrices are given by (13.4.5.13). Furthermore, no mask is required as all the averaging into the h-cell (from p-cell electron density) can be done with respect to the reference molecule centred at $\mathbf{s}_{p,m=1}$ in the p-cell. Each grid point is taken in turn in the h-cell. The electron density at any grid point that is further away from the molecular centre, $\mathbf{s}_h$, than the liberal radius estimate, $R_{\rm out}$, is set to zero. Other grid-point positions are expanded into the N equivalent positions in the p-cell surrounding $\mathbf{s}_{p,m=1}$. The interpolated density is then found, averaged over the N equivalent positions, and stored at the original h-cell grid point in successive sections, in the same way as in the p-cell averaging. As in averaging within the p-cell, a record is kept of
as a function of
(Table 13.4.7.1). In general, the local NCS is valid only within the molecule. Hence, the h-cell density will show the molecular envelope and can be used to recompute an improved p-cell density mask. The rate of build up of signal within the molecule should be roughly proportional to N, while the rate outside the molecule should be proportional to about $\sqrt{N}$.
Some thought must go into defining the size of the grid interval. Shannon's sampling theorem shows that the grid interval must never be greater than half the limiting resolution of the data. Thus, for instance, if the limiting resolution is 3 Å, the grid intervals must be smaller than 1.5 Å. Clearly, the finer the grid interval, the more accurate the interpolated density, but the computing time will increase with the inverse cube of the size of the grid step. Conversely, if the grid interval is fine, less care and fewer points can be used for interpolation, which offsets the cost of the finer grid in terms of computing time. In practice, it has been found that an eight-point interpolation (as described below) can be used, provided the grid interval is less than 1/2.5 of the resolution (Rossmann et al., 1992). Other interpolation schemes have also been used (e.g. Bricogne, 1976; Nordman, 1980; Hogle et al., 1985; Bolin et al., 1993).
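The grid-interval rules above can be turned into a small calculation. The sketch below assumes a single cell edge and picks the smallest number of grid divisions whose spacing is strictly finer than the resolution divided by 2.5; it ignores any constraints that a particular Fourier-transform program may place on the number of divisions. The function names are hypothetical.

```python
import math

def grid_points_along_edge(cell_edge, resolution, divisor=2.5):
    """Smallest number of divisions with step strictly below resolution/divisor.

    divisor=2.5 follows the eight-point interpolation guideline in the text;
    divisor=2.0 would give the Shannon limit.
    Returns (number of divisions, resulting grid step).
    """
    max_step = resolution / divisor
    n = math.floor(cell_edge / max_step) + 1
    return n, cell_edge / n

def relative_cost(coarse_step, fine_step):
    """Factor by which computing time grows when the step is refined,
    assuming time scales with the inverse cube of the grid step."""
    return (coarse_step / fine_step) ** 3

# Hypothetical 100 A cell edge and 3 A data.
n_points, step = grid_points_along_edge(100.0, 3.0)
shannon_n, shannon_step = grid_points_along_edge(100.0, 3.0, divisor=2.0)
cost_factor = relative_cost(1.5, 0.75)
```

Halving the grid step multiplies the number of grid points, and hence the averaging time, by about eight, which is why the coarsest grid that interpolates adequately is preferred.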
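A straightforward `linear' interpolation can be discussed with reference to Fig. 13.4.8.1 (in mathematical literature, this is called a trilinear approximation or a tensor product of three one-dimensional linear interpolants). Let G be the position at which the density is to be interpolated, and let this point have the fractional grid coordinates Δx, Δy, Δz within the box of surrounding grid points. Let 000 be the point at $\Delta x = 0$, $\Delta y = 0$, $\Delta z = 0$. Other grid points will then be at 100, 010, 001 etc., with the point diagonally opposite the origin at 111.

The density at A (between 000 and 100) can then be approximated as the value of the linear interpolant of $\rho_{000}$ and $\rho_{100}$:

$$\rho_A \simeq \rho_{000} + (\rho_{100} - \rho_{000})\Delta x.$$

Similar expressions for $\rho_B$ (between 010 and 110), $\rho_C$ (between 001 and 101) and $\rho_D$ (between 011 and 111) can also be written. Then, it is possible to calculate an approximate density at E from

$$\rho_E \simeq \rho_A + (\rho_B - \rho_A)\Delta y,$$

with a similar expression for $\rho_F$ in terms of $\rho_C$ and $\rho_D$. Finally, the interpolated density at G between E and F is given by

$$\rho_G \simeq \rho_E + (\rho_F - \rho_E)\Delta z.$$

Putting all these together, it is easy to show that

$$\begin{aligned}\rho_G \simeq{} & \rho_{000}(1-\Delta x)(1-\Delta y)(1-\Delta z) + \rho_{100}\,\Delta x(1-\Delta y)(1-\Delta z) \\ & + \rho_{010}(1-\Delta x)\,\Delta y\,(1-\Delta z) + \rho_{001}(1-\Delta x)(1-\Delta y)\,\Delta z \\ & + \rho_{110}\,\Delta x\,\Delta y\,(1-\Delta z) + \rho_{101}\,\Delta x(1-\Delta y)\,\Delta z \\ & + \rho_{011}(1-\Delta x)\,\Delta y\,\Delta z + \rho_{111}\,\Delta x\,\Delta y\,\Delta z.\end{aligned}$$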
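The eight-point scheme can be written out directly. The sketch below follows the stepwise construction (along x, then y, then z) and also evaluates the equivalent weighted sum over the eight corners. The indexing convention rho[i][j][k] for the corner ijk and the function names are assumptions of this example.

```python
def trilinear(rho, dx, dy, dz):
    """Interpolate stepwise: along x (A, B, C, D), then y (E, F), then z (G).

    rho[i][j][k] is the density at corner ijk (each index 0 or 1);
    dx, dy, dz are fractional grid coordinates between 0 and 1.
    """
    rho_a = rho[0][0][0] + (rho[1][0][0] - rho[0][0][0]) * dx
    rho_b = rho[0][1][0] + (rho[1][1][0] - rho[0][1][0]) * dx
    rho_c = rho[0][0][1] + (rho[1][0][1] - rho[0][0][1]) * dx
    rho_d = rho[0][1][1] + (rho[1][1][1] - rho[0][1][1]) * dx
    rho_e = rho_a + (rho_b - rho_a) * dy
    rho_f = rho_c + (rho_d - rho_c) * dy
    return rho_e + (rho_f - rho_e) * dz

def trilinear_expanded(rho, dx, dy, dz):
    """Same interpolant written as a weighted sum over the eight corners."""
    total = 0.0
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                wx = dx if i else 1.0 - dx
                wy = dy if j else 1.0 - dy
                wz = dz if k else 1.0 - dz
                total += rho[i][j][k] * wx * wy * wz
    return total

# Corner values sampled from the linear function 1 + 2x + 3y + 4z,
# which trilinear interpolation reproduces exactly.
rho = [[[1 + 2 * i + 3 * j + 4 * k for k in (0, 1)] for j in (0, 1)]
       for i in (0, 1)]
stepwise = trilinear(rho, 0.25, 0.5, 0.75)
expanded = trilinear_expanded(rho, 0.25, 0.5, 0.75)
```

The two forms give the same value; the stepwise form needs fewer multiplications, while the expanded form makes the weight of each corner explicit.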
Frequently, a molecule crystallizes in a variety of different crystal forms [e.g. hexokinase (Fletterick & Steitz, 1976), the influenza virus neuraminidase spike (Varghese et al., 1983), the histocompatibility antigen HLA (Bjorkman et al., 1987) and the CD4 receptor (Wang et al., 1990)]. It is then advantageous to average between the different crystal forms. This can be achieved by averaging each crystal form independently into a standard orientation in the h-cell (if the redundancy is $N = 1$ for a given crystal form, then this simply amounts to producing a skewed representation of the p-cell in the h-cell environment). The different results, now all in the same h-cell orientation, can be averaged. However, care must be taken to put equal weight on each molecular copy. If the ith cell contains $N_i$ noncrystallographic copies, then the average of the densities, $\rho_i(\mathbf{x})$, is

$$\bar{\rho}(\mathbf{x}) = \frac{\sum_i N_i\,\rho_i(\mathbf{x})}{\sum_i N_i}$$

at each grid point, x, in the h-cell. Additional weights can be added to account for the subjective assessment of the quality of the electron densities in the different crystal cells.
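Weighting each molecular copy equally amounts to weighting each crystal form by its number of copies. The sketch below assumes that every crystal form has already been averaged onto the same h-cell grid, stored here as a flat list of densities. The optional quality factors stand in for the subjective weights mentioned in the text, and all names are hypothetical.

```python
def average_crystal_forms(maps, copies, quality=None):
    """Combine h-cell maps from several crystal forms.

    maps    : list of density lists, one per crystal form, on the same grid
    copies  : number of noncrystallographic copies N_i in each form
    quality : optional extra weight per form (defaults to 1 for all)
    """
    if quality is None:
        quality = [1.0] * len(maps)
    weights = [n * q for n, q in zip(copies, quality)]
    total = sum(weights)
    n_points = len(maps[0])
    return [sum(w * m[g] for w, m in zip(weights, maps)) / total
            for g in range(n_points)]

# Two hypothetical forms: one copy in the first, two copies in the second.
form_a = [1.0, 2.0]
form_b = [4.0, 8.0]
combined = average_crystal_forms([form_a, form_b], copies=[1, 2])
```

With one copy in the first form and two in the second, the second map counts twice as much, so each of the three molecular copies contributes equally to the result.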
With the h-cell density improved by averaging among different crystal forms, it can now be replaced into the different p-cells. These p-cells can then be back-transformed in the usual manner to obtain a better set of phases. These, in turn, can be associated with the observed structure amplitudes for each p-cell structure, and the cycle can be repeated.
Fourier back-transformation of the modified (averaged and solvent-flattened) map leads to poor phase information immediately outside the previously used resolution limit. If no density modification had been made, the Fourier transform would have yielded exactly the same structure factors as had been used for the original map. However, the modifications result in small structure amplitudes just beyond the previous resolution limit. The resultant phases can then be used in combination with the observed amplitudes in the next map calculation, thus extending the limit of resolution.
If the cell edge of an approximately cubic unit cell is a, and the approximate radius of the molecule is R (therefore, $R \simeq a/2$), then the first node of a spherical diffraction function will occur when $HR \simeq 0.7$, where H is the length of the reciprocal-lattice vector between the closest previously known structure factor and the structure factor just outside the resolution limit. Let $H = n(1/a)$, and let it be assumed that the diffraction-function amplitude is negligible when $HR > 1$. Thus, for successful extension, $n(1/a)(a/2) < 1$, i.e. $n < 2$. In general, that means that phase extension should be less than two reciprocal-lattice units in one step.
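The size of a permissible extension step can be explored numerically. The sketch below assumes the standard normalized transform of a uniform sphere, $G = 3(\sin u - u\cos u)/u^3$ with $u = 2\pi HR$, and locates its first node by bisection. The function names and the cutoff value of HR beyond which the function is treated as negligible (passed as `hr_cutoff`) are assumptions for illustration.

```python
import math


def sphere_g(hr):
    """Normalized transform of a uniform sphere, with G(0) = 1."""
    u = 2.0 * math.pi * hr
    if u < 1e-6:
        return 1.0
    return 3.0 * (math.sin(u) - u * math.cos(u)) / u ** 3


def first_node(lo=0.5, hi=1.0, tol=1e-10):
    """Bisection for the zero of sphere_g bracketed by lo and hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sphere_g(lo) * sphere_g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def max_step(a, radius, hr_cutoff=1.0):
    """Extension, in units of 1/a, at which H * radius reaches hr_cutoff."""
    return hr_cutoff * a / radius


node = first_node()                 # value of H*R at the first node
limit = max_step(300.0, 150.0)      # cubic cell of 300 Å, radius 150 Å
```

For a particle whose radius is half the cell edge, a cutoff of HR = 1 corresponds to an extension of two reciprocal-lattice units, so each step should stay below that value.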
As phase extension proceeds, the accuracy of the NCS elements and the boundaries of the envelope must be constantly improved and updated to match the improved resolution. Arnold & Rossmann (1986, 1988) discussed phase error as a function of error in the NCS definition and applied rigid-body least-squares refinement for refining particle position and orientation of human rhinovirus 14. The `climb' procedure has been found especially useful (Muckelbauer et al., 1995). This depends upon searching one at a time for the parameters (rotational and translational) that minimize the near r.m.s. deviation of the individual densities to the resultant averaged densities.
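A one-parameter-at-a-time search of this kind can be sketched generically. The code below is not the published `climb' implementation; the function name, the fixed step sizes and the quadratic objective (standing in for the r.m.s. deviation between individual and averaged densities) are all hypothetical.

```python
def climb(objective, params, steps, max_passes=50):
    """Vary one parameter at a time, keeping any change that lowers
    the objective; stop when a full pass gives no improvement."""
    params = list(params)
    best = objective(params)
    for _ in range(max_passes):
        improved = False
        for i, step in enumerate(steps):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                value = objective(trial)
                if value < best:
                    params, best, improved = trial, value, True
                    break
        if not improved:
            break
    return params, best


def toy_deviation(p):
    """Hypothetical stand-in for the density r.m.s. deviation."""
    return (p[0] - 1.0) ** 2 + (p[1] + 0.5) ** 2


refined, residual = climb(toy_deviation, [0.0, 0.0], [0.25, 0.25])
```

In a real application the parameters would be the rotational and translational NCS parameters, and each evaluation of the objective would require re-averaging the density.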
Improvement of the NCS parameters is dependent upon an accurate knowledge of the cell dimensions. In the absence of such knowledge, the rotational NCS relationship cannot be accurate, since elastic distortion will result, leading to very poor averaged density. This was the case in the early determination of southern bean mosaic virus (Abad-Zapatero et al., 1980), where the structure solution was probably delayed at least one year due to a lack of accurate cell dimensions.
Another aspect to phase extension is the progressive decrease in the number or quality of observed structure amplitudes. The observed amplitudes can be augmented with the calculated values obtained by Fourier back-transformation of the averaged map. However, clearly, as the number of calculated values increases in proportion to the number of observed values, the rate of convergence decreases. In the limit, when there are no available $F_{\rm obs}$ values, averaging a map based on $F_{\rm calc}$ values will not alter it, and, thus, convergence stops entirely.
Iterations consist of averaging, Fourier inversion of the average map, recombination of observed structure-factor amplitudes with calculated phases, and recalculation of a new electron-density map. Presumably, each new map is an improvement of the previous map as a consequence of using the improved phases resulting from the map-averaging procedure. However, after five or ten cycles, the procedure has usually converged so that each new map is essentially the same as the previous map. Convergence can be usefully measured by computing the correlation coefficient (CC) and R factor (R) between calculated ($F_{\rm calc}$) and observed ($F_{\rm obs}$) structure-factor amplitudes as a function of resolution (Fig. 13.4.11.1). These factors are defined as
$$CC = \frac{\sum (\langle F_{\rm obs}\rangle - F_{\rm obs})(\langle F_{\rm calc}\rangle - F_{\rm calc})}{\left[\sum (\langle F_{\rm obs}\rangle - F_{\rm obs})^2 \sum (\langle F_{\rm calc}\rangle - F_{\rm calc})^2\right]^{1/2}}$$
and
$$R = \frac{\sum |F_{\rm obs} - F_{\rm calc}|}{\sum F_{\rm obs}}.$$
Because of the lack of information immediately outside the resolution limit, these factors must necessarily be poor in the outermost resolution shell. Nevertheless, the outermost resolution shell will be the most sensitive to phase improvement as these structure factors will be the furthest from their correct values at the start of a set of iterations after a resolution extension.
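Monitoring these statistics in resolution shells can be sketched in a few lines. The example assumes the usual linear correlation coefficient and an R factor expressed as a fraction (some programs report it as a percentage); the function names, the tuple layout of the reflection list and the toy amplitudes are assumptions.

```python
import math


def correlation_coefficient(f_obs, f_calc):
    """Linear correlation coefficient between two amplitude lists."""
    mean_o = sum(f_obs) / len(f_obs)
    mean_c = sum(f_calc) / len(f_calc)
    num = sum((mean_o - o) * (mean_c - c) for o, c in zip(f_obs, f_calc))
    den = math.sqrt(sum((mean_o - o) ** 2 for o in f_obs)
                    * sum((mean_c - c) ** 2 for c in f_calc))
    return num / den


def r_factor(f_obs, f_calc):
    """Fractional R factor between observed and calculated amplitudes."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def shell_statistics(reflections, d_edges):
    """reflections: (d_spacing, f_obs, f_calc) tuples; d_edges: descending
    shell limits in Å, e.g. [8.0, 5.0, 3.0] gives shells 8-5 and 5-3 Å.
    Each shell needs at least two reflections with differing amplitudes."""
    stats = []
    for d_max, d_min in zip(d_edges[:-1], d_edges[1:]):
        shell = [(o, c) for d, o, c in reflections if d_min <= d < d_max]
        obs = [o for o, _ in shell]
        calc = [c for _, c in shell]
        stats.append((d_max, d_min,
                      correlation_coefficient(obs, calc),
                      r_factor(obs, calc)))
    return stats


reflections = [(7.0, 10.0, 12.0), (6.0, 20.0, 18.0), (5.5, 30.0, 33.0),
               (4.5, 10.0, 20.0), (4.0, 20.0, 40.0), (3.5, 30.0, 60.0)]
shells = shell_statistics(reflections, [8.0, 5.0, 3.0])
```

In the second toy shell the calculated amplitudes are exactly twice the observed ones, so CC is 1 while R is large. The two statistics respond differently to scale errors, which is one reason to monitor both.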
Figure 13.4.11.1. Plot of a correlation coefficient as the phases were extended from 8 to 3 Å resolution in the structure determination of Mengo virus. [Reproduced with permission from Luo et al. (1989).]
Convergence of CC and R does not, however, necessarily mean that phases are no longer changing from cycle to cycle. Usually, the small-amplitude structure factors keep changing long after convergence appears to have been reached (unpublished results). However, the small-amplitude structure factors make very little difference to the electron-density maps.
The rate of convergence can be improved by suitably weighting coefficients in the computation of the next electron-density map. It can be useful to reduce the weight of those structure factors where the difference between observed and calculated amplitudes is larger than the average difference, as, presumably, error in amplitude can also imply error in phase. Various weighting schemes are generally used (Sim, 1959; Rayment, 1983; Arnold et al., 1987; Arnold & Rossmann, 1988).
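The general idea of down-weighting poorly agreeing reflections can be illustrated with a deliberately simple scheme. It is a hypothetical example and not one of the published schemes cited above: reflections whose amplitude difference is at most the average keep unit weight, and the others are scaled down in inverse proportion to their difference.

```python
def downweight(f_obs, f_calc):
    """Hypothetical weights: 1 where |Fobs - Fcalc| does not exceed the
    mean difference, otherwise mean difference / difference."""
    diffs = [abs(o - c) for o, c in zip(f_obs, f_calc)]
    mean_diff = sum(diffs) / len(diffs)
    return [1.0 if d <= mean_diff else mean_diff / d for d in diffs]


weights = downweight([10.0, 20.0, 30.0], [10.0, 26.0, 27.0])
# Map coefficients for the next cycle would then be weight * Fobs,
# combined with the calculated phases.
```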
As mentioned above, the rate of convergence can also be improved by inclusion of $F_{\rm calc}$ values when no $F_{\rm obs}$ values have been measured. However, care must be taken to use suitable weights to ensure that the $F_{\rm calc}$'s are not systematically larger or smaller than the $F_{\rm obs}$ values in the same resolution range.
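One simple way to avoid a systematic mismatch is to scale the calculated amplitudes within each resolution shell before using them for unmeasured reflections. The sketch below assumes a single linear scale factor per shell, chosen so that the summed calculated and observed amplitudes agree for the measured reflections; the function name and the use of `None` to mark unmeasured reflections are assumptions.

```python
def fill_unmeasured(f_obs, f_calc):
    """Return amplitudes for one resolution shell, substituting scaled
    calculated values where no observation exists (f_obs entry is None)."""
    measured = [(o, c) for o, c in zip(f_obs, f_calc) if o is not None]
    # Shell scale factor putting Fcalc on the scale of Fobs.
    scale = sum(o for o, _ in measured) / sum(c for _, c in measured)
    return [o if o is not None else scale * c
            for o, c in zip(f_obs, f_calc)]


filled = fill_unmeasured([10.0, None, 30.0], [5.0, 8.0, 15.0])
```

Observed values are never altered; only the gaps are filled, and the filled values may additionally be down-weighted.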
Monitoring the CC or R factor for different classes of reflections (e.g. $h + k + l = 2n$ and $h + k + l = 2n + 1$) can be a good indicator of problems (Muckelbauer et al., 1995), particularly in the presence of pseudo-symmetries. All classes of reflections should behave similarly.
The power (P) of the phase determination and, hence, the rate of convergence and error in the final phasing has been shown to be (Arnold & Rossmann, 1986) proportional to
$$\frac{(Nf)^{1/2}}{R\,(U/V)},$$
where N is the NCS redundancy, f is the fraction of observed reflections to those theoretically possible, R is a measure of error on the measured amplitudes (e.g. $R_{\rm merge}$) and $U/V$ is the ratio of the volume of the density being averaged to the volume of the unit cell. Important implications of this relationship include that the phasing power is proportional to the square root of the NCS redundancy and that it is also dependent upon solvent content and diffraction-data quality and completeness.
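The consequences of this relationship can be made concrete with a short calculation. The sketch assumes the proportionality $P \propto (Nf)^{1/2}/[R(U/V)]$; since only a proportionality is involved, only ratios of the returned values are meaningful. The function name and the numerical inputs are illustrative.

```python
import math


def phasing_power(n_ncs, completeness, r_error, volume_ratio):
    """Relative phasing power, up to a constant of proportionality.

    n_ncs        : NCS redundancy N
    completeness : fraction f of possible reflections observed
    r_error      : error measure R on the amplitudes
    volume_ratio : U/V, averaged volume over unit-cell volume
    """
    return math.sqrt(n_ncs * completeness) / (r_error * volume_ratio)


# Fourfold increase in redundancy, all else equal: power doubles.
ratio_ncs = phasing_power(60, 0.9, 0.1, 0.5) / phasing_power(15, 0.9, 0.1, 0.5)
# Higher solvent content (smaller U/V) also increases the power.
ratio_solvent = phasing_power(5, 0.9, 0.1, 0.4) / phasing_power(5, 0.9, 0.1, 0.5)
```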
Some initial low-resolution model is required to initiate phasing at very low resolution. The use of cryo-EM reconstructions or available homologous structures is now quite usual. However, a phase determination using a sphere or hollow shell is also possible. In the case of a spherical virus, such an approximation is often very reasonable, as is evident when plotting the mean intensities at low resolution. These often show the anticipated distribution of a Fourier transform of a uniform sphere (Fig. 13.4.12.1). Thus, initiating phasing using a spherical model does require the prior determination of the average radius of the spherical virus. This can be done either by using an R-factor search (Tsao, Chapman & Rossmann, 1992) or by using low-angle X-ray scattering data (Chapman et al., 1992). A minimal model would be to estimate the value of F(000) on the same relative scale as the observed amplitudes. This structure factor must always have a positive value. Such a limited initial start was first explored by Rossmann & Blow (1963).
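An R-factor search for the radius of a uniform-sphere model can be sketched as follows. This is a simplified illustration rather than the published procedure: it compares amplitudes as a function of $s = 1/d$ only, uses a single overall scale factor, and tests the search on synthetic amplitudes generated from a sphere of known radius. All names and numerical values are assumptions.

```python
import math


def sphere_amplitude(s, radius):
    """|G| for a uniform sphere at reciprocal distance s = 1/d."""
    u = 2.0 * math.pi * s * radius
    if u < 1e-6:
        return 1.0
    return abs(3.0 * (math.sin(u) - u * math.cos(u)) / u ** 3)


def radius_search(s_values, f_obs, trial_radii):
    """Return (radius, R) for the trial radius giving the lowest R factor."""
    best = None
    for radius in trial_radii:
        calc = [sphere_amplitude(s, radius) for s in s_values]
        scale = sum(f_obs) / sum(calc)
        r = (sum(abs(o - scale * c) for o, c in zip(f_obs, calc))
             / sum(f_obs))
        if best is None or r < best[1]:
            best = (radius, r)
    return best


# Synthetic low-resolution data from a sphere of radius 150 Å.
s_values = [0.0005 * i for i in range(1, 40)]
f_obs = [1000.0 * sphere_amplitude(s, 150.0) for s in s_values]
trial_radii = [100.0 + 5.0 * i for i in range(21)]
best_radius, best_r = radius_search(s_values, f_obs, trial_radii)
```

With real data the minimum would be shallower, and a hollow-shell model would need an inner radius as a second search parameter.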
In surprisingly many cases (Valegård et al., 1990; Chapman et al., 1992; McKenna, Xia, Willingmann, Ilag, Krishnaswamy et al., 1992; McKenna, Xia, Willingmann, Ilag & Rossmann, 1992; Tsao, Chapman & Rossmann, 1992; Tsao, Chapman, Wu et al., 1992), it has been found that initiating phasing by using a very low resolution model results in a phase solution of the Babinet inverted structure ($\alpha \rightarrow \alpha + \pi$), where the desired density is negative instead of positive. Presumably, this is the result of phase convergence in a region where the assumed spherical transform is π out of step with reality. As long as this possibility is kept in mind with a watchful eye, such an inversion does not hamper good phase determination. In the case of phase extension, stepping too far in resolution can also lead to analogous problems (Arnold et al., 1987).
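The effect of a Babinet inversion on the map can be demonstrated with a toy one-dimensional density. The sketch keeps every amplitude and F(000) unchanged, adds π to all other phases, and transforms back. The density values and function name are hypothetical, and a plain discrete Fourier sum is used so that the example needs only the standard library.

```python
import cmath
import math


def transform(values, sign):
    """Discrete Fourier sum with kernel exp(sign * 2*pi*i*h*x/n)."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * math.pi * h * x / n)
                for x, v in enumerate(values))
            for h in range(n)]


rho = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]      # toy one-dimensional density
f = transform(rho, +1)                    # structure factors F(h)
# Same amplitudes, F(000) unchanged, pi added to every other phase.
f_inverted = [f[0]] + [abs(v) * cmath.exp(1j * (cmath.phase(v) + math.pi))
                       for v in f[1:]]
rho_inverted = [v.real / len(rho) for v in transform(f_inverted, -1)]
mean = sum(rho) / len(rho)
```

The resulting map equals twice the mean density minus the original density at every point, so peaks become troughs while the amplitudes, and therefore the agreement with the observed data, are unchanged.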
Similar errors can occur due to lack of information on the correct enantiomorph in the initial phasing model. In some cases, where spherical envelopes are used and the distribution of NCS elements is also centric, there will be no decision on hand, and the phases will remain centric (Johnson et al., 1975). However, in general, the enantiomorphic ambiguity (hand assignment) can be resolved by providing a model that has some asymmetry or by arbitrarily selecting the phase of a large-amplitude structure factor away from its centric value.
The progress of phase refinement away from false solutions has been the subject of `post mortem' examinations (Valegård et al., 1990; Chapman et al., 1992; McKenna, Xia, Willingmann, Ilag, Krishnaswamy et al., 1992; McKenna, Xia, Willingmann, Ilag & Rossmann, 1992; Tsao, Chapman & Rossmann, 1992; Tsao, Chapman, Wu et al., 1992; Dokland et al., 1998). The main lesson learned from these observations is that phase determination using NCS is amazingly powerful. Most initial errors in phasing gradually work themselves out with subsequent iterations and phase extension.
Perhaps the power of NCS phase determination should not be overly surprising. When phases are determined by multiple isomorphous replacement, the amount of data collected for the given molecular weight is $N + 1$, where N is the number of derivatives and is usually 3 or 4. Similarly, for multiwavelength anomalous-dispersion data collection, there might be measurements at four different wavelengths, essentially giving at least four data points for each reflection. However, icosahedral virus determination frequently provides a much larger number of data points, with an NCS redundancy of up to 60, for the equivalent resolution.
13.4.13. Recent salient examples in low-symmetry cases: multidomain averaging and systematic applications of multiple-crystal-form averaging
When averaging molecules that have segmental flexibility, it is essential to be able to define the extents of and noncrystallographic relationships among multiple segments which can flexibly reorient. No general protocol has been described for determining the minimum size or optimal number of segments to use in such cases. If the number of segments used for averaging is too small, then the NCS parameters cannot accurately superpose the entirety of the related segments. If too many segments are used for averaging, the segments may become too small for accurate determination of the NCS parameters. The use of too many segments may also become awkward and somewhat inefficient, since in some program systems the total number of maps that must be stored in a given cycle of averaging is proportional to the number of segments used for averaging. Comparison of atomic models for related segments that have been built or refined independently may provide convenient definitions of envelopes for averaging. In practice, a radius of 2 Å or more (depending upon the stage of structure solution and completeness and expected reliability of the model) may be added around the atoms used to define a molecular mask or envelope used in averaging. As with other averaging procedures, multidomain and multiple-crystal-form averaging approaches generally benefit from updating the molecular masks as structure determination progresses.
Often, a macromolecule can be crystallized in multiple crystal forms. Advances in crystallization technology leading to the frequent occurrence of multiple crystal forms, coupled with the availability of convenient programs, have led to increasing frequency of application of multiple-crystal-form averaging for structure solution.
Proteins, especially those containing more than one folded domain, often contain flexible hinges. As long as the boundaries of and noncrystallographic relationships among the related domains in multiple copies can be determined, then density averaging can be used to improve phasing. Programs such as O can be conveniently used to obtain the initial transformations necessary for correct superposition of related segments. NCS parameters can be refined using routines that either minimize the density differences among related copies or that perform rigid-body refinements of atomic models.
A number of experimental techniques have been described that may permit more widespread application of multiple-domain and multiple-crystal-form averaging. Freezing of macromolecular crystals to liquid-nitrogen temperatures has become a routine approach for enhancing the resolution and quality of macromolecular X-ray diffraction data. With most macromolecular crystals, there is a shrinkage of the `frozen' unit cell relative to the lattice of the `unfrozen' crystals. In many cases, significantly different cell dimensions can also be obtained by using different cryo-protective buffer and salt conditions. These variations can be exploited in a systematic fashion for phasing by electron-density averaging, so long as (1) the shrinkage relationships among the different crystals are not merely isotropic and (2) the boundaries and NCS parameters among related segments can be determined. Perutz (Perutz, 1946; Bragg & Perutz, 1952) recognized the potential utility of such shrinkage stages for crystallographic phasing in studies of haemoglobin crystals with varying degrees of hydration.
Recent examples of structure solutions involving multidomain and multiple-crystal-form averaging include studies of HIV reverse transcriptase (RT) (Ren et al., 1995; Ding et al., 1995). Studies of HIV RT by Stuart and coworkers involved multidomain and multiple-crystal-form averaging using different soaking solutions (Esnouf et al., 1995; Ren et al., 1995), in some cases with dramatically improved diffraction resolution. Arnold and coworkers have applied multidomain and multiple-crystal-form averaging to studies of HIV RT, including a systematic application of averaging electron density between `frozen' and `unfrozen' crystal forms (Ding et al., 1995; Das et al., 1996). Tong et al. (1997) recently described electron-density averaging among multiple closely related crystal forms of the human cytomegalovirus protease that were obtained by treatment of the crystals with different soaking buffers containing differing levels of precipitants, such as salt and polyethylene glycol.
This review hopefully covers most aspects encountered when employing electron-density averaging, yet the authors have drawn liberally from their own experience. There are now a large number of averaging programs and procedures available, some more suitable for structure determinations of proteins with low NCS redundancy and improper relationships (Jones, 1992) and others particularly suitable for high NCS redundancy, such as is encountered in the study of icosahedral viruses. For large structures, phase determination can be a very time-consuming computer operation. Therefore, attempts have been made to parallelize some programs (Cornea-Hasegan et al., 1995), although this may lead to difficulties in exporting the programs to new and different computers.
Recently described program packages for symmetry averaging have been successfully applied to a number of cases. General program systems for averaging that are well suited to cases with high NCS include ENVelope (Rossmann et al., 1992) and GAP (Jonathan Grimes and David Stuart, unpublished results); these same packages have also been used for multiple-crystal-form averaging and problems with low symmetry. A number of the program packages have been conveniently integrated with interactive computer-graphics programs such as O (Jones et al., 1991) and most permit molecular-envelope definition by a number of possible approaches. RAVE and MAVE (Kleywegt & Jones, 1994), programs for graphics-assisted averaging within and between crystal forms, also come with an array of tools for flexible map handling and envelope definition (Kleywegt & Jones, 1996). The program systems DMMULTI (Cowtan & Main, 1993) and MAGICSQUASH (Schuller, 1996), which both derive from the program SQUASH (Zhang, 1993), can simultaneously apply real-space (symmetry averaging and solvent levelling with or without histogram matching) and reciprocal-space (phase refinement by the Sayre equation) constraints for phase improvement and extension. The advantage of adding phasing by the Sayre equation is greater at higher resolution, but appears to be significant in some cases, even at relatively low resolution (Cowtan & Main, 1993). MAGICSQUASH has been used to determine a number of structures which required multiple-domain and multiple-crystal-form averaging (Schuller, 1996). The DEMON/ANGEL package allows noncrystallographic averaging among multiple crystal forms together with solvent flattening and histogram matching (Vellieux et al., 1995). Other versatile programs for electron-density averaging include AVGSYS (Bolin et al., 1993) and PHASES (Furey & Swaminathan, 1990, 1997), both of which have features for facilitating definition and refinement of NCS parameters.
Acknowledgements
We are most grateful to Sharon Wilder and Cheryl Towell for extensive help in creating this manuscript. We are also grateful for decades of financial support by the National Science Foundation and the National Institutes of Health during the development of the techniques reported here.