International Tables for Crystallography, Volume F: Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006.
International Tables for Crystallography (2006). Vol. F, ch. 12.2, pp. 256-262. https://doi.org/10.1107/97809553602060000680

Chapter 12.2. Locating heavy-atom sites

a Institut für Pharmazeutische Chemie der Philipps-Universität Marburg, Marbacher Weg 6, D-35032 Marburg, Germany, and b Max-Planck-Institut für Biochemie, 82152 Martinsried, Germany

In order to obtain phase information from isomorphous replacement (or from anomalous dispersion), it is necessary to locate the atomic positions of the heavy-atom (or anomalous) scatterers. The topics covered in this chapter include: the origin of the phase problem; the Patterson function; difference Fourier maps; treatment of errors; automated search procedures; and special complications such as lack of isomorphism, space-group problems and high levels of substitution.

Keywords: Fourier maps; Patterson functions; difference Fourier maps; heavy-atom location; isomorphism; isomorphous replacement; lack of isomorphism; noncrystallographic symmetry; phase problem.

Once a native data set has been collected, the next task is the solution of the structure. There is one major hurdle: the phase problem. To study objects at the atomic level, we must utilize waves with a wavelength in the ångström range, i.e. X-radiation. X-rays interact with electrons and so provide an image of the electron distribution of the sample. Unfortunately, X-rays are refracted by matter only very weakly, and so it is not possible to construct a lens to view molecules at atomic dimensions.
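As shown in Chapter 2.1, the diffraction $F(\mathbf{S})$ obtained from an electron-density distribution $\rho(\mathbf{r})$ is given by

$$F(\mathbf{S}) = \int \rho(\mathbf{r}) \exp(2\pi i\,\mathbf{r}\cdot\mathbf{S})\,\mathrm{d}\mathbf{r},$$

where $\mathbf{S}$ is the scattering vector, perpendicular to the reflecting plane, with $|\mathbf{S}| = 2\sin\theta/\lambda$; θ is the Bragg angle (half the scattering angle) and λ is the wavelength. The diffraction pattern is a Fourier transform of the electron density. If we have a crystal with cell parameters a, b and c, then the Laue diffraction conditions require that S lies on a reciprocal lattice such that $\mathbf{S} = h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*$, where $\mathbf{a}^*$, $\mathbf{b}^*$ and $\mathbf{c}^*$ are the reciprocal-lattice vectors, and h, k and l are the integer indices of the diffracted beam. The structure factor of reflection hkl is then

$$F(hkl) = V \int_0^1\!\int_0^1\!\int_0^1 \rho(xyz) \exp[2\pi i(hx + ky + lz)]\,\mathrm{d}x\,\mathrm{d}y\,\mathrm{d}z,$$

where V represents the volume of the unit cell, and x, y and z are the fractional coordinates within that cell in the directions of a, b and c.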
Since the diffraction pattern is a Fourier transform of the electron density, it follows that the electron density is an inverse Fourier transform of the diffraction pattern:

$$\rho(xyz) = \frac{1}{V} \sum_{h}\sum_{k}\sum_{l} F(hkl) \exp[-2\pi i(hx + ky + lz)].$$

Thus it should be mathematically straightforward to calculate the electron density from the diffraction pattern. This is, unfortunately, not the case. The function describing the diffracted rays is a complex function with a magnitude $|F(hkl)|$ and a phase $\varphi(hkl)$. The diffraction experiment measures the intensities $I(hkl)$, however; the relationship between $I(hkl)$ and $F(hkl)$ is:

$$I(hkl) = F(hkl)\,F^*(hkl) = |F(hkl)|^2,$$

where $F^*(hkl)$ is the complex conjugate of $F(hkl)$. The measured intensities are related directly to the magnitudes of the diffracted beams; the phase information, however, is lost (Fig. 12.2.1.1): this is the origin of the phase problem.
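The loss of phase information can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical one-dimensional "crystal" of three point atoms sampled on a grid (the positions, weights and variable names are illustrative and not taken from this chapter). NumPy's FFT stands in for the structure-factor calculation; its sign convention differs from the one used in the text, which has no effect on the amplitudes.

```python
import numpy as np

# Hypothetical 1D "crystal": three point atoms on a 64-point grid.
n_grid = 64
rho = np.zeros(n_grid)
for x_frac, weight in [(0.10, 6.0), (0.35, 8.0), (0.70, 16.0)]:
    rho[int(round(x_frac * n_grid)) % n_grid] += weight

# Structure factors (numpy's forward FFT uses exp(-2*pi*i*h*x)).
F = np.fft.fft(rho)
amplitudes = np.abs(F)
phases = np.angle(F)
intensities = amplitudes ** 2        # the only quantity the experiment records

# With amplitudes AND phases, the density is recovered.
rho_with_phases = np.fft.ifft(amplitudes * np.exp(1j * phases)).real

# With amplitudes only (all phases set to zero), it is not.
rho_zero_phase = np.fft.ifft(amplitudes).real

# A translated copy of the structure gives the same intensities: the
# information about where the atoms sit is carried by the phases.
rho_shifted = np.roll(rho, 5)
intensities_shifted = np.abs(np.fft.fft(rho_shifted)) ** 2
```

Comparing `rho_with_phases` and `rho_zero_phase` with `rho` shows that the amplitudes alone do not determine the density.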
There are essentially four ways of overcoming the phase problem (Fig. 12.2.1.2): isomorphous replacement, anomalous dispersion, Patterson search techniques and direct methods.
The method of isomorphous replacement, by which the first macromolecular structures were solved (Green et al., 1954), remains the most widely used technique for ab initio structure determination. However, the availability of synchrotrons, with their facility for selecting a desired wavelength, and of molecular-biology techniques that allow the direct introduction of anomalous scatterers, such as selenium or tellurium, into the protein of interest (Hendrickson et al., 1990; Budisa et al., 1997) has shown that multiple anomalous dispersion is an exceptionally powerful technique for the solution of novel structures. Patterson search techniques (Rossmann, 1972) are ideal if a similar macromolecular structure is already known, while direct methods are more or less confined to very high resolution data (Sheldrick, 1990).
In order to obtain phase information from isomorphous replacement (or from anomalous dispersion), it is necessary to locate the atomic positions of the heavy-atom (or anomalous) scatterers.
Although the set of measured intensities contains no information regarding the phases, the Fourier transform of the intensities, the so-called Patterson function, contains valuable information. Patterson (1934) showed that the inverse Fourier transform of the intensity,

$$P(\mathbf{u}) = \frac{1}{V} \sum_{\mathbf{h}} I(\mathbf{h}) \exp(-2\pi i\,\mathbf{h}\cdot\mathbf{u}),$$

is related to the electron density by

$$P(\mathbf{u}) = \int \rho(\mathbf{r})\,\rho(\mathbf{r} + \mathbf{u})\,\mathrm{d}\mathbf{r}.$$

The Patterson function is an autocorrelation function of the density. For every vector u that corresponds to an interatomic vector, $P(\mathbf{u})$ will contain a peak (Fig. 12.2.1.1). These are some properties of the Patterson function:
For simple crystals, the Patterson map can be used to solve the structure directly. For macromolecular structures, the Patterson map provides a vehicle for solving the phase problem.
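The autocorrelation property can be checked on a toy model. The sketch below assumes a hypothetical one-dimensional cell with three point atoms (grid indices and weights are illustrative only). It computes an unnormalised Patterson function from the intensities and compares it with the directly evaluated autocorrelation of the density.

```python
import numpy as np

n = 60
atoms = {6: 1.0, 21: 2.0, 45: 3.0}    # grid index -> weight (hypothetical)
rho = np.zeros(n)
for idx, w in atoms.items():
    rho[idx] = w

# Patterson function from the intensities alone (no phases needed).
intensity = np.abs(np.fft.fft(rho)) ** 2
patterson = np.fft.ifft(intensity).real

# The same function as an explicit autocorrelation of the density:
# P(u) = sum_x rho(x) * rho(x + u), with periodic boundary conditions.
direct = np.array([np.sum(rho * np.roll(rho, -u)) for u in range(n)])

# Positions of all peaks; each non-origin peak sits at an interatomic vector.
peak_positions = sorted(int(u) for u in np.flatnonzero(patterson > 1e-6))
```

In this model the three atoms give 3 × 3 = 9 vectors: three coincide at the origin and the remaining six appear as centrosymmetric pairs, each with a height equal to the product of the two weights involved.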
If the crystal contains rotational symmetry elements, then the cross vectors between an atom and its symmetry mate lie on a plane perpendicular to the symmetry axis, the Harker section (Harker, 1956). By way of example, the space group $P2_1$ has two symmetry-related positions (Fig. 12.2.2.1), $(x, y, z)$ and $(-x, y + \tfrac{1}{2}, -z)$. Cross vectors between symmetry-related points will therefore have the form $(2x, \tfrac{1}{2}, 2z)$, i.e. all cross vectors lie on the plane $v = \tfrac{1}{2}$. For space group $P2_12_12_1$, the general coordinates $(x, y, z)$, $(\tfrac{1}{2} - x, -y, \tfrac{1}{2} + z)$, $(\tfrac{1}{2} + x, \tfrac{1}{2} - y, -z)$ and $(-x, \tfrac{1}{2} + y, \tfrac{1}{2} - z)$ give rise to cross vectors $(\tfrac{1}{2}, \tfrac{1}{2} - 2y, -2z)$, $(-2x, \tfrac{1}{2}, \tfrac{1}{2} - 2z)$ and $(\tfrac{1}{2} - 2x, -2y, \tfrac{1}{2})$, i.e. there are three Harker sections: $u = \tfrac{1}{2}$, $v = \tfrac{1}{2}$ and $w = \tfrac{1}{2}$. Peaks occurring on the Harker sections must reduce to a self-consistent set of coordinates (x, y, z), allowing reconstruction of the atomic positions.

Figure 12.2.2.1. The Patterson map with symmetry. When the crystal unit cell contains more than one molecule, then additional cross vectors will be formed between differing molecules. If these are related by crystallographic symmetry, there is a geometrical relationship between cross peaks. In this diagram, the peaks of Fig. 12.2.1.1 [...]
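The relationship between a site and its Harker vectors can be sketched in a few lines. The example assumes space group P2₁2₁2₁ in its standard setting as an illustration, with fractional coordinates expressed as integers in hundredths of a cell edge so that the arithmetic is exact. The site and all function names are hypothetical.

```python
from itertools import product

N = 100  # coordinates in hundredths of a cell edge; 50 represents 1/2

# General positions of P2_1 2_1 2_1 (assumed setting) as (signs, translation).
OPS = [
    ((1, 1, 1), (0, 0, 0)),
    ((-1, -1, 1), (50, 0, 50)),
    ((-1, 1, -1), (0, 50, 50)),
    ((1, -1, -1), (50, 50, 0)),
]

def symmetry_mates(site):
    """All symmetry-equivalent positions of a site within one cell."""
    return [tuple((s * c + t) % N for s, c, t in zip(signs, site, trans))
            for signs, trans in OPS]

def harker_vectors(site):
    """Set of vectors between a site and each of its symmetry mates."""
    pts = symmetry_mates(site)
    return {tuple((a - b) % N for a, b in zip(p, q))
            for p in pts for q in pts if p != q}

site = (11, 23, 37)                      # hypothetical heavy-atom position
vectors = harker_vectors(site)

# Alternative positions (+-x + xi, +-y + nu, +-z + zeta), xi, nu, zeta in {0, 1/2}
alternatives = [tuple((s * c + t) % N for s, c, t in zip(signs, site, shift))
                for signs in product((1, -1), repeat=3)
                for shift in product((0, 50), repeat=3)]
same_harker_set = [harker_vectors(alt) == vectors for alt in alternatives]
```

Every vector in `vectors` has at least one component equal to 1/2, i.e. lies on a Harker section, and all 64 alternative positions generated here reproduce the same set of vectors, which is why a single site cannot by itself fix the origin or the hand.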
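If we have two isomorphous (see below) data sets $|F_P|$ and $|F_{PH}|$, then the difference in the two Patterson functions,

$$\Delta P(\mathbf{u}) = P_{PH}(\mathbf{u}) - P_P(\mathbf{u}) = \frac{1}{V} \sum_{\mathbf{h}} \left(|F_{PH}|^2 - |F_P|^2\right) \exp(-2\pi i\,\mathbf{h}\cdot\mathbf{u}),$$

will deliver information about the heavy-atom structure. Such a difference function gives rise to non-negligible peaks arising from interference between the $F_P$ and $F_H$ terms (the protein and heavy-atom contributions to $F_{PH} = F_P + F_H$), however (Perutz, 1956). Rossmann (1960) showed that these interference terms could be reduced through calculation of the modified Patterson function

$$P_{\rm iso}(\mathbf{u}) = \frac{1}{V} \sum_{\mathbf{h}} \left(|F_{PH}| - |F_P|\right)^2 \exp(-2\pi i\,\mathbf{h}\cdot\mathbf{u}).$$

In the case of a single-site derivative, peaks should occur only at the Harker vectors corresponding to the heavy-atom position. Even so, there is a choice of positions for the heavy atom: e.g., in the $P2_12_12_1$ case, coordinates $(\pm x + \xi, \pm y + \nu, \pm z + \zeta)$, where ξ, ν and ζ can each take the value 0 or $\tfrac{1}{2}$, will all give rise to the same Harker vectors. This in itself is not a problem, relating to equivalent choices of origin and of handedness, but has important ramifications for multisite derivatives or multiple isomorphous replacement (see below).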
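An isomorphous difference Patterson calculation can be sketched on model data. The example assumes a hypothetical centrosymmetric one-dimensional cell (atoms at ±x), a random "protein" of light atoms and a single heavy atom. All weights, positions and names are illustrative, and scale factors such as 1/V are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
h = np.arange(1, 61)                       # reflection indices used

def centric_sf(x, f):
    """Structure factors of atoms at +/-x in a centrosymmetric 1D cell."""
    x = np.atleast_1d(x)
    f = np.atleast_1d(f)
    return 2.0 * np.sum(f[:, None] * np.cos(2 * np.pi * x[:, None] * h[None, :]), axis=0)

f_p = centric_sf(rng.random(30), np.full(30, 6.0))   # native (hypothetical)
x_heavy = 0.13
f_ph = f_p + centric_sf(x_heavy, 12.0)               # derivative

u = np.arange(200) / 200.0

def patterson(coeffs):
    """Cosine series over the listed reflections (h = 0 term omitted)."""
    return 2.0 * np.sum(coeffs[:, None] * np.cos(2 * np.pi * h[:, None] * u[None, :]), axis=0)

# Difference of the two Pattersons: contains F_P * F_H interference terms.
p_intensity_diff = patterson(np.abs(f_ph) ** 2 - np.abs(f_p) ** 2)

# Modified (isomorphous difference) Patterson with coefficients (|F_PH| - |F_P|)^2.
p_iso = patterson((np.abs(f_ph) - np.abs(f_p)) ** 2)

# Strongest peak away from the origin, searched in half the cell.
search = (u > 0.05) & (u <= 0.5)
u_peak = u[search][np.argmax(p_iso[search])]
```

In this one-dimensional model the only self-vector of the heavy atom is the one between x and −x, so the peak in `p_iso` is expected near u = 2x = 0.26; `p_intensity_diff` can be plotted alongside it to compare the background.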
If there is more than one site, then there will be two sets of peaks: one set corresponding to the Harker sections (self-vector set) and one set corresponding to the difference vectors between different heavy-atom sites (the cross-vector set). In this case, the choice of one heavy-atom position determines the origin and the handedness to which all other peaks must correspond. Thus, in the
example, only one cross vector will occur for
An alternative to the Harker-vector approach is Patterson-vector superposition (Sheldrick et al., 1993; Richardson & Jacobson, 1987). The Patterson map contains several images of the structure that have been shifted by interatomic vectors (Fig. 12.2.2.2). If this structure is relatively simple (as is to be hoped for in a 'normal' heavy-atom derivative), then it should be possible to deconvolute the superimposed structures by vector shifts (Buerger, 1959).

Figure 12.2.2.2. The vector superposition method. The Patterson map of Fig. 12.2.1.1 [...]
Once the heavy-atom positions have been found, they can be used to calculate approximate phases and Fourier maps. Ideally, difference Fourier maps calculated with phases from a single site should reveal the other positions determined from the Harker search procedure. This ensures that all heavy-atom positions correspond to a single origin and hand. Similarly, phases calculated from derivative H1 should reveal the heavy-atom structure for derivative H2. Merging and refinement of all phase information will result in a phase set that can be used to solve the structure.
Until now, we have dealt with cases involving perfect data. Although this ideal may now be attainable using MAD techniques, this is not necessarily the usual laboratory situation. In the first place, it is necessary to scale the derivative data $|F_{PH}|$ to the native $|F_P|$. One of the most common scaling procedures is based on the expected statistical dependence of intensity on resolution (Wilson, 1949). This may not be particularly accurate when only low-resolution data are available, in which case a scaling through equating the Patterson origin peaks of native and derivative sets may provide better results (Rogers, 1965).
A model to account for errors in the data, determination of heavy-atom positions etc. was proposed by Blow & Crick (1959), in which all errors are associated with $|F_{PH}|$ (Fig. 12.2.4.1); a more detailed treatment has been provided by Terwilliger & Eisenberg (1987). Owing to errors, the triangle formed by $F_P$, $F_{PH}$ and $F_H$ fails to close. The lack of closure error ɛ is a function of the calculated phase angle $\varphi_P$:

$$\varepsilon(\varphi_P) = |F_{PH}|_{\rm obs} - |F_{PH}|_{\rm calc} = |F_{PH}|_{\rm obs} - \big|\,|F_P| \exp(i\varphi_P) + F_H\,\big|.$$
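Once an initial set of heavy-atom positions has been found, it is necessary to refine their parameters (x, y, z, occupancy and thermal parameters). This can be achieved through the minimization of

$$\sum_{hkl} \varepsilon^2 / E^2,$$

where E is the estimated error, $E^2 = \langle \varepsilon^2 \rangle$ (Rossmann, 1960; Terwilliger & Eisenberg, 1983). This procedure is safest for centrosymmetric reflections (φ restricted to 0 or π) if enough are present. Phase refinement is generally monitored by three factors:

$$R_{\rm Cullis} = \frac{\sum \big|\,\big||F_{PH}| \pm |F_P|\big| - |F_{H}|_{\rm calc}\big|}{\sum \big||F_{PH}| \pm |F_P|\big|}$$

for centrosymmetric reflections only; acceptable values are between 0.4 and 0.6;

$$R_{\rm Kraut} = \frac{\sum \big|\,|F_{PH}|_{\rm obs} - |F_{PH}|_{\rm calc}\big|}{\sum |F_{PH}|_{\rm obs}},$$

which is useful for monitoring convergence; and the

$$\text{phasing power} = \frac{\langle |F_H|^2 \rangle^{1/2}}{\langle \varepsilon^2 \rangle^{1/2}},$$

which should be greater than 1 (if less than 1, then the phase triangle cannot be closed via $F_H$).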
Figure 12.2.4.1. The treatment of phase errors. The calculated heavy-atom structure results in a calculated value for both the phase and magnitude of [...]
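The resulting phase probability is given by

$$P(\varphi) \propto \exp[-\varepsilon^2(\varphi)/2E^2].$$

The phases have a minimum error when the best phase $\varphi_{\rm best}$, i.e. the centroid of the phase distribution,

$$m \exp(i\varphi_{\rm best}) = \frac{\int P(\varphi) \exp(i\varphi)\,\mathrm{d}\varphi}{\int P(\varphi)\,\mathrm{d}\varphi},$$

is used instead of the most probable phase. The quality of the phases is indicated by the figure of merit m, where

$$m = \frac{\left|\int P(\varphi) \exp(i\varphi)\,\mathrm{d}\varphi\right|}{\int P(\varphi)\,\mathrm{d}\varphi}.$$

A value of 1 for m indicates no phase error, a value of 0.5 represents a phase error of about 60°, while a value of 0 means that all phases are equally probable.

The best Fourier is calculated from

$$\rho_{\rm best}(\mathbf{r}) = \frac{1}{V} \sum_{\mathbf{h}} m\,|F_P| \exp(i\varphi_{\rm best}) \exp(-2\pi i\,\mathbf{h}\cdot\mathbf{r}),$$

in which the electron density should have minimal errors.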
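The calculation of a best phase and figure of merit can be sketched for a single reflection. The example assumes a Gaussian lack-of-closure probability of the Blow & Crick type, sampled on a 1° phase grid. The amplitudes, heavy-atom structure factors and error estimate are hypothetical values chosen so that the true protein phase is 60°.

```python
import numpy as np

def phase_probability(fp, derivatives, n_phi=360):
    """Joint phase probability for one reflection.

    fp          : native amplitude |F_P|
    derivatives : list of (|F_PH| observed, complex F_H calculated, error E)
    """
    phi = np.deg2rad(np.arange(n_phi) * (360.0 / n_phi))
    log_p = np.zeros(n_phi)
    for fph_obs, fh_calc, err in derivatives:
        # lack of closure for every trial protein phase
        eps = fph_obs - np.abs(fp * np.exp(1j * phi) + fh_calc)
        log_p -= eps ** 2 / (2.0 * err ** 2)
    p = np.exp(log_p - log_p.max())
    return phi, p / p.sum()

def best_phase_and_fom(phi, p):
    """Centroid of the phase distribution: returns (phi_best, m)."""
    centroid = np.sum(p * np.exp(1j * phi))
    return np.angle(centroid), np.abs(centroid)

# Hypothetical reflection with a true protein phase of 60 degrees.
fp = 100.0
f_true = fp * np.exp(1j * np.deg2rad(60.0))
fh1, fh2 = 30.0 + 0.0j, 0.0 + 30.0j          # two different heavy-atom models
d1 = (np.abs(f_true + fh1), fh1, 2.0)
d2 = (np.abs(f_true + fh2), fh2, 2.0)

phi_sir, m_sir = best_phase_and_fom(*phase_probability(fp, [d1]))
phi_mir, m_mir = best_phase_and_fom(*phase_probability(fp, [d1, d2]))
```

With one derivative the distribution is bimodal (±60° about the heavy-atom phase), so the centroid lies between the two alternatives and m is close to cos 60° = 0.5. Adding the second derivative removes the ambiguity and m approaches 1.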
If the derivative shows a high degree of substitution, then the Harker sections become more difficult to interpret. Furthermore, Terwilliger et al. (1987) have shown that the intrinsic noise in the difference Patterson map increases with increasing heavy-atom substitution. It is at this stage that automated procedures are invaluable.
One such automated procedure is implemented in PROTEIN (Steigemann, 1991). The unit cell is scanned for possible heavy-atom sites; for each search point (x, y, z), all possible Harker vectors are calculated, and the difference-Patterson-map values at these points are summed or multiplied. As the origin peak dominates the Patterson function, this region is set to zero. The resulting correlation map should contain peaks at all possible heavy-atom positions. The peak list can then be used to find a set of consistent heavy-atom locations through a subsequent search for difference vectors (cross vectors) between putative sites. It should be possible to locate all major and minor heavy-atom sites through repetition of this procedure. A similar strategy is adopted in the program HEAVY (Terwilliger et al., 1987), but sets of heavy-atom sites are ranked according to the probability that the peaks are not random. The program SOLVE (Terwilliger & Berendzen, 1999) takes this process a stage further: potential heavy-atom structures are solved and refined to generate an (interpretable) electron density in an automated fashion.
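A Harker-vector scan of this kind can be sketched on an idealized model. This is not the implementation used in PROTEIN, HEAVY or SOLVE. It assumes space group P2₁2₁2₁ as an illustration, a single point-like heavy atom on a coarse grid, and an error-free Patterson map; the grid size, site and names are hypothetical.

```python
import numpy as np

n = 24                  # grid points per cell edge
half = n // 2           # grid index corresponding to 1/2

# General positions of P2_1 2_1 2_1 (assumed setting) as (signs, translation).
OPS = [
    ((1, 1, 1), (0, 0, 0)),
    ((-1, -1, 1), (half, 0, half)),
    ((-1, 1, -1), (0, half, half)),
    ((1, -1, -1), (half, half, 0)),
]

def symmetry_mates(site):
    return [tuple((s * c + t) % n for s, c, t in zip(signs, site, trans))
            for signs, trans in OPS]

# Model heavy-atom structure and its (error-free) Patterson map.
true_site = (3, 5, 8)
rho = np.zeros((n, n, n))
for p in symmetry_mates(true_site):
    rho[p] = 1.0
patterson = np.fft.ifftn(np.abs(np.fft.fftn(rho)) ** 2).real
patterson[0, 0, 0] = 0.0            # remove the dominating origin peak

# For every trial point, look up the Patterson value at one Harker vector
# per section (the others are related by the symmetry of the Patterson).
x, y, z = np.meshgrid(np.arange(n), np.arange(n), np.arange(n), indexing="ij")
on_w = patterson[(half - 2 * x) % n, (-2 * y) % n, half]      # w = 1/2
on_v = patterson[(-2 * x) % n, half, (half - 2 * z) % n]      # v = 1/2
on_u = patterson[half, (half - 2 * y) % n, (-2 * z) % n]      # u = 1/2

score_sum = on_u + on_v + on_w
score_product = on_u * on_v * on_w

candidates = {tuple(int(c) for c in row)
              for row in np.argwhere(score_product > 0.5 * score_product.max())}
```

The product score is stricter than the sum, since it is non-zero only where all three Harker sections agree. In this idealized case the candidates are the true site together with its copies under the alternative choices of origin and hand; with real data, noise and multiple sites make a subsequent cross-vector check necessary.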
The search method can also be applied in reciprocal space, where the Fourier transform of the trial heavy-atom structure is calculated, and the resulting calculated heavy-atom structure factor $F_H$ is compared to the measured differences between derivative and native structure-factor amplitudes (Rossmann et al., 1986). In the program XtalView (McRee, 1998), the correlation coefficient between
and
is calculated, whilst a correlation between
and
is used by Badger & Athay (1998). Dumas (1994b,c) calculates the correlation between
and
, based on the estimated lack of isomorphism.
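A reciprocal-space search based on a correlation coefficient can be sketched as follows. The example is not the algorithm of any of the programs cited. It assumes a hypothetical centrosymmetric one-dimensional cell, error-free amplitudes, and a correlation between the calculated heavy-atom amplitude and the observed amplitude difference; all names and numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.arange(1, 81)                       # reflection indices used

def centric_sf(x, f):
    """Structure factors of atoms at +/-x in a centrosymmetric 1D cell."""
    x = np.atleast_1d(x)
    f = np.atleast_1d(f)
    return 2.0 * np.sum(f[:, None] * np.cos(2 * np.pi * x[:, None] * h[None, :]), axis=0)

def correlation(a, b):
    """Linear correlation coefficient; 0.0 if either set is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return 0.0 if denom < 1e-12 else float(np.sum(a * b) / denom)

# Model data: native "protein" and a derivative with one heavy atom at +/-0.13.
f_p = centric_sf(rng.random(40), np.full(40, 6.0))
x_heavy = 0.13
f_ph = f_p + centric_sf(x_heavy, 10.0)
delta_obs = np.abs(np.abs(f_ph) - np.abs(f_p))     # measured amplitude differences

# Scan trial positions over 0 < x < 1/2 and score each one.
trial_x = np.arange(1, 200) / 400.0
cc = np.array([correlation(np.abs(centric_sf(x, 10.0)), delta_obs) for x in trial_x])
best_x = trial_x[np.argmax(cc)]
```

In this model the positions x and 1/2 − x give identical calculated amplitudes, so the scan has two equivalent maxima (0.13 and 0.37), corresponding to alternative choices of origin.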
Vagin & Teplyakov (1998) have reported a heavy-atom search based on a reciprocal-space translation function. In this case, low-resolution peaks are not removed but weighted down using a Gaussian function. Potential solutions are ranked not only according to their translation-function height, but also through their phasing power, which appears to be a stronger selection criterion.
All these searches are based upon the sequential identification of heavy-atom sites and their incorporation in a heavy-atom partial structure. Problems arise when bogus sites influence the search for further heavy-atom positions. In an attempt to overcome this problem, the heavy-atom search has been reprogrammed using a genetic algorithm, with the Patterson minimum function as a selection criterion (Chang & Lewis, 1994). This approach has the potential to reveal all heavy-atom positions in one calculation, and tests on model data have shown it to be faster than traditional sequential searches.
Lack of isomorphism is by far the most common problem in protein crystallography. An isomorphous derivative is one in which the crystalline arrangement has not been disturbed by derivatization. An early study of Crick & Magdoff (1956) proposed a rule of thumb that a change in any of the cell dimensions by more than around 5% would result in a lack of isomorphism that would defeat any attempt to locate the heavy-atom positions or extract useful phase information. Lack of isomorphism can, however, be more subtle; sometimes a natural variation in the native crystal form may occur, resulting in poor merging statistics of data obtained from different crystals. Coupling this variation with commonly observed structural changes upon heavy-atom binding can provide a considerable barrier to obtaining satisfactory phases. Dumas (1994a) has provided a theoretical consideration of this problem.
One practical approach is to collect native and derivative data sets from the same crystal, a technique that has been successful in the structure determination of cyclohydrolase (Nar et al., 1995), proteasome (Löwe et al., 1995) and a number of other proteins. Nonisomorphism can, however, also be exploited. In the structure solution of carbamoyl sarcosine hydrolase (Romao et al., 1992), derivatives fell into two (related) crystalline classes. By judicious use of two 'native' crystal forms, heavy-atom positions could be obtained in each of the two classes. Phasing and resultant averaging between the two classes provided an interpretable electron density. In the case of ascorbate oxidase (Messerschmidt et al., 1989), multiple isomorphous replacement failed to provide an interpretable density. It was possible, however, to place the initial density into a second crystal form, which in turn provided phases of sufficient quality to determine heavy-atom sites in derivatives of the second form. Phase-combination and density-modification techniques in the two crystal forms allowed the solution of the structure.
Although the macromolecular crystallographer is rarely confronted with the problems facing their small-molecule colleagues with regard to determining the correct space group, the simplified heavy-atom structure may often throw up some surprises. Certain pseudosymmetries may become 'exact' for the heavy-atom difference Patterson map. Thus, cross peaks between different heavy atoms may occur on a Harker section (or 'pseudo-Harker section'), complicating interpretation of the Patterson map. Such was the case with azurin (Adman et al., 1978; Nar et al., 1991), where the heavy-atom structure gave rise to a pseudo-homometric Patterson function, i.e. one in which two possible (nonequivalent) choices were available for the heavy-atom structure, only one of which was correct. This arose from a pseudo-centring of the lattice that became almost exact for the heavy-atom structure.
In the case of human NC1 (Stubbs et al., 1990), all heavy-atom derivatives appeared to lie on or near the crystallographic twofold axis. This resulted in a partially centrosymmetric heavy-atom structure that failed to deliver sufficient phase information for noncentrosymmetric reflections. To check for problems with the native data set, anomalous difference Patterson maps {coefficients $(|F_{PH}^{+}| - |F_{PH}^{-}|)^2$} were calculated. Coincidence of the peaks obtained from conventional and anomalous Patterson syntheses showed that the heavy-atom positions were correct, but unfortunately did not lead to a structure solution.
Most problematic are the cases where many heavy atoms have become incorporated in the asymmetric unit. Not only does this cause difficulties in the scaling of derivative to native data, but also the large number of peaks results in ambiguities in the solution of the Patterson function. In such cases, it may be necessary to obtain primary phase information from a different source (such as, for example, another low-substitution-site derivative). One important subclass of high-level substitution is when the native asymmetric unit contains several copies of a single molecule (noncrystallographic symmetry or NCS).
A major problem in locating complex noncrystallographic axes is that the geometrical relationship between NCS peaks in the Patterson map is nontrivial. Under certain conditions, NCS results in a recognizable local symmetry within the Patterson map (Stubbs et al., 1996). In many cases, however, these conditions (that the NCS axes of crystallographic symmetry-related molecules are parallel) are not fulfilled. Under such circumstances, all heavy-atom sites (including all crystallographic symmetry-related positions) must be checked carefully with the rotation function in order to pinpoint the NCS axis. This is relatively trivial for low-order NCS (twofold, threefold), but becomes increasingly complicated for higher orders. It should also always be borne in mind that the heavy-atom positions might not necessarily follow the NCS constraints due to crystal packing. If there is reason to suspect that sites are related by local symmetry, then the orientation of this axis can be used in the initial Harker searches; in practice, however, such searches are extremely sensitive to the correct orientation of the axis.
In the case of high-order NCS (such as with icosahedral virus structures or symmetric macromolecular complexes), an alternative approach to the usual initial Harker-vector search can be provided by the self-rotation function. Knowledge of the orientation of the NCS axis (from the rotation function) can be used to determine the relative positions of heavy atoms to the NCS axis (Argos & Rossmann, 1976; Arnold et al., 1987; Tong & Rossmann, 1993). The orientation can be refined and the resulting peaks can be used as input in a subsequent translation search of the Harker sections.