International Tables for Crystallography, Volume F: Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F, ch. 12.2, pp. 256–262
https://doi.org/10.1107/97809553602060000680

Chapter 12.2. Locating heavy-atom sites

^{a}Institut für Pharmazeutische Chemie der Philipps-Universität Marburg, Marbacher Weg 6, D-35032 Marburg, Germany, and ^{b}Max-Planck-Institut für Biochemie, 82152 Martinsried, Germany

In order to obtain phase information from isomorphous replacement (or from anomalous dispersion), it is necessary to locate the atomic positions of the heavy-atom (or anomalous) scatterers. The topics covered in this chapter include: the origin of the phase problem; the Patterson function; difference Fourier maps; treatment of errors; automated search procedures; and special complications such as lack of isomorphism, space-group problems and high levels of substitution.
Once a native data set has been collected, the next task is the solution of the structure. There is one major hurdle: the phase problem. To study objects at the atomic level, we must utilize waves with a wavelength in the ångström range, i.e. X-radiation. X-rays interact with electrons and so provide an image of the electron distribution of the sample. Unfortunately, X-rays are refracted by matter only very weakly, and so it is not possible to construct a lens to view molecules at atomic dimensions.
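As shown in Chapter 2.1, the diffraction obtained from an electron-density distribution $\rho(\mathbf{r})$ is given by

$$F(\mathbf{S}) = \int \rho(\mathbf{r}) \exp(2\pi i\, \mathbf{r}\cdot\mathbf{S})\, d\mathbf{r},$$

where $\mathbf{S}$ is the scattering vector, perpendicular to the reflecting plane, with $|\mathbf{S}| = 2\sin\theta/\lambda$; $\theta$ is the Bragg angle (half the scattering angle) and $\lambda$ is the wavelength. The diffraction pattern is a Fourier transform of the electron density. If we have a crystal with cell parameters a, b and c, then the Laue diffraction conditions require that $\mathbf{S}$ lies on a reciprocal lattice such that $\mathbf{S} = h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*$, where $\mathbf{a}^*$, $\mathbf{b}^*$ and $\mathbf{c}^*$ are the reciprocal-lattice vectors, and h, k and l are the integer indices of the diffracted beam. The structure factor then becomes

$$F(hkl) = V \int_0^1\!\!\int_0^1\!\!\int_0^1 \rho(xyz) \exp[2\pi i(hx + ky + lz)]\, dx\, dy\, dz,$$

where V represents the volume of the unit cell, and x, y and z are the fractional coordinates within that cell in the directions of a, b and c.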
Since the diffraction pattern is a Fourier transform of the electron density, it follows that the electron density is an inverse Fourier transform of the diffraction pattern:

$$\rho(xyz) = \frac{1}{V} \sum_h \sum_k \sum_l F(hkl) \exp[-2\pi i(hx + ky + lz)].$$

Thus it should be mathematically straightforward to calculate the electron density from the diffraction pattern. This is, unfortunately, not the case. The function describing the diffracted rays, $F(hkl)$, is a complex function with a magnitude $|F(hkl)|$ and a phase $\varphi(hkl)$. The diffraction experiment measures only the intensities $I(hkl)$, however; the relationship between $I(hkl)$ and $F(hkl)$ is

$$I(hkl) = F(hkl)\, F^*(hkl) = |F(hkl)|^2,$$

where $F^*(hkl)$ is the complex conjugate of $F(hkl)$. The measured intensities are related directly to the magnitudes of the diffracted beams; the phase information, however, is lost (Fig. 12.2.1.1): this is the origin of the phase problem.
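The loss of phase information can be illustrated with a minimal one-dimensional sketch. The three-atom "structure" and its weights below are hypothetical; the point is only that moving the structure within the cell changes every phase while leaving every measurable intensity untouched.

```python
import cmath

def structure_factor(atoms, h):
    """F(h) for a one-dimensional cell; atoms is a list of (fractional x, weight)."""
    return sum(f * cmath.exp(2j * cmath.pi * h * x) for x, f in atoms)

def intensities(atoms, h_max):
    """The measurable quantities I(h) = |F(h)|^2 for h = 0..h_max."""
    return [abs(structure_factor(atoms, h)) ** 2 for h in range(h_max + 1)]

# Hypothetical three-atom structure, and the same structure moved by 1/4 of the cell.
structure = [(0.10, 6.0), (0.35, 8.0), (0.70, 16.0)]
shifted = [((x + 0.25) % 1.0, f) for x, f in structure]

same_intensities = all(
    abs(a - b) < 1e-9
    for a, b in zip(intensities(structure, 8), intensities(shifted, 8))
)
# Phase difference of the h = 1 reflection between the two models (radians).
phase_change = cmath.phase(
    structure_factor(shifted, 1) / structure_factor(structure, 1)
)
print(same_intensities, round(phase_change, 4))
```

The two models give identical intensities although the phase of the h = 1 reflection differs by a quarter turn, so the intensities alone cannot distinguish them.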
There are essentially four ways of overcoming the phase problem (Fig. 12.2.1.2): isomorphous replacement, anomalous dispersion, Patterson search techniques and direct methods.
The method of isomorphous replacement, by which the first macromolecular structures were solved (Green et al., 1954), remains the most widely used technique for ab initio structure determination. However, the availability of synchrotrons, with their facility for selecting a desired wavelength, together with molecular-biology techniques that allow the direct introduction of anomalous scatterers such as selenium or tellurium into the protein of interest (Hendrickson et al., 1990; Budisa et al., 1997), has shown multiwavelength anomalous dispersion (MAD) to be an exceptionally powerful technique for the solution of novel structures. Patterson search techniques (Rossmann, 1972) are ideal if a similar macromolecular structure is already known, while direct methods are more or less confined to very high resolution data (Sheldrick, 1990).
In order to obtain phase information from isomorphous replacement (or from anomalous dispersion), it is necessary to locate the atomic positions of the heavy-atom (or anomalous) scatterers.
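Although the set of measured intensities contains no information regarding the phases, the Fourier transform of the intensities, the so-called Patterson function, contains valuable information. Patterson (1934) showed that the inverse Fourier transform of the intensity,

$$P(uvw) = \frac{1}{V} \sum_h \sum_k \sum_l I(hkl) \exp[-2\pi i(hu + kv + lw)],$$

is related to the electron density by

$$P(\mathbf{u}) = \int \rho(\mathbf{r})\, \rho(\mathbf{r} + \mathbf{u})\, d\mathbf{r}.$$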
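The Patterson function is an autocorrelation function of the density. For every vector u that corresponds to an interatomic vector, $P(\mathbf{u})$ will contain a peak (Fig. 12.2.1.1). Some properties of the Patterson function follow directly from this definition: a structure of N atoms gives rise to $N^2$ peaks, N of which are superimposed at the origin; the function is centrosymmetric, $P(\mathbf{u}) = P(-\mathbf{u})$; and the height of each peak is proportional to the product of the numbers of electrons of the two atoms concerned, so that vectors between heavy atoms dominate.
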
For simple crystals, the Patterson map can be used to solve the structure directly. For macromolecular structures, the Patterson map provides a vehicle for solving the phase problem.
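That the Patterson function can be computed from intensities alone, and that its peaks sit at interatomic vectors, can be checked with a small one-dimensional sketch. The grid size, atom positions and weights are illustrative assumptions; point atoms on a 21-point grid are used so that the synthesis is exact.

```python
import cmath

def patterson_1d(atoms, n_grid):
    """Patterson synthesis from intensities on an n_grid-point 1D cell.

    atoms: list of (grid index, weight); n_grid is assumed odd so that the
    reflections h = -h_max..h_max form a complete set for the grid.
    """
    h_max = (n_grid - 1) // 2
    hs = range(-h_max, h_max + 1)
    intensity = {
        h: abs(sum(f * cmath.exp(2j * cmath.pi * h * j / n_grid) for j, f in atoms)) ** 2
        for h in hs
    }
    # Only |F|^2 enters the summation: no phases are needed.
    return [
        sum(intensity[h] * cmath.exp(-2j * cmath.pi * h * k / n_grid) for h in hs).real / n_grid
        for k in range(n_grid)
    ]

atoms = [(1, 1.0), (4, 2.0), (10, 3.0)]   # hypothetical point atoms
P = patterson_1d(atoms, 21)
peaks = [k for k, value in enumerate(P) if value > 0.5]
print(peaks)
```

Three atoms give an origin peak plus six interatomic peaks at ±3, ±6 and ±9 grid units, each with a height equal to the product of the two atomic weights; the map is centrosymmetric even though the structure is not.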
If the crystal contains rotational symmetry elements, then the cross vectors between an atom at $\mathbf{r}$ and its symmetry mate lie on a plane perpendicular to the symmetry axis, the Harker section (Harker, 1956). By way of example, the space group $P2_1$ has two symmetry-related positions (Fig. 12.2.2.1), $(x, y, z)$ and $(-x, y + \tfrac{1}{2}, -z)$.

Figure 12.2.2.1. The Patterson map with symmetry. When the crystal unit cell contains more than one molecule, then additional cross vectors will be formed between differing molecules. If these are related by crystallographic symmetry, there is a geometrical relationship between cross peaks. In this diagram, the peaks of Fig. 12.2.1.1 are supplemented by those between atoms of symmetry-related molecules. The red, yellow and blue peaks of the resulting Patterson function represent those between same atoms (i.e. red to red, yellow to yellow and blue to blue) related by symmetry. These peaks are found on a Harker section.
Cross vectors between symmetry-related points will therefore have the form $(2x, \tfrac{1}{2}, 2z)$, i.e. all cross vectors lie on the plane $v = \tfrac{1}{2}$. For space group $P2_12_12_1$, the general coordinates $(x, y, z)$, $(\tfrac{1}{2} - x, -y, \tfrac{1}{2} + z)$, $(-x, \tfrac{1}{2} + y, \tfrac{1}{2} - z)$ and $(\tfrac{1}{2} + x, \tfrac{1}{2} - y, -z)$ give rise to cross vectors $(\tfrac{1}{2}, \tfrac{1}{2} + 2y, 2z)$, $(2x, \tfrac{1}{2}, \tfrac{1}{2} + 2z)$ and $(\tfrac{1}{2} + 2x, 2y, \tfrac{1}{2})$ (modulo lattice translations and the inversion symmetry of the Patterson function), i.e. there are three Harker sections: $u = \tfrac{1}{2}$, $v = \tfrac{1}{2}$ and $w = \tfrac{1}{2}$. Peaks occurring on the Harker sections must reduce to a self-consistent set of coordinates (x, y, z), allowing reconstruction of the atomic positions.
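The construction of Harker vectors, and the way a self-consistent set of coordinates is read back from them, can be sketched as follows. The sketch assumes the equivalent positions of space group P2₁2₁2₁ in its standard setting and a hypothetical site at (1/10, 1/5, 3/10); exact fractions are used so that the sections can be tested for equality.

```python
from fractions import Fraction as Fr

HALF = Fr(1, 2)

# Equivalent positions of P2(1)2(1)2(1), written as (signs, translation).
SYMOPS = [
    ((1, 1, 1), (0, 0, 0)),
    ((-1, -1, 1), (HALF, 0, HALF)),
    ((-1, 1, -1), (0, HALF, HALF)),
    ((1, -1, -1), (HALF, HALF, 0)),
]

def mates(site):
    """All symmetry-equivalent positions of a site, reduced into the unit cell."""
    return [
        tuple((s * c + t) % 1 for s, c, t in zip(signs, site, shift))
        for signs, shift in SYMOPS
    ]

def harker_vectors(site):
    """Vectors from each symmetry mate to the site itself, modulo the lattice."""
    first, *others = mates(site)
    return [tuple((a - b) % 1 for a, b in zip(first, m)) for m in others]

site = (Fr(1, 10), Fr(1, 5), Fr(3, 10))   # hypothetical heavy-atom site
vectors = harker_vectors(site)            # on w = 1/2, v = 1/2 and u = 1/2

# Reading the coordinates back from the sections (each is defined modulo 1/2):
x = (vectors[1][0] / 2) % HALF            # u = 2x on the section v = 1/2
y = ((vectors[2][1] - HALF) / 2) % HALF   # v = 1/2 + 2y on the section u = 1/2
z = (vectors[2][2] / 2) % HALF            # w = 2z on the section u = 1/2
print(vectors, (x, y, z))
```

Each coordinate is recovered only modulo 1/2, which is the origin ambiguity inherent in a Harker interpretation.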
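If we have two isomorphous (see below) data sets $|F_P|$ (native) and $|F_{PH}|$ (derivative), then the difference in the two Patterson functions,

$$\Delta P(uvw) = \frac{1}{V} \sum_h \sum_k \sum_l \left(|F_{PH}|^2 - |F_P|^2\right) \exp[-2\pi i(hu + kv + lw)],$$

will deliver information about the heavy-atom structure. Such a difference function gives rise to non-negligible peaks arising from interference between the $F_P$ and $F_H$ terms, however (Perutz, 1956). Rossmann (1960) showed that these interference terms could be reduced through calculation of the modified Patterson function

$$P_{\rm mod}(uvw) = \frac{1}{V} \sum_h \sum_k \sum_l \left(|F_{PH}| - |F_P|\right)^2 \exp[-2\pi i(hu + kv + lw)].$$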
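Why squaring the amplitude difference suppresses the interference terms can be seen from a short worked derivation. It is a sketch under the usual assumptions that $F_{PH} = F_P + F_H$ and that $|F_H| \ll |F_P|$; the symbols $\varphi_P$ and $\varphi_H$ for the phases of $F_P$ and $F_H$ are introduced here for illustration.

```latex
\begin{align*}
|F_{PH}|^2 &= |F_P|^2 + |F_H|^2 + 2|F_P||F_H|\cos(\varphi_P - \varphi_H), \\
|F_{PH}|^2 - |F_P|^2 &= |F_H|^2 + \underbrace{2|F_P||F_H|\cos(\varphi_P - \varphi_H)}_{\text{interference term, scales with } |F_P|}, \\
|F_{PH}| - |F_P| &\simeq |F_H|\cos(\varphi_P - \varphi_H) \qquad (|F_H| \ll |F_P|), \\
\left(|F_{PH}| - |F_P|\right)^2 &\simeq \tfrac{1}{2}|F_H|^2 + \tfrac{1}{2}|F_H|^2\cos 2(\varphi_P - \varphi_H).
\end{align*}
```

In the squared-difference coefficient the first term reproduces the heavy-atom Patterson function at half weight, while the second has no systematic sign, since $\varphi_P$ and $\varphi_H$ are essentially uncorrelated, and contributes only noise; the large factor $|F_P|$ no longer appears.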
In the case of a single-site derivative, peaks should occur only at the Harker vectors corresponding to the heavy-atom position. Even so, there is a choice of positions for the heavy atom: e.g., in the $P2_12_12_1$ case, coordinates $(x + \xi, y + \nu, z + \zeta)$ and their inverses $(-x + \xi, -y + \nu, -z + \zeta)$, where ξ, ν and ζ can each take the value 0 or $\tfrac{1}{2}$, will all give rise to the same Harker vectors. This in itself is not a problem, relating to equivalent choices of origin and of handedness, but has important ramifications for multi-site derivatives or multiple isomorphous replacement (see below).
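The equivalence of these alternative positions can be verified numerically. The sketch below again assumes the standard-setting equivalent positions of P2₁2₁2₁ and a hypothetical site; it enumerates the eight origin shifts for each of the two hands and compares the complete sets of self vectors.

```python
from fractions import Fraction as Fr
from itertools import product

HALF = Fr(1, 2)

# Equivalent positions of P2(1)2(1)2(1), written as (signs, translation).
SYMOPS = [
    ((1, 1, 1), (0, 0, 0)),
    ((-1, -1, 1), (HALF, 0, HALF)),
    ((-1, 1, -1), (0, HALF, HALF)),
    ((1, -1, -1), (HALF, HALF, 0)),
]

def mates(site):
    return [
        tuple((s * c + t) % 1 for s, c, t in zip(signs, site, shift))
        for signs, shift in SYMOPS
    ]

def self_vectors(site):
    """Set of all vectors between distinct symmetry mates (the Harker peaks)."""
    m = mates(site)
    return frozenset(
        tuple((a - b) % 1 for a, b in zip(p, q))
        for i, p in enumerate(m)
        for j, q in enumerate(m)
        if i != j
    )

site = (Fr(1, 10), Fr(1, 5), Fr(3, 10))   # hypothetical heavy-atom site
alternatives = [
    tuple((sign * c + s) % 1 for c, s in zip(site, shift))
    for sign in (1, -1)                        # the two hands
    for shift in product((0, HALF), repeat=3)  # the eight origin choices
]
vector_sets = {self_vectors(a) for a in alternatives}
print(len(set(alternatives)), len(vector_sets))
```

Sixteen distinct trial positions yield one and the same set of Harker peaks, so a single-site Patterson map cannot choose between them.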
If there is more than one site, then there will be two sets of peaks: one set corresponding to the Harker sections (the self-vector set) and one set corresponding to the difference vectors between different heavy-atom sites (the cross-vector set). In this case, the choice of one heavy-atom position determines the origin and the handedness to which all other peaks must correspond. Thus, in the $P2_12_12_1$ example, only one cross vector will occur for
An alternative to the Harker-vector approach is Patterson-vector superposition (Sheldrick et al., 1993; Richardson & Jacobson, 1987). The Patterson map contains several images of the structure that have been shifted by interatomic vectors (Fig. 12.2.2.2). If this structure is relatively simple (as is to be hoped for in a `normal' heavy-atom derivative), then it should be possible to deconvolute the superimposed structures by vector shifts (Buerger, 1959).

Figure 12.2.2.2. The vector superposition method. The Patterson map of Fig. 12.2.1.1 can be regarded as the superposition of the structure (and its inverse), with each of its atoms placed alternately at the origin. By shifting each peak of the Patterson function to the origin and calculating the correlation of all remaining peaks with the unshifted map, it is possible to deconvolute the Patterson function.
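The idea of deconvolution by vector shifts can be sketched on an idealized peak list. This is a deliberately simplified stand-in for the method described above: the Patterson map is reduced to a set of peak positions on a hypothetical 21-point one-dimensional cell, and the correlation step is replaced by a plain set intersection (in the spirit of a minimum function).

```python
def vector_set(atoms, n):
    """Idealized Patterson peak positions: all interatomic vectors modulo n."""
    return {(a - b) % n for a in atoms for b in atoms}

def superpose(peaks, reference, shift, n):
    """Keep the peaks that coincide with the reference map shifted by `shift`."""
    return peaks & {(p + shift) % n for p in reference}

n = 21
atoms = {0, 2, 7}                     # hypothetical structure
patterson = vector_set(atoms, n)

# First superposition on the peak at u = 2: the structure and one inverted,
# shifted image of it survive.
first = superpose(patterson, patterson, 2, n)
# Second superposition on the peak at u = 7 removes the unwanted image.
second = superpose(first, patterson, 7, n)
print(sorted(patterson), sorted(first), sorted(second))
```

A single shift leaves an ambiguity between the structure and an inverted image; a second shift on another interatomic vector resolves it in this small example.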
Once the heavy-atom positions have been found, they can be used to calculate approximate phases and Fourier maps. Ideally, difference Fourier maps calculated with phases from a single site should reveal the other positions determined from the Harker search procedure. This ensures that all heavy-atom positions correspond to a single origin and hand. Similarly, phases calculated from derivative H1 should reveal the heavy-atom structure for derivative H2. Merging and refinement of all phase information will result in a phase set that can be used to solve the structure.
Until now, we have dealt with cases involving perfect data. Although this ideal may now be attainable using MAD techniques, this is not necessarily the usual laboratory situation. In the first place, it is necessary to scale the derivative data $|F_{PH}|$ to the native $|F_P|$. One of the most common scaling procedures is based on the expected statistical dependence of intensity on resolution (Wilson, 1949). This may not be particularly accurate when only low-resolution data are available, in which case a scaling through equating the Patterson origin peaks of native and derivative sets may provide better results (Rogers, 1965).
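One common way of putting resolution-dependent scaling into practice is a relative Wilson plot, sketched below as an assumption rather than as the procedure of any particular program. Mean intensities in resolution shells are assumed to obey ⟨I_PH⟩/⟨I_P⟩ = K exp(−2ΔB sin²θ/λ²), and a straight line is fitted to the logarithm of the ratio; the function name, shell values and parameters are hypothetical.

```python
import math

def relative_wilson_scale(shells):
    """Least-squares fit of ln(<I_PH>/<I_P>) = ln K - 2*dB*s2.

    shells: list of (s2, mean native intensity, mean derivative intensity),
    with s2 = sin^2(theta)/lambda^2. Returns (K, dB).
    """
    xs = [s2 for s2, _, _ in shells]
    ys = [math.log(i_ph / i_p) for _, i_p, i_ph in shells]
    count = len(xs)
    mean_x, mean_y = sum(xs) / count, sum(ys) / count
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return math.exp(mean_y - slope * mean_x), -slope / 2

# Synthetic shells generated with K = 1.3 and dB = 4.0 (illustrative values).
native = [(0.005 * (i + 1), 1000.0 / (i + 1)) for i in range(8)]
shells = [(s2, i_p, i_p * 1.3 * math.exp(-2 * 4.0 * s2)) for s2, i_p in native]
scale, delta_b = relative_wilson_scale(shells)
print(round(scale, 6), round(delta_b, 6))
```

With only a few low-resolution shells the fitted slope becomes poorly determined, which is the situation in which equating Patterson origin peaks may be preferable.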
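A model to account for errors in the data, in the determination of heavy-atom positions etc. was proposed by Blow & Crick (1959), in which all errors are associated with $|F_{PH}|$ (Fig. 12.2.4.1); a more detailed treatment has been provided by Terwilliger & Eisenberg (1987). Owing to errors, the triangle formed by $F_P$, $F_{PH}$ and $F_H$ fails to close. The lack-of-closure error $\varepsilon$ is a function of the calculated phase angle $\varphi_P$:

$$\varepsilon(\varphi_P) = |F_{PH}|_{\rm obs} - |F_{PH}|_{\rm calc} = |F_{PH}|_{\rm obs} - \left|\, |F_P| \exp(i\varphi_P) + F_H^{\rm calc} \right|.$$

Once an initial set of heavy-atom positions has been found, it is necessary to refine their parameters (x, y, z, occupancy and thermal parameters). This can be achieved through the minimization of

$$\sum_{hkl} \frac{\varepsilon^2}{E^2},$$

where E is the estimated error (Rossmann, 1960; Terwilliger & Eisenberg, 1983). This procedure is safest for centrosymmetric reflections (ϕ restricted to 0 or π) if enough are present. Phase refinement is generally monitored by three factors:

$$R_{\rm Cullis} = \frac{\sum \left|\, |F_{PH} \pm F_P| - |F_H^{\rm calc}| \,\right|}{\sum |F_{PH} \pm F_P|}$$

for centrosymmetric reflections only; acceptable values are between 0.4 and 0.6;

$$R_{\rm Kraut} = \frac{\sum \left|\, |F_{PH}|_{\rm obs} - |F_{PH}|_{\rm calc} \right|}{\sum |F_{PH}|_{\rm obs}},$$

which is useful for monitoring convergence; and the

$$\text{phasing power} = \frac{\left[\sum |F_H^{\rm calc}|^2\right]^{1/2}}{\left[\sum \varepsilon^2\right]^{1/2}},$$

which should be greater than 1 (if less than 1, then the phase triangle cannot be closed via $F_H$).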
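The lack of closure and the phasing power can be made concrete with a minimal sketch. It assumes the usual definitions, ε(φ) = |F_PH|obs − ||F_P| exp(iφ) + F_H(calc)| and phasing power = r.m.s. |F_H(calc)| / r.m.s. ε; the amplitudes and phases below are invented, error-free values.

```python
import cmath
import math

def lack_of_closure(f_p, f_ph_obs, f_h_calc, phi):
    """epsilon(phi) = |F_PH|obs - | |F_P| exp(i phi) + F_H(calc) |."""
    return f_ph_obs - abs(f_p * cmath.exp(1j * phi) + f_h_calc)

def phasing_power(f_h_calcs, closures):
    """r.m.s. heavy-atom amplitude divided by r.m.s. lack of closure."""
    rms_fh = math.sqrt(sum(abs(f) ** 2 for f in f_h_calcs) / len(f_h_calcs))
    rms_eps = math.sqrt(sum(e ** 2 for e in closures) / len(closures))
    return rms_fh / rms_eps

# Hypothetical reflection: |F_P| = 100, true protein phase 0.7 rad,
# heavy-atom contribution of amplitude 10 and phase 2.0 rad.
f_p, true_phase, phi_h = 100.0, 0.7, 2.0
f_h = 10.0 * cmath.exp(1j * phi_h)
f_ph_obs = abs(f_p * cmath.exp(1j * true_phase) + f_h)

eps_true = lack_of_closure(f_p, f_ph_obs, f_h, true_phase)
eps_twin = lack_of_closure(f_p, f_ph_obs, f_h, 2 * phi_h - true_phase)
eps_wrong = lack_of_closure(f_p, f_ph_obs, f_h, phi_h)
print(eps_true, eps_twin, eps_wrong)
```

The triangle closes both at the true phase and at its mirror image about the heavy-atom phase, which is the source of the bimodal phase distribution of a single derivative.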

Figure 12.2.4.1. The treatment of phase errors. The calculated heavy-atom structure results in a calculated value for both the phase and magnitude of $F_H$ (red). According to the value of $\varphi_P$, the triangle formed by $F_P$, $F_{PH}$ and $F_H$ will fail to close by an amount $\varepsilon$, the lack of closure (green). This gives rise to a phase distribution which is bimodal for a single derivative. The combined probability from a series of derivatives has a most probable phase (the maximum) and a best phase (the centroid of the distribution), for which the overall phase error is minimum.
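The resulting phase probability is given by

$$P(\varphi) \propto \prod_j \exp\left[-\frac{\varepsilon_j^2(\varphi)}{2E_j^2}\right],$$

where the product runs over the derivatives j, $\varepsilon_j$ is the lack of closure and $E_j$ is the estimated error. The phases have a minimum error when the best phase $\varphi_{\rm best}$, i.e. the centroid of the phase distribution, is used instead of the most probable phase. The quality of the phases is indicated by the figure of merit m, where

$$m = \frac{\left|\int P(\varphi) \exp(i\varphi)\, d\varphi\right|}{\int P(\varphi)\, d\varphi}.$$

A value of 1 for m indicates no phase error, a value of 0.5 represents a phase error of about 60°, while a value of 0 means that all phases are equally probable.
The best Fourier is calculated from

$$\rho_{\rm best}(xyz) = \frac{1}{V} \sum_h \sum_k \sum_l m\, |F_P(hkl)| \exp(i\varphi_{\rm best}) \exp[-2\pi i(hx + ky + lz)],$$

where the electron density should have minimal errors.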
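The best phase and figure of merit can be evaluated numerically by sampling the phase circle. The sketch below assumes a Gaussian error model, P(φ) ∝ ∏ exp[−ε²(φ)/2E²], with invented amplitudes and an error estimate E = 1 for each derivative; it contrasts one derivative with two.

```python
import cmath
import math

def phase_distribution(f_p, derivatives, n=360):
    """Sample P(phi) on n points; derivatives is a list of
    (|F_PH| observed, complex F_H calculated, error estimate E)."""
    phis = [2 * math.pi * k / n for k in range(n)]
    probs = []
    for phi in phis:
        log_p = 0.0
        for f_ph_obs, f_h, err in derivatives:
            eps = f_ph_obs - abs(f_p * cmath.exp(1j * phi) + f_h)
            log_p -= eps ** 2 / (2 * err ** 2)
        probs.append(math.exp(log_p))
    return phis, probs

def best_phase_and_fom(phis, probs):
    """Centroid of the distribution: its argument is the best phase,
    its modulus the figure of merit m."""
    centroid = sum(p * cmath.exp(1j * phi) for phi, p in zip(phis, probs)) / sum(probs)
    return cmath.phase(centroid), abs(centroid)

f_p, true_phase = 100.0, 0.7
native = f_p * cmath.exp(1j * true_phase)
f_h1 = 10.0 * cmath.exp(2.0j)     # hypothetical derivative 1
f_h2 = 10.0 * cmath.exp(-0.5j)    # hypothetical derivative 2
d1 = (abs(native + f_h1), f_h1, 1.0)
d2 = (abs(native + f_h2), f_h2, 1.0)

single = best_phase_and_fom(*phase_distribution(f_p, [d1]))
double = best_phase_and_fom(*phase_distribution(f_p, [d1, d2]))
print(single, double)
```

With one derivative the centroid falls midway between the two equally probable phases and the figure of merit is low; adding a second derivative with a different heavy-atom phase leaves only the true solution and raises m towards 1.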
If the derivative shows a high degree of substitution, then the Harker sections become more difficult to interpret. Furthermore, Terwilliger et al. (1987) have shown that the intrinsic noise in the difference Patterson map increases with increasing heavy-atom substitution. It is at this stage that automated procedures are invaluable.
One such automated procedure is implemented in PROTEIN (Steigemann, 1991). The unit cell is scanned for possible heavy-atom sites; for each search point (x, y, z), all possible Harker vectors are calculated, and the difference-Patterson-map values at these points are summed or multiplied. As the origin peak dominates the Patterson function, this region is set to zero. The resulting correlation map should contain peaks at all possible heavy-atom positions. The peak list can then be used to find a set of consistent heavy-atom locations through a subsequent search for difference vectors (cross vectors) between putative sites. It should be possible to locate all major and minor heavy-atom sites through repetition of this procedure. A similar strategy is adopted in the program HEAVY (Terwilliger et al., 1987), but sets of heavy-atom sites are ranked according to the probability that the peaks are not random. The program SOLVE (Terwilliger & Berendzen, 1999) takes this process a stage further, where potential heavy-atom structures are solved and refined to generate an (interpretable) electron density in an automated fashion.
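The scan described above can be sketched on an idealized map. This is not the PROTEIN, HEAVY or SOLVE implementation: the difference Patterson map is replaced by a sparse, noise-free peak list on a hypothetical 20 × 20 × 20 grid, the space group is assumed to be P2₁2₁2₁, and the Harker values are multiplied.

```python
from itertools import product

N = 20           # grid points per cell edge (illustrative)
HALF = N // 2

def mates(p):
    """Equivalent positions of P2(1)2(1)2(1) in grid units."""
    x, y, z = p
    return [
        (x % N, y % N, z % N),
        ((HALF - x) % N, -y % N, (HALF + z) % N),
        (-x % N, (HALF + y) % N, (HALF - z) % N),
        ((HALF + x) % N, (HALF - y) % N, -z % N),
    ]

def diff(a, b):
    return tuple((i - j) % N for i, j in zip(a, b))

def difference_patterson(sites):
    """Idealized map: unit weight per interatomic vector, origin set to zero."""
    atoms = [m for s in sites for m in mates(s)]
    pmap = {}
    for i, a in enumerate(atoms):
        for j, b in enumerate(atoms):
            if i != j:
                v = diff(a, b)
                pmap[v] = pmap.get(v, 0) + 1
    pmap.pop((0, 0, 0), None)
    return pmap

def harker_score(pmap, p):
    """Product of the map values at the three Harker vectors of a trial point."""
    first, *others = mates(p)
    score = 1
    for m in others:
        score *= pmap.get(diff(first, m), 0)
    return score

def search(pmap):
    scores = {p: harker_score(pmap, p) for p in product(range(N), repeat=3)}
    top = max(scores.values())
    return {p for p, s in scores.items() if s == top}

true_site = (2, 3, 6)                      # hypothetical heavy-atom site
best = search(difference_patterson([true_site]))
print(len(best), true_site in best)
```

The search returns the true site together with all of its symmetry mates, alternative origins and inverted images, which all score equally; selecting a consistent subset is the task of the subsequent cross-vector search.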
The search method can also be applied in reciprocal space, where the Fourier transform of the trial heavy-atom structure is calculated, and the resulting $F_H^{\rm calc}$ is compared to the measured differences between derivative and native structure-factor amplitudes (Rossmann et al., 1986). The programme XtalView (McRee, 1998) and the method of Badger & Athay (1998) both rank trial sites by a correlation coefficient between a function of the calculated heavy-atom structure factors and the corresponding function of the observed isomorphous differences; they differ in the quantities that are correlated. Dumas (1994b,c) calculates a correlation that is based on the estimated lack of isomorphism.
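A reciprocal-space search of this kind can be sketched as follows. The example is an assumption-laden toy rather than any of the cited programs: space group P2₁ with a single site, native structure factors of constant amplitude and random phase (seeded for reproducibility), and a plain Pearson correlation between |F_H(calc)| and the observed ||F_PH| − |F_P||.

```python
import cmath
import math
import random

def fh_p21(x, z, hkl, occ=5.0):
    """Heavy-atom structure factor for one site and its P2(1) mate (y set to 0)."""
    h, k, l = hkl
    return occ * (
        cmath.exp(2j * math.pi * (h * x + l * z))
        + cmath.exp(2j * math.pi * (-h * x + k * 0.5 - l * z))
    )

def correlation(a, b):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(0)
reflections = [(h, k, l) for h in range(5) for k in range(5)
               for l in range(-4, 5) if (h, k, l) != (0, 0, 0)]
true_x, true_z = 0.15, 0.30   # hypothetical site; y is arbitrary in P2(1)

# Simulated isomorphous differences: native amplitude 100 with a random phase.
observed = []
for hkl in reflections:
    f_p = 100.0 * cmath.exp(2j * math.pi * rng.random())
    observed.append(abs(abs(f_p + fh_p21(true_x, true_z, hkl)) - abs(f_p)))

grid = [i / 20 for i in range(20)]
best = max(
    ((x, z) for x in grid for z in grid),
    key=lambda p: correlation(
        [abs(fh_p21(p[0], p[1], r)) for r in reflections], observed
    ),
)
print(best)
```

The highest correlation is expected at the true site or at one of its equivalents under the permitted origin shifts and inversion, since these give identical calculated amplitudes.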
Vagin & Teplyakov (1998) have reported a heavy-atom search based on a reciprocal-space translation function. In this case, low-resolution peaks are not removed but weighted down using a Gaussian function. Potential solutions are ranked not only according to their translation-function height, but also through their phasing power, which appears to be a stronger selection criterion.
All these searches are based upon the sequential identification of heavy-atom sites and their incorporation in a heavy-atom partial structure. Problems arise when bogus sites influence the search for further heavy-atom positions. In an attempt to overcome this problem, the heavy-atom search has been reprogrammed using a genetic algorithm, with the Patterson minimum function as a selection criterion (Chang & Lewis, 1994). This approach has the potential to reveal all heavy-atom positions in one calculation, and tests on model data have shown it to be faster than traditional sequential searches.
Lack of isomorphism is by far the most common problem in protein crystallography. An isomorphous derivative is one in which the crystalline arrangement has not been disturbed by derivatization. An early study of Crick & Magdoff (1956) proposed a rule of thumb that a change in any of the cell dimensions by more than around 5% would result in a lack of isomorphism that would defeat any attempt to locate the heavy-atom positions or extract useful phase information. Lack of isomorphism can, however, be more subtle; sometimes a natural variation in the native crystal form may occur, resulting in poor merging statistics of data obtained from different crystals. Coupling this variation with commonly observed structural changes upon heavy-atom binding can provide a considerable barrier to obtaining satisfactory phases. Dumas (1994a) has provided a theoretical consideration of this problem.
One practical approach is to collect native and derivative data sets from the same crystal, a technique that has been successful in the structure determination of GTP cyclohydrolase I (Nar et al., 1995), the proteasome (Löwe et al., 1995) and a number of other proteins. Non-isomorphism can also be exploited, however. In the structure solution of carbamoylsarcosine hydrolase (Romao et al., 1992), derivatives fell into two (related) crystalline classes. By judicious use of two `native' crystal forms, heavy-atom positions could be obtained in each of the two classes. Phasing and resultant averaging between the two classes provided an interpretable electron density. In the case of ascorbate oxidase (Messerschmidt et al., 1989), multiple isomorphous replacement failed to provide an interpretable density. It was possible, however, to place the initial density into a second crystal form, which in turn provided phases of sufficient quality to determine heavy-atom sites in derivatives of the second form. Phase-combination and density-modification techniques in the two crystal forms allowed the solution of the structure.
Although the macromolecular crystallographer is rarely confronted with the problems facing their small-molecule colleagues with regard to determining the correct space group, the simplified heavy-atom structure may often throw up some surprises. Certain pseudo-symmetries may become `exact' for the heavy-atom difference Patterson map. Thus, cross peaks between different heavy atoms may occur on a Harker section (or `pseudo-Harker section'), complicating interpretation of the Patterson map. Such was the case with azurin (Adman et al., 1978; Nar et al., 1991), where the heavy-atom structure gave rise to a pseudo-homometric Patterson function, i.e. one in which two possible (non-equivalent) choices were available for the heavy-atom structure, only one of which was correct. This arose from a pseudo-centring of the lattice that became almost exact for the heavy-atom structure.
In the case of human NC1 (Stubbs et al., 1990), the heavy atoms of all derivatives appeared to lie on or near the crystallographic twofold axis. This resulted in a partially centrosymmetric heavy-atom structure that failed to deliver sufficient phase information for noncentrosymmetric reflections. To check for problems with the native data set, anomalous difference Patterson maps {coefficients $(|F_{PH}(+\mathbf{h})| - |F_{PH}(-\mathbf{h})|)^2$} were calculated. Coincidence of the peaks obtained from conventional and anomalous Patterson syntheses showed that the heavy-atom positions were correct, but unfortunately did not lead to a structure solution.
Most problematic are the cases where many heavy atoms have become incorporated in the asymmetric unit. Not only does this cause difficulties in the scaling of derivative to native data, but also the large number of peaks results in ambiguities in the solution of the Patterson function. In such cases, it may be necessary to obtain primary phase information from a different source (such as another low-substitution-site derivative). One important subclass of high-level substitution is when the native asymmetric unit contains several copies of a single molecule (noncrystallographic symmetry or NCS).
A major problem in locating complex noncrystallographic axes is that the geometrical relationship between NCS peaks in the Patterson map is nontrivial. Under certain conditions, NCS results in a recognizable local symmetry within the Patterson map (Stubbs et al., 1996). In many cases, however, these conditions (that the NCS axes of crystallographic symmetry-related molecules are parallel) are not fulfilled. Under such circumstances, all heavy-atom sites (including all crystallographic symmetry-related positions) must be checked carefully with the rotation function in order to pinpoint the NCS axis. This is relatively trivial for low-order NCS (twofold, threefold), but becomes increasingly complicated for higher orders. It should also always be borne in mind that the heavy-atom positions might not necessarily follow the NCS constraints owing to crystal packing. If there is reason to suspect that sites are related by local symmetry, then the orientation of this axis can be used in the initial Harker searches; in practice, however, such searches are extremely sensitive to the correct orientation of the axis.
In the case of high-order NCS (e.g. with icosahedral virus structures or symmetric macromolecular complexes), an alternative approach to the usual initial Harker-vector search can be provided by the self-rotation function. Knowledge of the orientation of the NCS axis (from the rotation function) can be used to determine the relative positions of heavy atoms to the NCS axis (Argos & Rossmann, 1976; Arnold et al., 1987; Tong & Rossmann, 1993). The orientation can be refined and the resulting peaks can be used as input in a subsequent translation search of the Harker sections.
References
Adman, E. T., Stenkamp, R. E., Sieker, L. C. & Jensen, L. H. (1978). A crystallographic model for azurin at 3.0 Å resolution. J. Mol. Biol. 123, 35–47.
Argos, P. & Rossmann, M. G. (1976). A method to determine heavy-atom positions for virus structures. Acta Cryst. B32, 2975–2983.
Arnold, E., Vriend, G., Luo, M., Griffith, J. P., Kamer, G., Erickson, J. W., Johnson, J. E. & Rossmann, M. G. (1987). The structure determination of a common cold virus, human rhinovirus 14. Acta Cryst. A43, 346–361.
Badger, J. & Athay, R. (1998). Automated and graphical methods for locating heavy-atom sites for isomorphous replacement and multiwavelength anomalous diffraction phase determination. J. Appl. Cryst. 31, 270–274.
Blow, D. M. & Crick, F. H. C. (1959). The treatment of errors in the isomorphous replacement method. Acta Cryst. 12, 794–802.
Budisa, N., Karnbrock, W., Steinbacher, S., Humm, A., Prade, L., Neuefeind, T., Moroder, L. & Huber, R. (1997). Bioincorporation of telluromethionine into proteins: a promising new approach for X-ray structure analysis of proteins. J. Mol. Biol. 270, 616–623.
Buerger, M. J. (1959). Vector space. New York: Wiley.
Chang, G. & Lewis, M. (1994). Using genetic algorithms for solving heavy-atom sites. Acta Cryst. D50, 667–674.
Crick, F. H. C. & Magdoff, B. S. (1956). The theory of the method of isomorphous replacement for protein crystals. I. Acta Cryst. 9, 901–908.
Dumas, P. (1994a). The heavy-atom problem: a statistical analysis. I. A priori determination of best scaling, level of substitution, lack of isomorphism and phasing power. Acta Cryst. A50, 526–537.
Dumas, P. (1994b). The heavy-atom problem: a statistical analysis. II. Consequences of the a priori knowledge of the noise and heavy-atom powers and use of a correlation function for heavy-atom-site determination. Acta Cryst. A50, 537–546.
Dumas, P. (1994c). The heavy-atom problem: a statistical analysis. II. Consequences of the a priori knowledge of the noise and heavy-atom powers and use of a correlation function for heavy-atom-site determination. Erratum. Acta Cryst. A50, 793.
Green, D. W., Ingram, V. M. & Perutz, M. F. (1954). The structure of haemoglobin IV. Sign determination by the isomorphous replacement method. Proc. R. Soc. London Ser. A, 225, 287–307.
Harker, D. (1956). The determination of the phases of the structure factors of noncentrosymmetric crystals by the method of double isomorphous replacement. Acta Cryst. 9, 1–9.
Hendrickson, W. A., Horton, J. R. & LeMaster, D. M. (1990). Selenomethionyl proteins produced for analysis by multiwavelength anomalous diffraction (MAD): a vehicle for direct determination of three-dimensional structure. EMBO J. 9, 1665–1672.
Löwe, J., Stock, D., Jap, B., Zwickl, P., Baumeister, W. & Huber, R. (1995). Crystal structure of the 20S proteasome from the archaeon T. acidophilum at 3.4 Å resolution. Science, 268, 533–539.
McRee, D. E. (1998). Practical protein crystallography. San Diego: Academic Press.
Messerschmidt, A., Rossi, A., Ladenstein, R., Huber, R., Bolognesi, M., Gatti, G., Marchesini, A., Petruzzelli, R. & Finazzi-Agro, A. (1989). X-ray crystal structure of the blue oxidase ascorbate oxidase from zucchini. Analysis of the polypeptide fold and a model of the copper sites and ligands. J. Mol. Biol. 206, 513–529.
Nar, H., Huber, R., Meining, W., Schmid, C., Weinkauf, S. & Bacher, A. (1995). Atomic structure of GTP cyclohydrolase I. Structure, 3, 459–466.
Nar, H., Messerschmidt, A., Huber, R., van de Kamp, M. & Canters, G. W. (1991). X-ray crystal structure of the two site-specific mutants His35Gln and His35Leu of azurin from Pseudomonas aeruginosa. J. Mol. Biol. 218, 427–447.
Patterson, A. L. (1934). A Fourier series method for the determination of the components of interatomic distances in crystals. Phys. Rev. 46, 372–376.
Perutz, M. F. (1956). Isomorphous replacement and phase determination in noncentrosymmetric space groups. Acta Cryst. 9, 867–873.
Richardson, J. W. & Jacobson, R. A. (1987). In Patterson and Pattersons, edited by J. P. Glusker, B. K. Patterson & M. Rossi. Oxford University Press.
Rogers, D. (1965). In Computing methods in crystallography, edited by J. S. Rollett, pp. 133–148. Oxford University Press.
Romao, M. J., Turk, D., Gomis-Ruth, F. X., Huber, R., Schumacher, G., Mollering, H. & Russmann, L. (1992). Crystal structure analysis, refinement and enzymatic reaction mechanism of N-carbamoylsarcosine amidohydrolase from Arthrobacter sp. at 2.0 Å resolution. J. Mol. Biol. 226, 1111–1130.
Rossmann, M. G. (1960). The accurate determination of the position and shape of heavy-atom replacement groups in proteins. Acta Cryst. 13, 221–226.
Rossmann, M. G. (1972). Editor. The molecular replacement method. New York: Gordon and Breach.
Rossmann, M. G., Arnold, E. & Vriend, G. (1986). Comparison of vector search and feedback methods for finding heavy-atom sites in isomorphous derivatives. Acta Cryst. A42, 325–334.
Sheldrick, G. M. (1990). Phase annealing in SHELX90: direct methods for larger structures. Acta Cryst. A46, 467–473.
Sheldrick, G. M., Dauter, Z., Wilson, K. S., Hope, H. & Sieker, L. C. (1993). The application of direct methods and Patterson interpretation to high-resolution native protein data. Acta Cryst. D49, 18–23.
Steigemann, W. (1991). Recent advances in the PROTEIN program system for the X-ray structure analysis of biological macromolecules. In Crystallographic computing 5: from chemistry to biology, edited by D. Moras, A. D. Podjarny & J. C. Thierry, pp. 115–125. Oxford University Press.
Stubbs, M. T., Nar, H., Löwe, J., Huber, R., Ladenstein, R., Spangfort, M. D. & Svensson, L. A. (1996). Locating a local symmetry axis from Patterson map cross vectors: application to crystal data from GroEL, GTP cyclohydrolase I and the proteasome. Acta Cryst. D52, 447–452.
Stubbs, M. T., Summers, L., Mayr, I., Schneider, M., Bode, W., Huber, R., Ries, A. & Kühn, K. (1990). Crystals of the NC1 domain of type IV collagen. J. Mol. Biol. 211, 683–684.
Terwilliger, T. C. & Berendzen, J. (1999). Automated MAD and MIR structure solution. Acta Cryst. D55, 849–861.
Terwilliger, T. C. & Eisenberg, D. (1983). Unbiased three-dimensional refinement of heavy-atom parameters by correlation of origin-removed Patterson functions. Acta Cryst. A39, 813–817.
Terwilliger, T. C. & Eisenberg, D. (1987). Isomorphous replacement: effects of errors on the phase probability distribution. Acta Cryst. A43, 6–13.
Terwilliger, T. C., Kim, S.-H. & Eisenberg, D. (1987). Generalized method of determining heavy-atom positions using the difference Patterson function. Acta Cryst. A43, 1–5.
Tong, L. & Rossmann, M. G. (1993). Patterson-map interpretation with noncrystallographic symmetry. J. Appl. Cryst. 26, 15–21.
Vagin, A. & Teplyakov, A. (1998). A translation-function approach for heavy-atom location in macromolecular crystallography. Acta Cryst. D54, 400–402.
Wilson, A. J. C. (1949). The probability distribution of X-ray intensities. Acta Cryst. 2, 318–321.