International Tables for Crystallography, Volume F: Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F, ch. 16.1, pp. 339–344
Section 16.1.8. Applying dual-space programs successfully
^{a}Institut für Anorganische Chemie, Universität Göttingen, Tammannstrasse 4, D-37077 Göttingen, Germany, ^{b}Hauptman–Woodward Medical Research Institute, Inc., 73 High Street, Buffalo, NY 14203-1196, USA, and ^{c}Lehrstuhl für Strukturchemie, Universität Göttingen, Tammannstrasse 4, D-37077 Göttingen, Germany
The solution of the (known) structure of triclinic lysozyme by SHELXD and shortly afterwards by SnB (Deacon et al., 1998) finally broke the 1000-atom barrier for direct methods (there happen to be 1001 protein atoms in this structure!). Both programs have also solved a large number of previously unsolved structures that had defeated conventional direct methods; some examples are listed in Table 16.1.8.1. The overall quality of solutions is generally very good, especially if appropriate action is taken during the Fourier-refinement stage. Most of the time, the Shake-and-Bake method works remarkably well, even for rather large structures. However, in problematic situations, the user needs to be aware of options that can increase the chance of success.
References: [1] Loll et al. (1997); [2] Schäfer et al. (1996); [3] Schäfer (1998); [4] Schäfer, Sheldrick, Bahner & Lackner (1998); [5] Langs (1988); [6] Drouin (1998); [7] Anderson et al. (1996); [8] Schäfer & Prange (1998); [9] Stec et al. (1995); [10] Weeks et al. (1995); [11] Usón et al. (1999); [12] Aree et al. (1999); [13] Prive et al. (1999); [14] Dauter et al. (1992); [15] Loll et al. (1998); [16] Schneider (1998); [17] Reibenspiess (1998); [18] Schäfer, Sheldrick, Schneider & Vértesy (1998); [19] Teichert (1998); [20] Smith et al. (1997); [21] Gessler et al. (1999); [22] Schneider et al. (2000); [23] Parisini et al. (1999); [24] Deacon et al. (1998); [25] Walsh et al. (1998); [26] Frazão et al. (1999); [27] Ekstrom et al. (1999); [28] Li et al. (1999); [29] Radfar et al. (2000); [30] Turner et al. (1998); [31] Deacon & Ealick (1999).

When slightly heavier atoms such as sulfur are present, it is possible to start the Shake-and-Bake recycling procedure from a set of atomic positions that are consistent with the Patterson function. For large structures, the vectors between such atoms will correspond to Patterson densities around or even below the noise level, so classical methods of locating the positions of these atoms unambiguously from the Patterson are unlikely to succeed. Nevertheless, the Patterson function can still be used to filter sets of starting atoms. This filter is currently implemented as follows in SHELXD. First, a sharpened Patterson function (Sheldrick et al., 1993) is calculated, and the top 200 (for example) non-Harker peaks further than a given minimum distance from the origin are selected, in turn, as two-atom translation-search fragments, one such fragment being employed per solution attempt. For each of a large number of random translations, all unique Patterson vectors involving the two atoms and their symmetry equivalents are found and sorted in order of increasing Patterson density. The sum of the smallest third of these values is used as a figure of merit (PMF). Tests showed that although the globally highest PMF for a given two-atom search fragment may not correspond to correct atomic positions, some correct solutions may still be found by limiting the number of trials. After all the vectors have been used as search fragments (e.g. after 200 attempts), the procedure is repeated starting again with the first vector. The two atoms may be used to generate further atoms using a full Patterson superposition minimum function or a weighted difference synthesis (in the current version of SHELXD, a combination of the two is used).
In the case of the small protein BPTI (Schneider, 1998), 15 300 attempts based on 100 different search vectors led to four final solutions with mean phase error less than 18°, although none of the globally highest PMF values for any of the search vectors corresponded to correct solutions. Table 16.1.8.2 shows the effect of using different two-atom search fragments for hirustasin, a previously unsolved 55-amino-acid protein containing five disulfide bridges first solved using SHELXD (Usón et al., 1999). It is not clear why some search fragments perform so much better than others; surprisingly, one of the more effective search vectors deviates considerably (1.69 Å) from the nearest true S–S vector.
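The Patterson figure of merit described above can be illustrated with a short Python sketch. It is not the SHELXD implementation: the function name, the rounding used for 'one third' and the input format (a plain list of Patterson densities already sampled at the unique vectors for one trial translation) are assumptions made for illustration.

```python
from typing import Sequence


def patterson_minimum_fom(densities: Sequence[float]) -> float:
    """Sum the smallest third of the Patterson densities for one translation.

    `densities` holds the Patterson values found at all unique vectors
    between the two search atoms and their symmetry equivalents.
    """
    if not densities:
        raise ValueError("at least one Patterson vector is required")
    ordered = sorted(densities)          # increasing Patterson density
    n_low = max(1, len(ordered) // 3)    # rounding rule is an assumption
    return sum(ordered[:n_low])


# Hypothetical densities for a single random translation.
pmf = patterson_minimum_fom([9.0, 1.0, 5.0, 2.0, 7.0, 3.0])
```

Because only the lowest values are summed, a translation scores well only if none of its predicted vectors falls in a region of low Patterson density, which is what makes the quantity usable as a filter even when individual peaks are near the noise level.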

The frequent imposition of real-space constraints appears to keep dual-space methods from producing most of the false minima that plague practitioners of conventional direct methods. Translated molecules have not been observed (so far), and traditionally problematic structures with polycyclic ring systems and long aliphatic chains are readily solved (McCourt et al., 1996, 1997). False minima of the type that occur primarily in space groups lacking translational symmetry and are characterized by a single large `uranium' peak do occur frequently in P1 and occasionally in other space groups. Triclinic hen egg-white lysozyme exhibits this phenomenon regardless of whether parameter-shift or tangent-formula phase refinement is employed. An example from another space group (C222) is provided by the Se substructure data for AdoHcy hydrolase. In this case, many trials converge to false minima if the feature in the SnB program that eliminates peaks at special positions is not utilized.
The problem with false minima is most serious if they have a `better' value of the figure of merit being used for diagnostic purposes than do the true solutions. Fortunately, this is not the case with the uranium `solutions', which can be distinguished on the basis of the minimal function [equation (16.1.4.2)] or the correlation coefficient [equation (16.1.6.1)]. However, it would be inefficient to compute the latter in each dual-space cycle since it requires that essentially all reflections be used. To be an effective discriminator, the figure of merit must be computed using the phases calculated from the point-atom model, not from the phases directly after refinement. Phase refinement can and does produce sets of phases, such as the uranium phases, which do not correspond to physical reality. Hence, it should not be surprising that such phase sets might appear `better' than the true phases and could lead to an erroneous choice for the best trial. Peak picking, followed by a structure-factor calculation in which the peaks are sensibly weighted, converts the phase set back to physically allowed values. If the value of the minimal function computed from the refined or unconstrained phases is denoted by R_{unc} and the value of the minimal function computed using the constrained phases resulting from the atomic model is denoted by R_{con}, then a function defined by R ratio = (R_{con} − R_{unc})/(R_{con} + R_{unc}) can be used to distinguish false minima from other nonsolutions as well as the true solutions. Once a trial falls into a false minimum, it never escapes. Therefore, the R ratio can be used, within SnB, as a criterion for early termination of unproductive trials. Based on data for several P1 structures, it appears that termination of trials with R ratio values exceeding 0.2 will eliminate most false minima without risking rejection of any potential solutions. In the case of triclinic lysozyme, false minima can be recognized, on average, by cycle 25.
Since the default recommendation would be for 1000 cycles, a substantial saving in CPU time is realized by using the R ratio early-termination test. It should be noted that SHELXD optionally allows early termination of trials if the second peak is less than a specified fraction (e.g. 40%) of the height of the first. Generally, but not always, the R-ratio and peak-ratio tests eliminate the same trials.
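The two early-termination tests can be sketched as follows. This is a minimal illustration and not code from SnB or SHELXD: the function and parameter names are hypothetical, and the R ratio is assumed to take the form (R_con − R_unc)/(R_con + R_unc), where R_unc and R_con are the minimal-function values from the unconstrained and the model-constrained phases. The thresholds 0.2 and 40% are the example values quoted in the text.

```python
from typing import Sequence


def r_ratio(r_unc: float, r_con: float) -> float:
    """Assumed form of the R ratio: (R_con - R_unc) / (R_con + R_unc)."""
    return (r_con - r_unc) / (r_con + r_unc)


def fails_r_ratio_test(r_unc: float, r_con: float, limit: float = 0.2) -> bool:
    """True if the trial should be stopped as a probable false minimum."""
    return r_ratio(r_unc, r_con) > limit


def fails_peak_ratio_test(peak_heights: Sequence[float],
                          min_fraction: float = 0.4) -> bool:
    """True if the second-highest peak is below `min_fraction` of the highest,
    the signature of a single dominant `uranium' peak."""
    if len(peak_heights) < 2:
        raise ValueError("need at least two peaks")
    first, second = sorted(peak_heights, reverse=True)[:2]
    return second < min_fraction * first
```

A trial loop would call one of these checks after each cycle (or every few cycles) and abandon the trial as soon as the check returns True, rather than running all of the planned cycles.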
Recognizing false minima is, of course, only part of the battle. It is also necessary to find a real solution, and essentially 100% of the triclinic lysozyme trials were found to be false minima when the standard parameter-shift conditions of two 90° shifts were used. In fact, significant numbers of solutions occur only when single-shift angles in the range 140–170° are used (Fig. 16.1.8.1), and there is a surprisingly high success rate (percentage of trial structures that go to solutions) over a narrow range of angles centred about 157.5°. It is also not surprising that there is a correlated decrease in the percentage of false minima in the range 140–150°. This suggests that a fruitful strategy for structures that exhibit a large percentage of false minima would be the following. Run 100 or so trials at each of several shift angles in the range 90–180°, find the smallest angle which gives nearly zero false minima, and then use this angle as a single shift for many trials. Balhimycin is an example of a large non-P1 structure that also requires a parameter shift of around 154° to obtain a solution using the minimal function.
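The angle-scanning strategy suggested above amounts to a simple selection rule, sketched here in Python. The function name, the input format and the 2% cut-off used for 'nearly zero' are assumptions, and the counts in the usage example are invented purely to show the call; they are not measured results.

```python
from typing import Dict, Optional, Tuple


def smallest_clean_angle(scan: Dict[float, Tuple[int, int]],
                         max_false_fraction: float = 0.02) -> Optional[float]:
    """Return the smallest shift angle with nearly zero false minima.

    `scan` maps a shift angle in degrees to (false_minima, trials) counts
    from a short pilot run at that angle.
    """
    clean = [angle for angle, (n_false, n_trials) in scan.items()
             if n_trials > 0 and n_false / n_trials <= max_false_fraction]
    return min(clean) if clean else None


# Invented pilot-run counts: 100 trials at each angle.
pilot = {90.0: (100, 100), 120.0: (97, 100), 140.0: (40, 100),
         157.5: (1, 100), 170.0: (0, 100)}
angle = smallest_clean_angle(pilot)  # angle to use for the production run
```

If no angle passes the cut-off the function returns None, in which case the pilot scan would need a finer angular grid or more trials per angle.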
The importance of the presence of several atoms heavier than oxygen for increasing the chance of obtaining a solution by SnB at resolutions less than 1.2 Å was noticed for truncated data from vancomycin and the 289-atom structure of conotoxin EpI (Weeks & Miller, 1999b). The results of SHELXD application to hirustasin are consistent with this (Usón et al., 1999). The 55-amino-acid protein hirustasin could be solved by SHELXD using either 1.2 Å low-temperature data or 1.4 Å room-temperature data; however, as shown in Fig. 16.1.8.2(a), the mean phase error (MPE) is significantly better for the 1.2 Å data over the whole resolution range. The MPE is determined primarily by the data-to-parameter ratio, which is reflected in the smaller number of reliable triplet invariants at lower resolution. Although small-molecule interpretation based on peak positions worked well for the 1.2 Å solution (overall ), standard protein chain tracing was required for the 1.4 Å solution (overall ). As is clear from the corresponding electron-density map (Fig. 16.1.8.2b), the Shake-and-Bake procedure produces easily interpreted protein density even when bonded atoms are barely resolved from each other. The hirustasin structure was also determined with SHELXD using 1.55 Å truncated data, and this endeavour currently holds the record for the lowest-resolution successful application of Shake-and-Bake.
The relative effects of accuracy, completeness and resolution on Shake-and-Bake success rates using SnB for three large P1 structures were studied by computing error-free data using the known atomic coordinates. The results of these studies, presented in Table 16.1.8.3, show that experimental error contributed nothing of consequence to the low success rates for vancomycin and lysozyme. However, completing the vancomycin data up to the maximum measured resolution of 0.97 Å resulted in a substantial increase in success rate which was further improved to an astounding success rate of 80% when the data were expanded to 0.85 Å.

On account of overload problems, the experimental vancomycin data did not include any data at 10 Å resolution or lower. A total of 4000 reflections were phased in the dual-space loop in the process of solving this structure with the experimental data. Some of these data were then replaced with the largest error-free magnitudes chosen from the missing reflections at several different resolution limits. The results in Table 16.1.8.4 show a tenfold increase in success rate when only 200 of the largest missing magnitudes were supplied, and it made no difference whether these reflections had a maximum resolution of 2.8 Å or were chosen randomly from the whole 0.97 Å sphere. The moral of this story is that, when collecting data for Shake-and-Bake, it pays to take a second pass using a shorter exposure to fill in the low-resolution data.

Variations in the computational details of the dual-space loop can make major differences in the efficacy of SnB and SHELXD. Recently, several strategies were combined in SHELXD and applied to a 148-atom P1 test structure (Karle et al., 1989) with the results shown in Fig. 16.1.8.3. The CPU time requirements of parameter-shift (PS) and tangent-formula expansion (TE) are similar, both being slower than no phase refinement (NR). In real space, the random-omit-map strategy (RO) was slightly faster than simple peak picking (PP) because fewer atoms were used in the structure-factor calculations. Both of these procedures were much faster than iterative peak-list optimization (PO). The original SHELXD algorithm (TE + PO) performs quite well in comparison with the SnB algorithm (PS + PP) in terms of the percentage of correct solutions, but less well when the efficiency is compared in terms of CPU time per solution. Surprisingly, the two strategies involving random omit maps (PS + RO and TE + RO), which had been calculated to give reference curves, are much more effective than the other algorithms, especially in terms of CPU efficiency. Indeed these two runs appear to approach a 100% success rate as the number of cycles becomes large. The combination of random omit maps and Karle-type tangent expansion appears to be even more effective (Fig. 16.1.8.4) for gramicidin A, a P2_{1}2_{1}2_{1} structure (Langs, 1988). It should be noted that conventional direct methods incorporating the tangent formula tend to perform better in P2_{1}2_{1}2_{1} than in P1, perhaps because there is less risk of a uranium-atom pseudosolution.
Subsequent tests using SHELXD on several other structures have shown that the use of random omit maps is much more effective than picking the same final number of peaks from the top of the peak list. However, it should be stressed that it is the combination TE + RO that is particularly effective. A possible special case is when a very small number of atoms is sought (e.g. Se atoms from MAD data). Preliminary tests indicate that peak-list optimization (PO) is competitive in such cases because the CPU time penalty associated with it is much smaller than when many atoms are involved.
With hindsight, it is possible to understand why the random omit maps provide such an efficient search algorithm. In macromolecular structure refinement, it is standard practice to omit parts of the model that do not fit the current electron density well, to perform some refinement or simulated annealing (Hodel et al., 1992) on the rest of the model to reduce memory effects, and then to calculate a new weighted electron-density map (omit map). If the original features reappear in the new density, they were probably correct; in other cases the omit map may enable a new and better interpretation. Thus, random omit maps should not lead to the loss of an essentially correct solution, but enable efficient searching in other cases. It is also interesting to note that the results presented in Figs. 16.1.8.3 and 16.1.8.4 show that it is possible, albeit much less efficiently, to solve both structures using random omit maps without the use of any phase relationships based on probability theory (curves NR + RO).
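The real-space step of the random-omit strategy can be sketched in a few lines of Python. This is only an illustration of the idea of leaving out a random subset of the current peaks before the next structure-factor calculation; it is not taken from SHELXD. The function name and the omitted fraction of 30% are assumptions, since this section does not state the fraction used.

```python
import random
from typing import List, Optional, Sequence, TypeVar

Peak = TypeVar("Peak")


def random_omit(peaks: Sequence[Peak],
                omit_fraction: float = 0.3,
                rng: Optional[random.Random] = None) -> List[Peak]:
    """Return the peaks that survive a random omission.

    A random `omit_fraction` of the peaks (assumed value) is left out, so
    the following structure-factor calculation uses fewer atoms.
    """
    if not 0.0 <= omit_fraction < 1.0:
        raise ValueError("omit_fraction must be in [0, 1)")
    rng = rng or random.Random()
    n_keep = max(1, round(len(peaks) * (1.0 - omit_fraction)))
    return rng.sample(list(peaks), n_keep)


# Ten hypothetical peak labels; a seeded generator makes the run repeatable.
kept = random_omit([f"Q{i}" for i in range(1, 11)], rng=random.Random(0))
```

Peaks that belong to an essentially correct model tend to reappear in the next map even when omitted, whereas wrongly placed peaks are given a chance to be replaced, which is the behaviour the paragraph above attributes to omit maps.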
The results shown in Table 16.1.8.4 and Fig. 16.1.8.3 indicate that success rates in space group P1 can be anomalously high. This suggests that it might be advantageous to expand all structures to P1 and then to locate the symmetry elements afterwards. However, this is more computationally expensive than performing the whole procedure in the true space group, and in practice such a strategy is only competitive in low-symmetry space groups such as , C2 or (Chang et al., 1997). Expansion to P1 also offers some opportunities for starting from `slightly better than random' phases. One possibility, successfully demonstrated by Sheldrick & Gould (1995), is to use a rotation search for a small fragment (e.g. a short piece of α-helix) to generate many sets of starting phases; after expansion to P1 the translational search usually required for molecular replacement is not needed. Various Patterson superposition minimum functions (Sheldrick & Gould, 1995; Pavelčík, 1994) can also provide an excellent start for phase determination for data expanded to P1. Drendel et al. (1995) were successful in solving small organic structures ab initio by a Fourier recycling method using data expanded to P1 without the use of probability theory.
It has been known for some time that conventional direct methods can be a valuable tool for locating the positions of heavy-atom substructures using isomorphous (Wilson, 1978) and anomalous (Mukherjee et al., 1989) difference structure factors. Experience has shown that successful substructure applications are highly dependent on the accuracy of the difference magnitudes. As the technology for producing selenomethionine-substituted proteins and collecting accurate multiple-wavelength (MAD) data has improved (Hendrickson & Ogata, 1997; Smith, 1998), there has been an increased need to locate many selenium sites. For larger structures (e.g. more than about 30 Se atoms), automated Patterson interpretation methods can be expected to run into difficulties since the number of unique peaks to be analysed increases with the square of the number of atoms. Experimentally measured difference data are an approximation to the data for the hypothetical substructure, and it is reasonable to expect that conventional direct methods might run into difficulties sooner when applied to such data. Dual-space direct methods provide a more robust foundation for handling such data, which are often extremely noisy. Dual-space methods also have the added advantage that the expected number of Se atoms, N_{u}, which is usually known, can be exploited directly by picking the top N_{u} peaks. Successful applications require great care in data processing, especially if the values resulting from a MAD experiment are to be used.
All successful applications of SnB to previously unknown SeMet data sets, as reported in Table 16.1.8.1, actually involved the use of peak-wavelength anomalous difference data. The amount of data available for substructure problems is much larger than for full-structure problems with a comparable number of atoms to be located. Consequently, the user can afford to be stringent in eliminating data with uncertain measurements. Guidelines for rejecting uncertain data have been suggested (Smith et al., 1998). Consideration should be limited to those data pairs [i.e., isomorphous pairs and anomalous pairs ] for which and where typically and . The final choice of maximum resolution to be used should be based on inspection of the spherical shell averages versus . The purpose of this precaution is to avoid spuriously large values for high-resolution data pairs measured with large uncertainties due to imperfect isomorphism or general fall-off of scattering intensity with increasing scattering angle. Only those for which (typically ) should be deemed sufficiently reliable for subsequent phasing. The probability of very large difference |E|'s (e.g. ) is remote, and data sets that appear to have many such measurements should be examined critically for measurement errors. If a few such data remain even after the adoption of rigorous rejection criteria, it may be best to eliminate them individually. A later paper (Blessing & Smith, 1999) elaborates further data-selection criteria.
On the other hand, it is also important that the phase:invariant ratio be maintained at 1:10 in order to ensure that the phases are overdetermined. Since the largest |E|'s for the substructure cell are more widely separated than they are in a true small-molecule cell, the relative number of possible triplets involving the largest reciprocal-lattice vectors may turn out to be too small. Consequently, a relatively small number of substructure phases (e.g. 10N_{u}) may not have a sufficient number (i.e., 100N_{u}) of invariants. Since the number of triplets increases rapidly with the number of reflections considered, the appropriate action in such cases is to increase the number of reflections as suggested in Table 16.1.7.1. This will typically produce the desired overdetermination.
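The overdetermination rule of thumb above is simple arithmetic, shown here as a small Python sketch. The function names are hypothetical; the factors (10 phases per expected site and 10 invariants per phase) are the example values given in the text, not fixed program defaults.

```python
from typing import Tuple


def invariant_targets(n_sites: int,
                      phases_per_site: int = 10,
                      invariants_per_phase: int = 10) -> Tuple[int, int]:
    """Return (phases, invariants) targets for `n_sites` expected atoms,
    i.e. 10*N_u phases and 100*N_u triplet invariants by default."""
    n_phases = phases_per_site * n_sites
    return n_phases, invariants_per_phase * n_phases


def is_overdetermined(n_phases: int, n_invariants: int,
                      min_ratio: float = 10.0) -> bool:
    """True if there are at least `min_ratio` invariants per phase."""
    return n_phases > 0 and n_invariants >= min_ratio * n_phases


# For a hypothetical substructure with 30 expected Se sites:
phases, invariants = invariant_targets(30)
```

If the triplets generated from the chosen reflections fall short of the target, the remedy described in the text is to include more reflections and regenerate the invariants until the check passes.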
It is rare for Se atoms to be closer to each other than 5 Å, and the application of SnB to AdoHcy data truncated to 4 and 5 Å has been successful. Success rates were lower for lower-resolution data, but the CPU time required per trial was also reduced, primarily because much smaller Fourier grids were necessary. Consequently, there was no net increase in the CPU time needed to find a solution.
A special version of SHELXD is being developed that makes extensive use of the Patterson function both in generating starting atoms and in providing an independent figure of merit. It has already successfully located the anomalous scatterers in a number of structures using MAD data or simple anomalous differences. A recent example was the unexpected location of 17 anomalous scatterers (sulfur atoms and chloride ions) from the 1.5 Å-wavelength anomalous differences of tetragonal HEW lysozyme (Dauter et al., 1999).
References
Blessing, R. H. & Smith, G. D. (1999). Difference structure-factor normalization for heavy-atom or anomalous-scattering substructure determinations. J. Appl. Cryst. 32, 664–670.
Chang, C.-S., Weeks, C. M., Miller, R. & Hauptman, H. A. (1997). Incorporating tangent refinement in the Shake-and-Bake formalism. Acta Cryst. A53, 436–444.
Dauter, Z., Dauter, M., de La Fortelle, E., Bricogne, G. & Sheldrick, G. M. (1999). Can anomalous signal of sulfur become a tool for solving protein crystal structures? J. Mol. Biol. 289, 83–92.
Deacon, A. M., Weeks, C. M., Miller, R. & Ealick, S. E. (1998). The Shake-and-Bake structure determination of triclinic lysozyme. Proc. Natl Acad. Sci. USA, 95, 9284–9289.
Drendel, W. B., Dave, R. D. & Jain, S. (1995). Forced coalescence phasing: a method for ab initio determination of crystallographic phases. Proc. Natl Acad. Sci. USA, 92, 547–551.
Hendrickson, W. A. & Ogata, C. M. (1997). Phase determination from multiwavelength anomalous diffraction measurements. Methods Enzymol. 276, 494–523.
Hodel, A., Kim, S.-H. & Brünger, A. T. (1992). Model bias in macromolecular crystal structures. Acta Cryst. A48, 851–858.
Karle, I. L., Flippen-Anderson, J. L., Uma, K., Balaram, H. & Balaram, P. (1989). α-Helix and mixed 3_{10}/α-helix in cocrystallized conformers of Boc-Aib-Val-Aib-Aib-Val-Val-Val-Aib-Val-Aib-OMe. Proc. Natl Acad. Sci. USA, 86, 765–769.
Langs, D. A. (1988). Three-dimensional structure at 0.86 Å of the uncomplexed form of the transmembrane ion channel peptide gramicidin A. Science, 241, 188–191.
McCourt, M. P., Ashraf, K., Miller, R., Weeks, C. M., Li, N., Pangborn, W. A. & Dorset, D. L. (1997). X-ray crystal structures of cytotoxic oxidized cholesterols: 7-ketocholesterol and 25-hydroxycholesterol. J. Lipid Res. 38, 1014–1021.
McCourt, M. P., Li, N., Pangborn, W., Miller, R., Weeks, C. M. & Dorset, D. L. (1996). Crystallography of linear molecule binary solids. X-ray structure of a cholesteryl myristate/cholesteryl pentadecanoate solid solution. J. Phys. Chem. 100, 9842–9847.
Mukherjee, A. K., Helliwell, J. R. & Main, P. (1989). The use of MULTAN to locate the positions of anomalous scatterers. Acta Cryst. A45, 715–718.
Pavelčík, F. (1994). Patterson-oriented automatic structure determination. Deconvolution techniques in space group P1. Acta Cryst. A50, 467–474.
Schneider, T. R. (1998). Personal communication.
Sheldrick, G. M., Dauter, Z., Wilson, K. S., Hope, H. & Sieker, L. C. (1993). The application of direct methods and Patterson interpretation to high-resolution native protein data. Acta Cryst. D49, 18–23.
Sheldrick, G. M. & Gould, R. O. (1995). Structure solution by iterative peak-list optimization and tangent expansion in space group P1. Acta Cryst. B51, 423–431.
Smith, G. D., Nagar, B., Rini, J. M., Hauptman, H. A. & Blessing, R. H. (1998). The use of SnB to determine an anomalous scattering substructure. Acta Cryst. D54, 799–804.
Smith, J. L. (1998). Multiwavelength anomalous diffraction in macromolecular crystallography. In Direct methods for solving macromolecular structures, edited by S. Fortier, pp. 211–225. Dordrecht: Kluwer Academic Publishers.
Usón, I., Sheldrick, G. M., de La Fortelle, E., Bricogne, G., di Marco, S., Priestle, J. P., Grütter, M. G. & Mittl, P. R. E. (1999). The 1.2 Å crystal structure of hirustasin reveals the intrinsic flexibility of a family of highly disulphide bridged inhibitors. Structure, 7, 55–63.
Weeks, C. M. & Miller, R. (1999b). Optimizing Shake-and-Bake for proteins. Acta Cryst. D55, 492–500.
Wilson, K. S. (1978). The application of MULTAN to the analysis of isomorphous derivatives in protein crystallography. Acta Cryst. B34, 1599–1608.