Relating data collection to the problem in hand

Dauter, Z.; Wilson, K. S.

doi:10.1107/97809553602060000671

International Tables for Crystallography
Volume F: Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F. ch. 9.1, pp. 192-194

Section 9.1.13. Relating data collection to the problem in hand

Z. Dauter^a,* and K. S. Wilson^b

^a National Cancer Institute, Brookhaven National Laboratory, NSLS, Building 725A-X9, Upton, NY 11973, USA, and ^b Structural Biology Laboratory, Department of Chemistry, University of York, York YO10 5DD, England
Correspondence e-mail: dauter@bnl.gov

The data-collection protocol should be matched to the purposes for which the data are to be used. Different applications present a range of different needs, requiring the intensities (structure-factor amplitudes) to be exploited in different ways. In this section a representative set of applications is outlined in terms of how the tactics and strategies of data collection can vary.

9.1.13.1. Isomorphous-anomalous derivatives

The phasing of proteins by isomorphous replacement requires the collection of data from crystals of one or more heavy-atom derivatives of the protein that are isomorphous to the parent native crystal. Preparation of derivatives involves either soaking of native crystals in the heavy-atom solution or co-crystallization with the heavy-atom reagent (Part 12). Data collection can be split into two parts. The first step is to establish whether a potential derivative is isomorphous and contains the expected heavy atoms. The second is to collect the data on this derivative to provide the necessary phase information for the native structure factors. The problems of how to utilize the phase information are addressed in Part 12. Here, strategies applicable to the two steps are described.

Screening of derivatives can be carried out by collecting data to the resolution limits of the crystals. This can consume substantial data-collection resources and lead to irrelevant data that are not from isomorphous crystals or do not contain the anticipated heavy-atom signal. It is preferable to record the minimum data sufficient to identify a potential derivative in order to save time and resources, as many samples may need to be screened. A minimal strategy can exploit some or all of the following protocols:

(1) An essentially complete native-data reference set should be available, although not necessarily to the ultimate resolution limit.
(2) Preparation of a set of crystals with a selected set of potential heavy atoms, the number depending on crystal availability.
(3) Collection of a small number of images from each potential derivative crystal, ideally on the home-laboratory rotating-anode source, or on a synchrotron-radiation (SR) beamline if necessary. These data can be recorded to low resolution: in principle, a limit of 4 Å or lower resolution should be enough. The resulting partial derivative data are scaled to the complete native set. The fractional isomorphous difference can then be evaluated easily and compared with the agreement expected with the native data. In general, values less than 10% suggest that the heavy atom is not bound. Values higher than about 30% suggest an unacceptable level of non-isomorphism. Intermediate values suggest, but do not guarantee, that the derivative is worth pursuing. Normal probability plots can be helpful in this respect (Howell & Smith, 1992).
(4) Given a positive result from point (3), complete data may be recorded on the same or an equivalent crystal. Again, it may be useful to record data to low resolution in the first instance: a resolution of 4 Å is quite sufficient to solve the structure of a heavy-atom constellation using direct or Patterson methods, allowing a more complete characterization of the potential derivative.
(5) If the compound proves to be a useful derivative, data can then be recorded to higher resolution for the computation of phase information. It may not be appropriate to record data to the highest resolution as for the native protein. In this context, the strength of the data is of primary importance, and relatively weak data at high resolution may be less relevant.
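
The screening test in point (3) can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure. It assumes that the fractional isomorphous difference is defined as sum|F_PH - F_P| / sum F_P over the reflections common to both sets, and that the derivative amplitudes have already been scaled to the native set. The function names and the dictionary layout are hypothetical. The 10% and 30% limits are the approximate guides quoted above.

```python
def fractional_isomorphous_difference(native, derivative):
    """Return sum|F_PH - F_P| / sum F_P over reflections present in both sets.

    `native` and `derivative` map Miller indices (h, k, l) to amplitudes.
    Assumption: the derivative amplitudes are already scaled to the native set.
    """
    common = native.keys() & derivative.keys()
    if not common:
        raise ValueError("the two data sets share no reflections")
    numerator = sum(abs(derivative[hkl] - native[hkl]) for hkl in common)
    denominator = sum(native[hkl] for hkl in common)
    return numerator / denominator


def classify_derivative(r_iso, not_bound_below=0.10, non_isomorphous_above=0.30):
    """Apply the approximate 10% and 30% guides quoted in the text."""
    if r_iso < not_bound_below:
        return "heavy atom probably not bound"
    if r_iso > non_isomorphous_above:
        return "probably non-isomorphous"
    return "possibly useful, worth pursuing"


# Hypothetical amplitudes; the partial derivative set lacks one native reflection.
native = {(1, 0, 0): 100.0, (0, 1, 0): 200.0, (0, 0, 1): 50.0}
derivative = {(1, 0, 0): 120.0, (0, 1, 0): 170.0}

r_iso = fractional_isomorphous_difference(native, derivative)
print(f"{r_iso:.3f}", classify_derivative(r_iso))
```

Only the reflections present in both sets enter the sums, which is what allows a handful of derivative images to be compared with a complete native set. An intermediate value is a reason to proceed to point (4), not proof of a useful derivative.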

Some practical points are highly relevant here. The ability to store and reuse frozen crystals means that potential derivatives can first be screened at the lowest possible resolution, and the crystal preserved and used later only if the derivative proves to provide useful phase information. The final resolution for data collection will then depend on the degree of isomorphism. The wavelength, if tunable, should be set to a value just below that of the absorption edge of the heavy atom (that is, on the high-energy side of the edge) in order to maximize the anomalous signal. Redundancy can also play an important role: a large number of independent measurements allows outliers in the native or derivative data to be excluded, and such outliers can cause major problems in either the Patterson or direct-methods approaches for locating the heavy atom (Part 12).

9.1.13.2. Anomalous scattering, MAD and SAD

There are several requirements for collecting data with an intrinsically weak anomalous signal. As with the isomorphous measurements in the previous section, the highest possible resolution may not be the primary consideration. Here the emphasis is on data quality, since very small differences must be measured between macromolecular amplitudes that are themselves relatively weak. Important considerations include the following.

(1) Optimization of the wavelength, particularly for MAD experiments.
(2) Ensuring that the anomalous data are complete in terms of all possible Bijvoet pairs. This is not always addressed by the currently available data-processing software.
(3) High redundancy of measurements significantly enhances the quality of the signal, as this provides effective averaging of errors and allows the rejection of statistical outliers. The latter is especially important for direct-methods solution of the anomalous-scattering constellation.
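
The averaging and outlier rejection mentioned in point (3) can be illustrated with a short Python sketch. It is not the algorithm of any particular data-processing program. The rejection rule, based on the median and the median absolute deviation with a cutoff of 3.5 robust standard deviations, is an assumption chosen for illustration, and the function name is hypothetical. For anomalous data, the two members of a Bijvoet pair would be merged separately; that bookkeeping is left to the caller.

```python
from statistics import mean, median


def merge_with_rejection(values, cutoff=3.5):
    """Average repeated measurements of one reflection after rejecting outliers.

    A measurement is rejected when it lies more than `cutoff` robust standard
    deviations from the median. The robust deviation is estimated as
    1.4826 times the median absolute deviation (an illustrative choice).
    Returns (merged_value, number_of_measurements_kept).
    """
    centre = median(values)
    spread = median([abs(v - centre) for v in values])
    if spread == 0:
        # The spread cannot be estimated, so nothing is rejected.
        return mean(values), len(values)
    sigma = 1.4826 * spread
    kept = [v for v in values if abs(v - centre) <= cutoff * sigma]
    return mean(kept), len(kept)


# Hypothetical intensities for one reflection; the last value is a gross outlier.
measurements = [100.0, 102.0, 98.0, 101.0, 99.0, 250.0]
merged, n_kept = merge_with_rejection(measurements)
print(merged, n_kept)
```

With only two or three measurements per reflection there is no sound basis for deciding which value is the outlier, which is one reason why high redundancy matters here.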

For MAD experiments (Hendrickson, 1991; Smith, 1991), which can only be carried out at SR sites, the optimum number of wavelengths at which data should be recorded remains unclear. The minimum is one (SAD) and the conventional wisdom is that four are optimal. Given finite beam time, the trade-off is between measuring with limited redundancy at several wavelengths as against higher redundancy at a smaller number of wavelengths. The jury is still out on this one.

Single-wavelength anomalous dispersion (SAD) represents the limiting case. All data are recorded at one wavelength, reducing the requirement for fine monochromatization and for fine tunability and stability. Now quality, especially in the form of redundancy, is the dominating factor since all phasing is based purely on a single anomalous difference for each reflection.
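
The trade-off between the number of wavelengths and the redundancy at each can be made concrete with some simple arithmetic. The following Python sketch rests on loudly simplifying assumptions: a fixed total number of images, a fixed number of images per complete pass, and purely random, independent errors, so that averaging n measurements reduces the standard error by a factor of sqrt(n). It ignores radiation damage and systematic errors, and all names and numbers are hypothetical.

```python
import math


def passes_per_wavelength(total_images, images_per_pass, n_wavelengths):
    """Complete passes that fit into the image budget at each wavelength."""
    return total_images // (images_per_pass * n_wavelengths)


def relative_standard_error(n_measurements):
    """Error of an averaged value relative to that of a single measurement.

    Assumption: errors are random and independent, so averaging n
    measurements reduces the standard error by a factor sqrt(n).
    """
    return 1.0 / math.sqrt(n_measurements)


# Hypothetical budget: 720 images in total, 90 images for one complete pass.
for n_wavelengths in (1, 2, 4):
    passes = passes_per_wavelength(720, 90, n_wavelengths)
    print(n_wavelengths, passes, round(relative_standard_error(passes), 3))
```

The sketch only quantifies the loss of precision at each wavelength as the budget is divided. It says nothing about the additional phase information gained from further wavelengths, which is the other side of the trade-off.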

9.1.13.3. Molecular replacement

For the initial data required for molecular replacement (MR), high resolution is not essential. Firstly, the method depends on homologous models that are usually only an imperfect representation of the structure under investigation and hence high-resolution data cannot be accurately modelled, and will only introduce noise into the analysis. Secondly, the rotation function, the first step in MR, is based on the representation of the Patterson function in terms of spherical harmonics, which is limited in its accuracy.

In contrast, it is essential for MR applications that the most intense low-resolution terms are measured. The lack of such reflections strongly affects the rotation- and translation-function computations, as the functions are based on Patterson syntheses involving the square of the structure-factor amplitudes, and are dominated by the largest terms. Elimination of the strongest few per cent of the low-resolution data may well prevent a successful solution by MR.
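
The dominance of the largest terms can be checked for any data set by computing how much of the sum of squared amplitudes is carried by the strongest reflections. The Python sketch below is illustrative only: the function name is hypothetical and the amplitudes are invented to make the arithmetic easy to follow.

```python
def share_of_strongest(amplitudes, fraction):
    """Share of sum(F**2) carried by the strongest `fraction` of reflections."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must lie in the interval (0, 1]")
    squared = sorted((f * f for f in amplitudes), reverse=True)
    n_top = max(1, round(fraction * len(squared)))
    return sum(squared[:n_top]) / sum(squared)


# Invented example: one very strong reflection among ten.
amplitudes = [10.0] + [1.0] * 9
share = share_of_strongest(amplitudes, 0.1)
print(round(share, 3))
```

In this invented case a single reflection out of ten supplies more than nine tenths of the summed squared amplitudes, so losing it (for example through detector saturation) would remove most of the signal on which the Patterson-based functions depend.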

However, for refinement of structures solved by MR, it is essential that data be recorded to a resolution sufficient to allow escape from the phase bias introduced by the model.

9.1.13.4. Definitive data on relevant biological structures

This category is intended to include all structures for which the highest accuracy in the atomic coordinates helps to shed light on the details of biological function. These may include substrate or inhibitor complexes and mutants if the analysis requires the full potential of X-ray crystallography. Many of these will not diffract to atomic resolution; nevertheless, all steps in a detailed crystal structure analysis are made simpler as the resolution and quality of the data are increased. This includes the solution of the phase problem, interpretation of the electron-density maps and the refinement of the model.

The most appropriate strategy for data collection involves decisions based on a complex and mutually dependent set of parameters including:

(1) Crystal quality and availability. If only one crystal is available, the choices are limited. If many are available, then some experimentation is recommended to select a high-quality sample.
(2) Cryogenic freezing. This has become de rigueur for the modern protein crystallographer. In many cases it allows collection of data from a single crystal. If appropriate cryogenic freezing conditions cannot be established, making it necessary to record room-temperature data, this can affect strategy-making dramatically, in that several crystals might well be required to achieve the target resolution and completeness.
(3) X-ray source and detector. The availability of these again places restrictions on the experiments that are tractable. An SR source will always provide better data, but has logistical problems of availability and access. For some problems, SR becomes a sine qua non and a rotating anode is simply insufficient. These include the use of MAD techniques, very small crystals, large and complex structures with large unit cells such as viruses, and cases where atomic resolution data are needed.
(4) Overall data-collection time allocated. This has an obvious overlap with point (3). In particular, if SR is to be used later, then the resolution limit on the home source may be modest. If SR is not likely to be employed, then a higher resolution may be aimed for, requiring more time, and again dependent on the pressure on local resources.

Whatever the resource, it is advisable to define a strategy that will provide high completeness of the unique amplitudes at the highest resolution, recognizing that these two requirements conflict to some extent.

9.1.13.5. A series of mutant or complex structures

In such a series, the detailed geometry of the molecule is already known and the rather general effects of ligand binding or mutation can be initially identified at a relatively modest resolution and completeness. As with heavy-atom screening, it is often advisable to check that the desired complex or structural modification has been achieved by first recording data at low resolution.

However, if the analysis then proves to be of real chemical interest, with a need for accurate definition of structural features, the data should be subsequently extended in resolution and quality. As with the identification of isomorphous derivatives, this approach has benefited greatly from cryogenic freezing, where the sample can be screened at low resolution and then preserved for subsequent use.

9.1.13.6. Atomic resolution applications

As for MAD data, the needs for atomic resolution data are extreme, but rather different in nature. Atomic resolution refinement is addressed in Chapter 18.4. Suffice it to say that by atomic resolution it is meant that meaningful experimental data extend close to 1 Å resolution. There are two principal reasons for recording such data. Firstly, they allow the refinement of a full anisotropic atomic model, leading to a more complete description of subtle structural features. Secondly, direct methods of phasing are largely dependent upon the principle of atomicity.

The problems likely to be faced include:

(1) The high contrast in intensities between the low- and high-angle reflections. This may be much larger than the dynamic range of the detector. If exposure times are long enough to give good counting statistics at high resolution, then the low-resolution spots will be saturated. The solution is to use more than one pass with different effective exposure times.
(2) The overall exposure time is often considerable and substantial radiation damage may finally result. The completeness of the low-resolution data is crucial, and it is recommended to collect the low-resolution pass first as the time taken for this is relatively small.
(3) The close spacing between adjacent spots within the lunes on the detector, dependent on the cell dimensions. The only aid is to use fine collimation.
(4) The overlap of adjacent lunes at high diffraction angle, especially if a long cell axis lies along the beam direction. Using an alternative mount of the crystal is the simplest solution. Otherwise the rotation range per image must be reduced, increasing the number of exposures. This is again a problem with slow read-out detectors.
(5) For direct-methods applications, a liberal judgement of resolution limit should be adopted. Even a small percentage of meaningful reflections in the outer shells can assist the phasing. These weak shells can be rejected or given appropriate low weights in the refinement. The strong, low-resolution terms are vital for direct methods.
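
The reduction in rotation range mentioned in point (4) can be estimated in advance. The Python sketch below uses a commonly quoted approximation that is not stated in this section and is therefore an assumption here: the widest rotation per image free of lune overlap is roughly (180/pi)(d/a) - eta degrees, where d is the high-resolution limit, a is the cell dimension lying along the beam and eta is the mosaicity in degrees. The function name and the example values are hypothetical.

```python
import math


def max_rotation_per_image(d_min, cell_along_beam, mosaicity_deg):
    """Approximate widest rotation per image (degrees) without lune overlap.

    Assumed approximation: (180 / pi) * (d_min / cell_along_beam) - mosaicity.
    `d_min` and `cell_along_beam` are in the same units (for example Å).
    A result of 0.0 means that overlap cannot be avoided in this orientation.
    """
    width = math.degrees(d_min / cell_along_beam) - mosaicity_deg
    return max(width, 0.0)


# Hypothetical case: 1 Å data, mosaicity 0.2 degrees, three possible axis lengths.
for cell in (50.0, 100.0, 200.0):
    print(cell, round(max_rotation_per_image(1.0, cell, 0.2), 3))
```

Under this approximation the permitted rotation shrinks quickly as the axis along the beam lengthens, which illustrates why remounting the crystal so that a shorter axis lies along the beam is described above as the simplest solution.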

References

Hendrickson, W. A. (1991). Determination of macromolecular structures from anomalous diffraction of synchrotron radiation. Science, 254, 51–58.

Howell, P. L. & Smith, G. D. (1992). Identification of heavy-atom derivatives by normal probability methods. J. Appl. Cryst. 25, 81–86.

Smith, J. L. (1991). Determination of three-dimensional structure by multiwavelength anomalous diffraction. Curr. Opin. Struct. Biol. 1, 1002–1011.
