International Tables for Crystallography, Volume F: Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F. ch. 9.1, pp. 190-191
Section 9.1.11. Rotation method: qualitative factors
a National Cancer Institute, Brookhaven National Laboratory, NSLS, Building 725A-X9, Upton, NY 11973, USA, and b Structural Biology Laboratory, Department of Chemistry, University of York, York YO10 5DD, England
Reflection profiles should be checked on the first recorded images. Very often a quick inspection of the profiles can disqualify a bad crystal without further loss of time. The profiles should have a single maximum and smooth shoulders. If the crystal shape is irregular, it may be reflected in the spot profile. Profiles should not have double maxima or be substantially elongated or smeared out, which usually arises from crystal splitting. The profiles should certainly be inspected if initial autoindexing of the diffraction pattern is unsuccessful.
Even if the spot profiles appear to be regular on the first image, it is good practice to inspect a second image at a substantially different φ rotation angle, preferably 90° away, since crystal splitting may have a similar effect on the appearance of the lunes and profiles as does high mosaicity on a single image (Section 9.1.6.3). High mosaicity and splitting (often incorrectly referred to as twinning) must not be confused. If two parts of a split crystal are slightly rotated with respect to one another around a certain axis, the diffraction patterns will look different depending on the orientation. When such an axis is perpendicular to the detector plane, the spots will be doubled or smeared out. When the axis is parallel to the detector plane, the profiles resulting from the two parts of the crystal will overlap almost perfectly, but the lunes will be broadened, similar to the effect of high mosaicity.
After indexing the diffraction pattern, the integration profiles should be matched with the size and shape of the diffraction spots. The spots should not extend into the area defined as background. Selection of integration profiles that are too small will lead to incorrect integration of intensities. In contrast, if the profile areas are too large then the standard uncertainties will be wrongly estimated.
According to the principles of counting statistics, the longer the exposure, the better the signal in the data. The standard uncertainty of the measurement is equal to the square root of the number of counts, and the signal-to-noise ratio increases with the accumulated counts. In practice there are limitations to this rule.
One limiting factor is the dynamic range of the detector and its saturation limit. It may be impossible to measure adequately the strongest as well as the weakest reflections simultaneously, since their intensities differ by several orders of magnitude. If the exposure time is long enough to record the weakest intensities, then in general the most intense reflections at low resolution may saturate some pixels within their profile on the detector. Such reflections are termed 'overloads' and this problem will be addressed in Section 9.1.11.3.
Exposure time can be limited by the total time available for the experiment. This is often a particularly acute problem for synchrotron-data collection, with high oversubscription of beamlines. The decisions concerning exposure time depend on the expected application of the data, since different applications have different requirements, as addressed in Section 9.1.13. Within the given time constraints, the first priority should be data completeness, even at the expense of underexposure. In this context it is useful to recall that to increase the statistical signal-to-noise ratio by a factor of two, it is necessary to prolong the exposure time by at least a factor of four.
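The factor-of-four rule follows directly from counting statistics. The following minimal sketch assumes an ideal Poisson counter with a constant count rate and no background, detector noise or radiation damage; the function name and the rate of 100 counts per second are hypothetical.

```python
import math

def signal_to_noise(count_rate, exposure_time):
    """S/N for an ideal Poisson counter: N / sqrt(N) = sqrt(N)."""
    counts = count_rate * exposure_time
    sigma = math.sqrt(counts)          # standard uncertainty of the count
    return counts / sigma

rate = 100.0                           # hypothetical counts per second
base = signal_to_noise(rate, 1.0)      # 1 s exposure
longer = signal_to_noise(rate, 4.0)    # 4 s exposure

print(f"S/N at 1 s: {base:.1f}")
print(f"S/N at 4 s: {longer:.1f}")
print(f"gain: {longer / base:.1f}x")
```

In this idealized case a fourfold exposure exactly doubles the signal-to-noise ratio, which corresponds to the lower bound quoted above.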
Some detectors, or their associated read-out systems, are limited in the number of counts they can accumulate in one pixel. The number recorded reaches a maximum number which cannot be further increased, i.e. the pixels can become saturated. This means that these pixels retain the same maximum value on longer exposure whilst other, non-saturated, pixels continue to accumulate counts. The intensity in saturated pixels will hence be underestimated compared to the others and any intensities estimated from profiles including such pixels will be biased towards low values. It is essential that pixels that are saturated are flagged and recognized by the processing software. There are several ways to deal with the problem of saturation.
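Flagging of saturated pixels can be sketched as follows. This is an illustration only: the saturation limit of 65535 assumes a 16-bit read-out, the real limit is detector specific, and the function names and pixel values are hypothetical.

```python
SATURATION_LIMIT = 65535  # assumed 16-bit read-out; the real limit is detector specific

def saturated_pixels(profile, limit=SATURATION_LIMIT):
    """Return the indices of pixels at or above the saturation limit."""
    return [i for i, counts in enumerate(profile) if counts >= limit]

def integrate_spot(profile, limit=SATURATION_LIMIT):
    """Sum the pixel counts and report whether the spot is an overload."""
    overloaded = bool(saturated_pixels(profile, limit))
    return sum(profile), overloaded

# Hypothetical pixel counts across two spot profiles (background already subtracted).
weak_spot = [12, 85, 430, 90, 15]
strong_spot = [900, 41000, 65535, 65535, 38000, 700]

for name, spot in [("weak", weak_spot), ("strong", strong_spot)]:
    total, overloaded = integrate_spot(spot)
    print(name, total, "OVERLOAD" if overloaded else "ok")
```

For the overloaded spot the summed value is only a lower bound on the true intensity, so the flag must travel with the measurement into scaling and merging.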
Clearly the optimum solution is to have a detector with a sufficient dynamic range to cover pixels of both weak and strong reflections. The dynamic range has already been increased with recent imaging plates and CCDs. Enhanced dynamic range may prove to be the most important advance of solid-state pixel detectors.
An additional advantage of the fine-slicing approach is that it leads to fewer overloads. Each reflection profile is divided between several separate images and as a result the effective dynamic range of the detector is increased.
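The effect of slicing on overloads can be estimated with a rough sketch. It assumes a uniform reflection profile in φ, the same exposure per degree for every image width, and a saturation limit of 65535 counts; all names and numbers are hypothetical.

```python
def peak_counts_per_image(total_peak_counts, reflection_width, image_width):
    """Largest count one image can record in the peak pixel (uniform profile)."""
    fraction = min(1.0, image_width / reflection_width)
    return total_peak_counts * fraction

total = 200_000      # hypothetical counts in the peak pixel over the whole reflection
width = 0.5          # hypothetical angular width of the reflection, degrees
limit = 65_535       # assumed saturation limit

for image_width in (1.0, 0.5, 0.1):
    peak = peak_counts_per_image(total, width, image_width)
    status = "overload" if peak >= limit else "ok"
    print(f"{image_width:.1f} deg images: {peak:.0f} counts ({status})")
```

Under these assumptions the pixel saturates when the whole reflection falls on one image, but not when the same reflection is spread over five images of 0.1°.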
It is customary to judge data quality by the overall $R_{\mathrm{merge}}$, calculated using the squares of the structure-factor amplitudes (intensities):

$$R_{\mathrm{merge}} = \frac{\sum_{hkl}\sum_{i} \left| I_i(hkl) - \langle I(hkl) \rangle \right|}{\sum_{hkl}\sum_{i} I_i(hkl)}.$$

$R_{\mathrm{merge}}$ provides a measure of the distribution of symmetry-equivalent observed intensities. However, the most popular form of $R_{\mathrm{merge}}$ given above is not a proper, statistically valid quantifier. It does not take into account the multiplicity of the measurements and, as a consequence, it actually rises with increased multiplicity, falsely indicating degradation of the data quality when in reality they have a higher accuracy. Modifications of $R_{\mathrm{merge}}$ have been proposed to include the effect of multiple measurements properly (Diederichs & Karplus, 1997; Weiss & Hilgenfeld, 1997).
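The dependence on multiplicity can be illustrated with simulated data. The sketch below computes the conventional merging R factor and a multiplicity-corrected variant in which each group of n equivalents is weighted by sqrt(n/(n − 1)), which is the form of correction proposed in the cited papers. The function names, the simulated intensity of 1000, the noise level of 50 and the random seed are hypothetical choices made for illustration.

```python
import math
import random

def merging_r(groups, multiplicity_corrected=False):
    """Merging R factor over groups of symmetry-equivalent intensities."""
    num = den = 0.0
    for obs in groups:
        n = len(obs)
        if n < 2:
            continue                    # a single observation cannot be compared
        mean = sum(obs) / n
        factor = math.sqrt(n / (n - 1)) if multiplicity_corrected else 1.0
        num += factor * sum(abs(i - mean) for i in obs)
        den += sum(obs)
    return num / den

def simulate(n_reflections, multiplicity, true_i=1000.0, sigma=50.0, seed=0):
    """Gaussian-noise measurements of identical reflections (toy model)."""
    rng = random.Random(seed)
    return [[rng.gauss(true_i, sigma) for _ in range(multiplicity)]
            for _ in range(n_reflections)]

for mult in (2, 8):
    data = simulate(2000, mult)
    print(f"multiplicity {mult}: "
          f"R = {merging_r(data):.4f}, "
          f"corrected R = {merging_r(data, True):.4f}")
```

Although the noise level is identical in both runs, the uncorrected value is larger at multiplicity 8 than at multiplicity 2, whereas the corrected value stays essentially constant.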
A better quantity for assessing the quality of the X-ray data is the $I/\sigma(I)$ ratio, provided the standard uncertainties, $\sigma(I)$, are correctly estimated. Detectors such as imaging plates or CCDs do not measure individual X-ray quanta directly, having a gain factor dependent on the response of the individual detector pixel to a single X-ray photon. If the gain factor is not known accurately for a particular detector, the resulting standard uncertainties of the measured intensities will be estimated at an incorrect level. If the multiplicity of the reflections is higher than unity, it is possible to correct the uncertainties a posteriori. This can be done either from a comparison with the expected values using the $\chi^2$ test, or by using the t-plot. The latter requires that the ratio of the differences between equivalent intensity measurements to their standard uncertainties, $\Delta I/\sigma(I)$, follows a normal distribution with a mean of 0.0 and a standard deviation of 1.0. Both of these methods assume the errors have a normal distribution, and that only the mean and width have been incorrectly estimated and should be appropriately adjusted. They cannot take into account systematic errors of measurement.
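One simple form of such an a posteriori correction is sketched below. It assumes purely random, normally distributed errors, computes the reduced χ² of the symmetry equivalents about their weighted mean, and returns its square root as a single multiplicative factor for all standard uncertainties. The function name and data are hypothetical, and processing programs typically use more elaborate error models.

```python
import math

def sigma_scale_factor(groups):
    """Estimate one multiplicative correction for sigma from equivalents.

    groups: list of lists of (intensity, sigma) for symmetry equivalents.
    """
    chi2 = 0.0
    dof = 0
    for obs in groups:
        n = len(obs)
        if n < 2:
            continue                                  # no information on spread
        weights = [1.0 / (s * s) for _, s in obs]
        mean = sum(w * i for w, (i, _) in zip(weights, obs)) / sum(weights)
        chi2 += sum(w * (i - mean) ** 2 for w, (i, _) in zip(weights, obs))
        dof += n - 1                                  # one degree lost per mean
    return math.sqrt(chi2 / dof)

# Hypothetical equivalents whose quoted sigma of 1.0 is too small.
groups = [
    [(100.0, 1.0), (104.0, 1.0)],
    [(50.0, 1.0), (52.0, 1.0)],
]
factor = sigma_scale_factor(groups)
print(f"multiply sigma by {factor:.2f}")
```

A factor close to 1.0 indicates that the quoted uncertainties already agree with the observed scatter; a factor well above 1.0 indicates that they were underestimated.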
The data-merging procedure in addition allows the identification of statistical 'outliers' and their exclusion from the data (Read, 1999). Outliers are defined as those observations that lie sufficiently far from the mean of a set, and assumption of a normal distribution suggests they suffer from substantial systematic errors of measurement. In a crystallographic experiment, outliers are those intensity measurements that deviate unexpectedly from the mean intensity of a set of symmetry-equivalent reflections. In the recording of rotation data, one typical source of such systematic errors is erroneous classification of reflections predicted as partially or fully recorded. This is a severe problem for those reflections lying close to the blind region. A second example is the presence of so-called 'zingers' in individual CCD detector pixels caused by scintillations from trace radioactivity of the taper glass. Other problems such as shadowed or inactive regions of the detector window give rise to a range of such systematic errors.
A small number of outliers may be expected from such causes. However, the total fraction of reflections flagged as outliers and rejected from the merging process should be small, certainly much less than 1%. Larger fractions indicate serious deficiencies in the hardware or the software and suggest something is very wrong with the experiment. There should always be a physical reason for rejecting outliers, other than just a need to reject those agreeing poorly with their symmetry-equivalent intensities in order to drive down $R_{\mathrm{merge}}$. It is always possible to reduce $R_{\mathrm{merge}}$ and to provide an apparent 'improvement' in the data by rejecting a large percentage of measurements, but this is extremely bad practice.
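Monitoring the rejected fraction is straightforward once outliers have been flagged. The sketch below uses a deliberately simple criterion, the deviation of each observation from the median of its symmetry equivalents in units of its own σ, with a hypothetical threshold of 4σ; it is not the criterion of any particular program, and the function name and data are invented for illustration.

```python
import statistics

def flag_outliers(groups, threshold=4.0):
    """Return (flags, fraction) using deviation from the group median.

    groups: list of lists of (intensity, sigma) tuples.
    threshold: hypothetical rejection limit in units of sigma.
    Groups with fewer than three observations are never flagged.
    """
    flags = []
    total = rejected = 0
    for obs in groups:
        median = statistics.median([i for i, _ in obs])
        group_flags = [len(obs) >= 3 and abs(i - median) > threshold * s
                       for i, s in obs]
        flags.append(group_flags)
        total += len(obs)
        rejected += sum(group_flags)
    return flags, rejected / total

# Hypothetical symmetry equivalents; one measurement is clearly discrepant.
groups = [
    [(100.0, 5.0), (104.0, 5.0), (98.0, 5.0), (250.0, 5.0)],
    [(40.0, 4.0), (43.0, 4.0), (38.0, 4.0)],
]
flags, fraction = flag_outliers(groups)
print(flags)
print(f"rejected fraction: {fraction:.1%}")
```

The toy data set is far too small to be representative; in a real data set the rejected fraction reported this way should stay well below 1%.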
Good crystallographic data depend strongly on an appropriate statistical procedure. It is also inappropriate to exclude those reflections with intensities lower than a cutoff limit, such as 1σ, before or during the process of data merging. Weak intensities also carry information and their neglect introduces bias into the measured intensity distribution, affecting, for example, the overall or individual atomic temperature factors.
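The bias introduced by a σ cutoff can be demonstrated with simulated measurements of a single weak reflection. The sketch assumes normally distributed errors; the true intensity, the σ value, the number of observations and the random seed are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

rng = random.Random(1)
true_intensity = 1.0   # hypothetical weak reflection, same units as sigma
sigma = 1.0

observed = [rng.gauss(true_intensity, sigma) for _ in range(10_000)]
kept = [i for i in observed if i >= 1.0 * sigma]   # the discouraged 1-sigma cutoff

print(f"mean of all observations:  {mean(observed):.2f}")
print(f"mean after 1-sigma cutoff: {mean(kept):.2f}")
print(f"observations discarded:    {len(observed) - len(kept)}")
```

The mean of all observations is an unbiased estimate of the true intensity, whereas the mean of the retained observations is systematically too high, since only the upper part of the distribution survives the cutoff.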
The true outer resolution limit of the diffraction pattern is not trivial to define and indeed depends to some extent on the application. If $I/\sigma(I)$ is higher than 1.0, then a resolution shell of data indeed contains some information in a statistical sense, provided of course that $\sigma(I)$ has been correctly estimated. However, as $I/\sigma(I)$ falls close to unity there will in practice be very few significant observations amongst a great deal of noise. It is necessary to make some decision about where to cut the effective resolution. For the application of direct methods, for example using SHELXS (Sheldrick, 1990), the cutoff is often defined as the resolution shell where $I/\sigma(I)$ falls to 2.0, when $R_{\mathrm{merge}}$ usually reaches 20–40% depending on the symmetry and redundancy. Cruickshank (1999a,b) has provided a formula for a data precision indicator (DPI) which includes the effect of falling $I/\sigma(I)$ ratio.
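A cutoff of this kind can be applied to a table of resolution shells as sketched below. The shell limits and mean I/σ(I) values are hypothetical, as is the function name, and the threshold is a parameter because the appropriate value depends on the application.

```python
def resolution_cutoff(shells, threshold=2.0):
    """Return d_min of the last shell before mean I/sigma drops below threshold.

    shells: list of (d_min_angstrom, mean_i_over_sigma), ordered from low to
    high resolution (decreasing d_min). Returns None if no shell qualifies.
    """
    cutoff = None
    for d_min, i_over_sigma in shells:
        if i_over_sigma < threshold:
            break
        cutoff = d_min
    return cutoff

# Hypothetical shell statistics from a merging run.
shells = [(3.0, 25.1), (2.4, 14.3), (2.1, 7.8), (1.9, 3.9),
          (1.8, 2.2), (1.7, 1.3), (1.6, 0.8)]

print(resolution_cutoff(shells))         # threshold of 2.0
print(resolution_cutoff(shells, 1.0))    # more permissive threshold of 1.0
```

The search stops at the first shell that fails the test, so an isolated strong shell beyond a weak one does not extend the limit.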
For other applications it may be advisable to accept even very weak data. Direct methods use only a subset of the most meaningful reflections but these should extend to as high a resolution as possible. In addition, when the data are sparse from crystals that only diffract to very limited resolution, perhaps around 3 Å, then it is essential to retain all the experimental data, even if they are weak.
References
Cruickshank, D. W. J. (1999a). Remarks about protein structure precision. Acta Cryst. D55, 583–601.
Cruickshank, D. W. J. (1999b). Remarks about protein structure precision. Erratum. Acta Cryst. D55, 1108.
Diederichs, K. & Karplus, P. A. (1997). Improved R-factor for diffraction data analysis in macromolecular crystallography. Nature Struct. Biol. 4, 269–275.
Read, R. J. (1999). Detecting outliers in non-redundant diffraction data. Acta Cryst. D55, 1759–1764.
Sheldrick, G. M. (1990). Phase annealing in SHELX-90: direct methods for larger structures. Acta Cryst. A46, 467–473.
Weiss, M. S. & Hilgenfeld, R. (1997). On the use of the merging R factor as a quality indicator for X-ray data. J. Appl. Cryst. 30, 203–205.