International Tables for Crystallography, Volume F: Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F. ch. 11.5, pp. 236-245
https://doi.org/10.1107/97809553602060000678

Chapter 11.5. The use of partially recorded reflections for post refinement, scaling and averaging X-ray diffraction data

Department of Biological Sciences, Purdue University, West Lafayette, IN 47907-1392, USA

Previous methods used for placing diffraction data recorded on a set of image frames onto a common scale have depended on finding scale factors that minimize the difference between scaled, fully recorded reflections. However, frozen crystals usually have mosaic spreads comparable to the oscillation angle, resulting in only very few, if any, fully recorded reflections on any one frame. Two methods are presented for solving this problem. The first depends on summing the components of a reflection on neighbouring frames; the second depends on calculating the degree of partiality (described in the Appendix).

Keywords: Hamilton, Rollett and Sparks method; anisotropic mosaicity; anomalous scattering; averaging of reflection intensities; mosaicity; partiality model; scaling.
Recent advances in the use of frozen crystals of biological samples for X-ray diffraction data collection (Rodgers, 1994) often result in data for which most of the observed reflections on each frame are partially observed. This might be avoided by increasing the oscillation ranges, but this would cause many reflections to overlap with their neighbours. Hence, it is necessary to develop scaling procedures that are independent of the exclusive use of fully recorded reflections.
A set of measured Bragg intensities is dependent on the properties of the crystal, radiation source and detector. Usually, these factors cannot be kept constant throughout the data collection. The crystal may decay, weakening the Bragg intensities, or even `die', which requires the use of several crystals for a full data set. The intensity and position of the primary X-ray beam may vary, especially at synchrotron-radiation sources. Finally, the detector response may change when, for example, different films or imaging plates are used during the data collection.
Most data sets can be divided into series of subsets, or frames, collected under more-or-less constant conditions. These frames need to be placed on a common arbitrary scale. The scaling can be performed by comparing the intensities of multiply measured reflections or symmetry-equivalent reflections on different frames.
A least-squares procedure frequently used for scaling frames of data is the Hamilton, Rollett and Sparks (HRS) method (Hamilton et al., 1965). The target for the HRS least-squares minimization is

$$\psi = \sum_{h}\sum_{i} W_{hi}\,(I_{hi} - G_{m} I_{h})^{2}, \qquad (11.5.1.1)$$

where $I_{h}$ is the best estimate of the intensity of a reflection with reduced Miller indices h, $I_{hi}$ is the intensity of the ith measurement of reflection h, $W_{hi}$ is a weight for reflection $hi$ and $G_{m}$ is the inverse linear scale factor for frame m on which reflection $hi$ is recorded. The reduced Miller indices are those corresponding to an arbitrarily defined asymmetric unit of reciprocal space. The HRS expression (11.5.1.1) assumes that all reflections $hi$ are full, that is, their reciprocal-lattice points have completely passed through the Ewald sphere.

For all unique reflections h, the values of $I_{h}$ must correspond to a minimum in ψ. Thus,

$$\partial\psi/\partial I_{h} = 0.$$

Therefore, the best least-squares estimate of the intensity of a reflection is

$$I_{h} = \frac{\sum_{i} W_{hi}\,G_{m}\,I_{hi}}{\sum_{i} W_{hi}\,G_{m}^{2}}.$$

Since ψ is not linear with respect to the scale factors $G_{m}$, the values of the scale factors have to be determined by an iterative nonlinear least-squares procedure. As the scale factors are relative to each other, the HRS procedure requires that one of them be fixed.
Fox & Holmes (1966) describe an improved method of solving the HRS normal equations. Their approach is based on the singular value decomposition of the normal equations matrix. The advantage of the Fox and Holmes method, apart from the accelerated convergence of the least-squares procedure, is that no ad hoc decision needs to be made as to which scale factor should be fixed. Furthermore, `troublesome' frames of data can be identified as causing negligibly small eigenvalues in the normal equations matrix.
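As an illustration of the HRS target, the following minimal sketch minimizes $\psi = \sum W_{hi}(I_{hi} - G_m I_h)^2$ by alternating between the closed-form estimate of $I_h$ for fixed scale factors and the analogous closed-form estimate of $G_m$ for fixed intensities. This alternating scheme is a simple stand-in chosen for brevity; it is not the nonlinear least-squares procedure of Hamilton et al. nor the Fox and Holmes method. The function names, the tuple layout `(h, m, I, W)` and the choice of fixing the first frame's scale to 1 are assumptions made for the example.

```python
from collections import defaultdict

def best_intensities(obs, G):
    """I_h = sum(W*G_m*I_hi) / sum(W*G_m^2) for fixed scale factors G."""
    num, den = defaultdict(float), defaultdict(float)
    for h, m, I, W in obs:
        num[h] += W * G[m] * I
        den[h] += W * G[m] ** 2
    return {h: num[h] / den[h] for h in num}

def best_scales(obs, Ih, ref_frame):
    """G_m = sum(W*I_h*I_hi) / sum(W*I_h^2) for fixed intensities Ih."""
    num, den = defaultdict(float), defaultdict(float)
    for h, m, I, W in obs:
        num[m] += W * Ih[h] * I
        den[m] += W * Ih[h] ** 2
    G = {m: num[m] / den[m] for m in num}
    g_ref = G[ref_frame]          # scale factors are only relative:
    return {m: g / g_ref for m, g in G.items()}  # fix one of them to 1

def hrs_alternate(obs, n_iter=20):
    """obs: list of (h, m, I_hi, W_hi) for fully recorded reflections."""
    frames = sorted({m for _, m, _, _ in obs})
    G = {m: 1.0 for m in frames}
    for _ in range(n_iter):
        Ih = best_intensities(obs, G)
        G = best_scales(obs, Ih, frames[0])
    return G, best_intensities(obs, G)
```

Fixing the reference frame after every update plays the role of the constraint that one scale factor be held constant.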
11.5.2. Generalization of the Hamilton, Rollett and Sparks equations to take into account partial reflections
When a Bragg reflection is completely exposed within the oscillation range of one frame, a so-called `full reflection', it gives rise to the `full intensity'. In general, a Bragg reflection will occur on a number of consecutive frames as a series of partial reflections, and the full intensity can only be estimated from the measured intensities of the partial reflections. Let $I_{him}$ represent the intensity contribution of reflection $hi$ recorded on frame m; if all the parts of $hi$ are available in the data set, then the full intensity on the common scale is the sum of the scaled parts,

$$I_{hi} = \sum_{m} I_{him}/G_{m}. \qquad (11.5.2.1)$$

In practice, there will always be reflections that do not have all their parts available. In such cases, the only way to estimate the full intensity of a reflection is to apply an estimated value of the partiality to the measured reflection intensities $I_{him}$.
Various models have been proposed to calculate the reflection partiality. Here we use Rossmann's model (Rossmann, 1979; Rossmann et al., 1979) with Greenhough & Helliwell's (1982) correction. This model treats partiality as a fraction of a spherical volume swept through the Ewald sphere. The coordinates of the reciprocal-lattice point are defined by the Miller indices of the reflection, the crystal orientation matrix and the rotation angle. The volume of the sphere around the reciprocal-lattice point accounts for crystal mosaicity and beam divergence. Alternative geometrical descriptions of a reciprocal-lattice point passing through the Ewald sphere have been given by Winkler et al. (1979) and Bolotovsky & Coppens (1997).
Provided the reflection partiality, $p_{him}$, is known, the full intensity is estimated by

$$I_{hi} = I_{him}/(G_{m}\,p_{him}). \qquad (11.5.2.2)$$

This expression can produce as many estimates of $I_{hi}$ as there are parts of reflection $hi$, while expression (11.5.2.1) produces only one estimate of $I_{hi}$ when all parts of reflection $hi$ are recorded. Having defined the relationships between measured intensities of partial reflections and estimated full reflections by expressions (11.5.2.1) and (11.5.2.2), two methods of generalizing the HRS equations can be considered.
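The difference between the two ways of estimating a full intensity can be sketched as follows. The example assumes that each measured part is first placed on the common scale by dividing by the inverse scale factor of its frame; the function names and the data layout (a list of `(frame, measured_intensity)` pairs plus dictionaries of scale factors and calculated partialities) are hypothetical.

```python
def full_from_sum(parts, G):
    """Single estimate: sum of scaled parts; valid only if ALL parts are present.

    parts: list of (frame m, measured partial intensity)
    G:     {m: inverse linear scale factor of frame m}
    """
    return sum(I / G[m] for m, I in parts)

def full_from_partiality(parts, G, p):
    """One independent estimate per part, using the calculated partiality p[m]."""
    return [I / (G[m] * p[m]) for m, I in parts]

# A reflection of full intensity 100 split 25%/75% over frames 0 and 1.
G = {0: 1.0, 1: 2.0}
p = {0: 0.25, 1: 0.75}
parts = [(0, 25.0), (1, 150.0)]
summed = full_from_sum(parts, G)
estimates = full_from_partiality(parts, G, p)
```

The summation route needs no partiality model but fails when a part is missing, whereas the partiality route yields a usable estimate from every recorded part at the cost of depending on the calculated partialities.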
The scale factor can be generalized to incorporate crystal decay (Gewirth, 1996; Otwinowski & Minor, 1997):

$$G_{him} = G_{m}\exp\{-2B_{m}[\sin\theta_{hi}/\lambda]^{2}\},$$

where $B_{m}$ is a parameter describing the crystal disorder while frame m was recorded, $\theta_{hi}$ is the Bragg angle of reflection $hi$ and λ is the X-ray wavelength.
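A resolution-dependent scale factor of this kind can be evaluated with a few lines of code. The sketch below assumes the usual exponential form $G_m\exp[-2B_m(\sin\theta/\lambda)^2]$ for an intensity scale factor; the function and argument names are illustrative only, and the Bragg angle is taken in radians.

```python
import math

def decay_scale(G_m, B_m, theta, wavelength):
    """Scale factor of frame m at Bragg angle theta (radians).

    G_m:        overall inverse scale factor of the frame
    B_m:        disorder parameter of the frame (same length units squared
                as wavelength, e.g. A^2 when wavelength is in A)
    """
    s = math.sin(theta) / wavelength
    return G_m * math.exp(-2.0 * B_m * s * s)

# The correction vanishes at zero scattering angle and grows with resolution.
low = decay_scale(1.0, 5.0, math.radians(2.0), 1.0)
high = decay_scale(1.0, 5.0, math.radians(15.0), 1.0)
```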
Method 1 only allows the refinement of the scale factors, while method 2 allows refinement of the scale factors, crystal mosaicity and orientation matrix, as the latter two factors contribute to the calculated partiality. Furthermore, method 2 is essential for scaling of data sets with low redundancy (e.g. data collected from low-symmetry crystals or data collected over small rotation ranges). When a reflection $hi$ spans more than one frame, but there are no other reflections with the same reduced Miller indices h in the data set, the contribution of any partial reflection $him$ to expression (11.5.2.3) will be zero, as in this case $I_{h}$ will be the same as $I_{hi}$. In contrast, in method 2 the reflection $hi$ can be used for scaling because the estimates of the full intensity $I_{hi}$ are calculated independently from every frame spanned by reflection $hi$.
Both scaling methods 1 and 2 may take into account any reflection intensity observation, regardless of whether it is a partially or fully recorded reflection. However, there are significant differences between the selection of reflections in the two methods. Method 1 requires that all parts of a reflection are available in order to incorporate the reflection into the generalized HRS target, expression (11.5.2.3). Thus, reflections that occur at the beginning or the end of the crystal orientation, or at gaps within the rotation range, must be rejected. Even when all parts of a reflection are recorded, there might be parts for which there was a problem during integration, thus making the reflection useless for scaling. The decision on whether all parts of a reflection are available for scaling is dependent on knowledge of the crystal mosaicity and of the crystal orientation matrix. Since these might be inaccurate, a reasonable tolerance has to be exercised when deciding if a reflection has been completely measured on consecutive frames. Method 2 allows the use of all reflections for scaling as every observation of a partial reflection is sufficient to estimate the intensity of a full reflection, expression (11.5.2.2). However, a reasonable lower limit of calculated partiality has to be imposed in selecting reflections useful for scaling. The criteria for rejecting reflections prior to scaling and averaging are listed in Table 11.5.3.1.
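The two selection rules discussed above can be sketched as simple predicates. This is only an illustration: the tolerance `tol` on the sum of calculated partialities and the lower limit `p_min` are hypothetical parameters whose values must be chosen for each data set, and the full list of rejection criteria in Table 11.5.3.1 is not reproduced here.

```python
def usable_for_method1(partialities, tol):
    """Method 1: accept a reflection only if its recorded parts are complete,
    i.e. the calculated partialities of the parts sum to 1 within tol."""
    return abs(sum(partialities) - 1.0) <= tol

def usable_for_method2(partiality, p_min):
    """Method 2: accept any single part whose calculated partiality is not
    too small to give a reliable estimate of the full intensity."""
    return partiality >= p_min

# A reflection at the start of the rotation range: only its second part
# (calculated partiality 0.6) was recorded.
recorded = [0.6]
ok1 = usable_for_method1(recorded, tol=0.1)
ok2 = [usable_for_method2(p, p_min=0.3) for p in recorded]
```

In this example the reflection is lost to method 1 but still contributes one observation to method 2.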
Scale factors will depend on the variation of the incident X-ray beam intensity, crystal absorption and radiation damage. Hence, in general, scale factors can be constrained to follow an analytical function or restrained to minimize variation between successive frames. The scale factors can be restrained by adding a term $w\sum_{n}(G_{n} - G_{n+1})^{2}$ to ψ, expression (11.5.1.1), where $G_{n}$ and $G_{n+1}$ are scale factors for the nth and (n + 1)th frame and w is a suitably chosen weight. Such procedures will increase $R_{\rm merge}$ but will also increase the accuracy of the scaled intensities as additional reasonable physical conditions have been applied.
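A restraint on the variation between successive frames can be written as a quadratic penalty on neighbouring scale factors. The sketch below assumes the penalty is the weighted sum of squared differences between consecutive frames; the function name and the list representation of the scale factors are illustrative.

```python
def smoothness_penalty(G, w):
    """Restraint term w * sum_n (G[n] - G[n+1])**2 to be added to psi.

    G: scale factors ordered by frame number
    w: restraint weight (larger w enforces smoother scale factors)
    """
    return w * sum((G[n] - G[n + 1]) ** 2 for n in range(len(G) - 1))

smooth = smoothness_penalty([1.00, 1.01, 1.02, 1.03], w=10.0)
jumpy = smoothness_penalty([1.00, 1.50, 1.00, 1.50], w=10.0)
```

Frames separated by a genuine discontinuity, such as a new synchrotron fill, would normally be excluded from the sum so that the restraint does not smooth over a real change.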
The mis-setting angles of a single crystal should remain constant throughout the data set. Thus, in principle, the mis-setting angles should be constrained to be the same for all frames associated with a single crystal in the data set. However, in practice, independent refinement of the mis-setting angles can detect problems in the data set when there are discontinuities in these angles with respect to frame number. Cell dimensions should be the same for all crystals and might therefore be constrained. However, care should be taken, as the exact conditions of freezing may cause some variations in cell dimensions between crystals. As radiation damage proceeds, mosaicity is likely to increase. Hence, constraint between the refined mosaicities of neighbouring frames can be useful.
Once the scale factors of all frames are determined, they need to be applied to the reflection intensities and error estimates. The reflection intensities with the same reduced Miller indices can then be averaged.
When method 2 is used for averaging, the determination of $I_{h}$ is more complicated as there are as many estimates of the full intensity $I_{hi}$ as there are partial reflections $him$. Therefore, intensity averaging of reflection h has to be done in two steps. First, for every reflection $hi$, the intensity estimates from all partial observations are combined into a weighted mean, where the weights are based on the estimated standard deviations of each intensity measurement. In the second step, the average is taken over the i different scaled intensities for the observed reflections.
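The two-step averaging can be sketched as two nested weighted means. The text only states that the weights are based on the estimated standard deviations, so the inverse-variance weights used here (in both steps) are an assumption, as are the function names and the nested-list data layout.

```python
def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard deviation."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, (1.0 / total) ** 0.5

def average_reflection(observations):
    """observations: one list per reflection hi, each holding
    (full-intensity estimate, sigma) pairs, one pair per recorded part."""
    # Step 1: combine the estimates from the parts of each reflection hi.
    per_hi = [weighted_mean(*zip(*parts)) for parts in observations]
    # Step 2: average over the different reflections hi of the unique h.
    return weighted_mean(*zip(*per_hi))

# Reflection h observed twice: once as two parts, once as a single part.
obs = [[(100.0, 10.0), (110.0, 10.0)], [(90.0, 5.0)]]
I_h, sigma_h = average_reflection(obs)
```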
The selection of reflections useful for averaging is the same as for scaling (Table 11.5.3.1), except that it is no longer necessary to reject reflections that have insignificant intensities. Applying a σ cutoff while averaging the scaled intensities will lead to a statistical bias of the weaker reflection intensities.
For samples of three or more equivalent reflections, it is necessary to consider the absolute values of the differences between individual intensities and the median of the sample: $|I_{hi} - I_{\rm median}|$. The outliers can be detected by several statistical tests and, once detected, can be either down-weighted or rejected. When the sample consists of only two reflections, they can be considered a `discordant pair' if the difference between their intensities is not warranted by the estimated errors and, hence, both reflections can be rejected (Blessing, 1997).
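One possible realization of such a screen is sketched below. It flags observations whose absolute deviation from the sample median exceeds a multiple of the median absolute deviation, and flags a pair as discordant when the intensities differ by more than a multiple of their combined standard deviation. Both tests and the cutoff `k` are illustrative assumptions; they are not necessarily the tests used by Blessing (1997) or by any particular program.

```python
from statistics import median

def flag_outliers(intensities, k=3.0):
    """Flag |I - median| > k * MAD for a sample of three or more values."""
    med = median(intensities)
    deviations = [abs(x - med) for x in intensities]
    mad = median(deviations)       # median absolute deviation
    if mad == 0.0:                 # more than half the values coincide:
        return [False] * len(intensities)  # spread is undefined, flag none
    return [d > k * mad for d in deviations]

def discordant_pair(I1, sigma1, I2, sigma2, k=3.0):
    """True if two measurements differ by more than their errors warrant."""
    return abs(I1 - I2) > k * (sigma1 ** 2 + sigma2 ** 2) ** 0.5

flags = flag_outliers([100.0, 102.0, 98.0, 101.0, 300.0])
```

Flagged observations can then be rejected or down-weighted before the final average is formed.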
Averaging intensities estimated according to method 2 has an advantage over method 1 as outliers and discordant pairs can be `screened' at two levels: firstly, when the estimates of the full reflection intensity $I_{hi}$, calculated by expression (11.5.2.2) from different parts of the same reflection, are considered, and secondly when the mean intensities $\langle I_{hi}\rangle$ from different reflections are considered.
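A commonly used estimate of the quality of scaled and averaged Bragg reflection intensities is $R_{\rm merge}$. Useful definitions of R factors are:

$$R_{1} = \frac{\sum_{h}\sum_{i}|I_{hi} - I_{h}|}{\sum_{h}\sum_{i} I_{hi}}, \qquad R_{2} = \frac{\sum_{h}\sum_{i}(I_{hi} - I_{h})^{2}}{\sum_{h}\sum_{i} I_{hi}^{2}}, \qquad R_{w} = \frac{\sum_{h}\sum_{i} W_{hi}(I_{hi} - I_{h})^{2}}{\sum_{h}\sum_{i} W_{hi} I_{hi}^{2}}.$$

The linear ($R_{1}$), square ($R_{2}$) and weighted ($R_{w}$) R factors can be subdivided into resolution ranges, intensity ranges, reflection classes, frame number and regions of the detector surface. When method 1 is used, reflections $hi$ can be grouped in terms of the sums of partialities of contributing partial reflections $him$.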
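The linear merging R factor can be computed directly from the scaled observations grouped by unique reflection. This sketch uses the conventional form $\sum_h\sum_i|I_{hi} - I_h| / \sum_h\sum_i I_{hi}$ and, for simplicity, takes $I_h$ as the unweighted mean of each group; both the function name and the dictionary layout are assumptions. Restricting the input to a resolution shell, a frame or a detector region gives the corresponding subdivided R factor.

```python
def r_linear(groups):
    """groups: {unique reflection h: [scaled intensities I_hi]}."""
    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        mean = sum(intensities) / len(intensities)   # stand-in for I_h
        numerator += sum(abs(I - mean) for I in intensities)
        denominator += sum(intensities)
    return numerator / denominator

r = r_linear({"h1": [90.0, 110.0], "h2": [50.0, 50.0]})
```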
The R-factor variation depends on the properties of the detector with respect to intensities. Generally the R factor decreases as intensity increases. Because the mean intensity falls off with increasing resolution, the R factor generally increases with resolution. Any deviation from this behaviour might indicate a problem in the data collection due to nonlinearity of the detector response, ice diffuse diffraction, or any other stray effects superimposed on the crystal diffraction.
A useful indicator of the quality of the intensity estimates of partial reflections is the mean ratio of calculated partiality to observed partiality. The deviation of this ratio from unity can be examined as a function of the reflection intensity, resolution and calculated partiality.
The comparison of R factors for centric and noncentric reflections can be used to determine the significance of an anomalous-scattering effect. The quality of the anomalous-dispersion signal can be assessed by calculation of the scatter, , where
and
is the average of the n measurements of the full reflection intensities
. The
values for noncentric reflections can be compared to the scatter,
or
, of reflections differing only in absorption while excluding Bijvoet opposites. The mean scatter is calculated from all
values,
The ratios
and
should be larger than unity for significant anomalous-dispersion data.
If scale factors are to make physical sense, their behaviour with respect to the frame number has to be in accordance with the known changes in the beam intensity, crystal condition and detector response.
The scaling of a φX174 procapsid data set (Dokland et al., 1997) was performed using methods 1 and 2 as described here and using SCALEPACK (Otwinowski & Minor, 1997) (Fig. 11.5.7.1). Graphs (a) and (b) in Fig. 11.5.7.1 have four segments corresponding to four synchrotron beam `fills'. All three methods give scale factors within 5% of each other (Figs. 11.5.7.1c and d). However, for the first and last frame of each `fill' the results can differ by as much as 15%. Both method 1 and SCALEPACK produce physically wrong results in that the scale factors of these frames look like outliers compared to the scale factors of the neighbouring frames. By contrast, method 2 provides consistent scale factors for these frames. Although the algorithm used by SCALEPACK for scaling frames with partial reflections has never been disclosed, the similar results obtained by method 1 and SCALEPACK suggest that SCALEPACK might be using an algorithm similar to that of method 1 (Fig. 11.5.7.1d).
Attempts at scaling a data set of a frozen crystal of HRV14 (Rossmann et al., 1985, 1997) failed with method 1 as a result of gaps in the rotation range for the first 20 frames, causing singularity of the normal equations matrix. When frames without useful neighbours were excluded, the cubic symmetry of the crystal was sufficient for successful scaling. In contrast, method 2 did not have any problems with the whole data set, and the results obtained with method 2 showed greater consistency than those obtained with method 1 or SCALEPACK (Fig. 11.5.7.2).
Figure: Linear scale factor as a function of frame number for an HRV14 data set (Rossmann et al., 1985).
The accuracy and robustness of method 2 are also demonstrated by the scaling results for a Sindbis virus capsid protein (SCP), residues 114–264 (Choi et al., 1991, 1996). The behaviour of the scale factor with respect to the frame number reflects the anisotropy of the thin plate-shaped crystal (Fig. 11.5.7.3). For the first 40 frames (frame numbers 0 to 39), even-numbered frames have higher scale factors than odd-numbered frames. Data collection was stopped after frame number 39 and restarted. After frame number 39, odd-numbered frames have higher scale factors than even-numbered frames. This effect presumably relates to the use of the two alternating image plates with slightly different sensitivities in the R-axis camera used in the data collection.
In order to determine the limits of tolerance that can be permitted when method 1 is used, the R factor was examined as a function of the sum-of-partialities for the φX174 procapsid data (Fig. 11.5.7.4
). Reflections with sum-of-partialities of
were used. The R factor changes sharply when the sum-of-partialities is outside
. Hence,
were acceptable limits of tolerance for this data set.
The behaviour of the R factor versus frame number (Fig. 11.5.7.5) is more monotonic when method 1 is used compared to method 2. In method 1, the data-quality estimates for neighbouring frames are strongly correlated because the full reflections used in the statistics are obtained by summing partials from consecutive frames. By contrast, in method 2 every frame produces estimates of full reflection intensities independently of the neighbouring frames. Therefore, the R factors per frame calculated after scaling with method 2 truly represent the data quality for individual frames.
The relationship between observed and calculated partialities (Fig. 11.5.7.6) deviates from the ideal line
, especially for the smaller calculated partialities where
. This suggests errors in the measurements of
or the calculations of
. The latter may be improved by a post refinement of the orientation matrix and crystal mosaicity (Rossmann et al., 1979
).
Refinement of the effective mosaicity can show both the anisotropic nature of the crystal (Fig. 11.5.7.7) as well as the impact of radiation damage. The effective mosaicity is the convolution of the mosaic spread of the crystal, the beam divergence and the wavelength divergence of the incident X-ray beam. Hence, X-ray diffraction data collected at a synchrotron-radiation source necessitate the differentiation of the effective mosaicity in the horizontal and vertical planes. A more general approach is the introduction of six parameters reflecting the anisotropic effective mosaicity.
The quality of anomalous-dispersion data can be assessed by calculation of the average scatter, expression (11.5.6.6). The ratios
and
should be larger than unity for significant anomalous data (Fig. 11.5.7.8
). Note the much larger ratios for the scatter among measurements of
for data measured at the absorption edge of Se, as opposed to measurements remote from the edge. The decreasing values of the ratios with resolution are due to the decrease of
value, thus causing the error in the measurement of
to approach the difference in intensity of Bijvoet opposites.
The generalized HRS method allows scaling and averaging of X-ray diffraction data collected with an oscillation camera while simultaneously using full and partial reflections. The procedure is as useful for thin slices of reciprocal space as it is for thicker slices.
The results of data processing with the two different algorithms indicate that method 1, based on adding partial reflections, may fail to scale data sets with gaps in the rotation range or with low redundancy. The values of the scale factors obtained with both methods are similar, except for cases where there are gaps in the rotation range or dramatic changes in the true scale factors between consecutive frames. In these cases, method 1 produces a physically wrong result. The algorithm used by method 1 is probably similar to that used by SCALEPACK (Otwinowski & Minor, 1997).
Method 2 is more stable and versatile than method 1, and allows the scaling of data sets with incompletely measured reflections and low redundancy. The major drawback of method 2 is that errors in the crystal orientation matrix and mosaicity, as well as inadequacies of the theoretical model for reflection partiality, contribute to errors in the scaled intensities. Therefore, post refinement is needed for method 2 to perform at its best.
Appendix A11.5.1
Small differences in the orientation of domains within the crystal, as well as the cross fire of the incident X-ray beam, will give rise to a series of possible Ewald spheres. Their extreme positions will subtend an angle 2m at the origin of the reciprocal space, and their centres lie on a cusp of limiting radius , where m is the half-angle effective mosaic spread. As the reciprocal lattice is rotated around the axis (Oy) perpendicular to the mean direction of the incident radiation (Oz), a point P will gradually penetrate the effective thickness of the reflection sphere (Fig. A11.5.1.1
). Initially, only a few domain blocks will satisfy Bragg's law, but upon further rotation the number of blocks that are in a reflecting condition will increase. The maximum will be reached when the point P has penetrated halfway through the sphere's effective thickness, after which there will be a decline of the crystal volume able to diffract.
Let q be a measure of the fraction of the path travelled by P between the two extreme reflecting positions, and let p be the fraction of the energy already diffracted. Then the relation between p and q must have the general form shown in Fig. A11.5.1.2. It is physically reasonable to assume that the curve for p is tangential to p = 0 at q = 0 and to p = 1 at q = 1.
A reasonable approximation to the above conditions can be obtained by considering the fraction of the volume of a sphere removed by a plane a distance q from its surface (Fig. A11.5.1.2). It is easily shown that if p is this fractional volume and q is measured in units of the sphere diameter, then

$$p = 3q^{2} - 2q^{3}.$$

This curve is shown in Fig. A11.5.1.2 and corresponds to assuming that the reciprocal-lattice point is a sphere of finite volume cutting an infinitely thin Ewald sphere. Also shown in Fig. A11.5.1.2 is the line p = q, which would result if the reciprocal-lattice point were a rectangular block whose surfaces were parallel and perpendicular to the Ewald sphere at the point of penetration.
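The spherical-volume model of partiality is easy to evaluate numerically. For a sphere of unit diameter cut by a plane at depth q, the removed spherical cap has fractional volume $3q^2 - 2q^3$, which rises smoothly from 0 to 1 with zero slope at both ends. The sketch below implements this relation together with the linear block model for comparison; the function names and the clamping of q to the interval [0, 1] are choices made for the example.

```python
def partiality_sphere(q):
    """Fraction of a unit-diameter sphere cut off by a plane at depth q."""
    if q <= 0.0:
        return 0.0     # reciprocal-lattice point has not yet started to diffract
    if q >= 1.0:
        return 1.0     # reciprocal-lattice point has passed through completely
    return 3.0 * q ** 2 - 2.0 * q ** 3

def partiality_block(q):
    """Linear model for a rectangular block aligned with the Ewald sphere."""
    return min(1.0, max(0.0, q))

# The sphere model lags the linear model before the halfway point and leads after it.
table = [(q / 4, partiality_sphere(q / 4), partiality_block(q / 4)) for q in range(5)]
```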
Assuming a right-handed coordinate system (x, y, z) in reciprocal space fixed to the camera, it is easily shown (Wonacott, 1977) that the condition for reflection is
where
is the distance of a reciprocal-lattice point P(x, y, z) from the origin, O, of reciprocal space. Similarly, it can be shown that at the ends of the path of the reciprocal-lattice point through the finite thickness of the sphere,
Therefore,
Since δ is small, it can be assumed that
is independent of the position of the reciprocal-lattice point P between the extreme positions
and
(Fig. A11.5.1.1
). Hence, the length of the path through the finite thickness of the sphere is proportional to
Now, if a reflection is only just penetrating the sphere at the end of the oscillation range, then the fraction of penetration is given by
Substituting this expression into equation (A11.5.1.4)
, it follows that
where
and
The subscripts A and B refer to the beginning and end of the oscillation range for the partial reflection P, respectively.
Similarly, if a reflection is almost completely within the sphere, There are indeed four such conditions: two while a reflection is entering the Ewald sphere, and two while it is exiting. As such, it is readily seen that
is the range for a partial reflection. The full range of conditions is given in Table A11.5.1.1
, as are the conditions for a full reflection.
Acknowledgements
This article is based primarily on the original publication by Bolotovsky et al. (1998). We are grateful for an NSF Grand Challenge grant in support of this work.