Integration by profile fitting

Leslie, A. G. W.

doi:10.1107/97809553602060000675

International Tables for Crystallography
Volume F
Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F. ch. 11.2, pp. 214-217

A. G. W. Leslie^a ^*

^aMRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, England
Correspondence e-mail: andrew@mrc-lmb.cam.ac.uk

11.2.6. Integration by profile fitting

Provided the background and peak regions are correctly defined, summation integration is a method for evaluating integrated intensities that is both robust and free from systematic error. For weak reflections, however, many of the pixels in the peak region will contain very little signal (Bragg intensity) but will contribute significantly to the noise because of the Poissonian variation in the background [as shown by the $[I_{\rm bg}]$ term in equation (11.2.5.11)]. Profile fitting provides a means of improving the signal-to-noise ratio for this class of reflection (but will provide no improvement for reflections where the background level is negligible).

11.2.6.1. Forming the standard profiles

To apply profile-fitting methods, the first requirement is to derive a 'standard' profile that accurately represents the true reflection profile. Although analytical functions can be used, it is difficult to define a simple function that copes adequately with the wide variation in spot shapes that can arise in practice. Most programs therefore rely on an empirical profile derived by summing many different spots. The optimum profile is the one that provides the best fit to all the contributing reflections, i.e. the one that minimizes, for each pixel j, $[R_{2}= {\textstyle\sum\limits_{h}} w_{j} (h) \left[K_{h} P_{j} - \rho_{j} (h)_{\rm corr}\right]^{2}, \eqno(11.2.6.1)]$ where $[P_{j}]$ is the profile value for the $[j{\rm th}]$ pixel, $[\rho_{j}(h)_{\rm corr}]$ is the observed background-corrected count at that pixel for reflection h, $[K_{h}]$ is a scale factor and $[w_{j}(h)]$ is a weight for the $[j{\rm th}]$ pixel of reflection h. The summation extends over all reflections contributing to the profile. The weight is given by $[w_{j} (h) = 1/\sigma_{hj}^{2}, \eqno(11.2.6.2)]$ and from Poisson statistics $[\sigma_{hj}^{2}]$ is the expectation value of the counts at pixel j, given by $[\sigma_{hj}^{2} = K_{h}P_{j} + \left(a_{h} p_{j} + b_{h} q_{j} + c_{h} \right), \eqno(11.2.6.3)]$ where the term in parentheses is the background plane for reflection h evaluated at the pixel coordinates $[(p_{j}, q_{j})]$. Following Rossmann (1979), the summation integration intensity $[I_{s}(h)]$ can be used to derive a value for $[K_{h}]$ : $[I_{s}(h) = K_{h}{\textstyle\sum\limits_{j = 1}^{m}} P_{j}. \eqno(11.2.6.4)]$ Because the profile values $[P_{j}]$ are not yet determined when equations (11.2.6.3) and (11.2.6.4) are first evaluated, a preliminary profile can be used in their place, derived, for example, by simple summation of the strong reflections used in the detector-parameter refinement. This gives acceptable weights for use in equation (11.2.6.1).
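As an illustration of equations (11.2.6.1) to (11.2.6.4), the sketch below computes a standard profile pixel by pixel. With the weights held fixed, setting the derivative of $[R_{2}]$ with respect to $[P_{j}]$ to zero gives $[P_{j} = {\textstyle\sum_{h}} w_{j}(h) K_{h} \rho_{j}(h)_{\rm corr} \big/ {\textstyle\sum_{h}} w_{j}(h) K_{h}^{2}]$, which is what the code evaluates. The function name, argument layout and the small floor on the variance are assumptions made for this example and are not taken from any integration program.

```python
def standard_profile(spots, backgrounds, prelim_profile, sum_intensities):
    """Weighted least-squares standard profile, one value per pixel.

    spots[h][j]         background-corrected counts rho_j(h)_corr
    backgrounds[h][j]   background plane a_h*p_j + b_h*q_j + c_h at pixel j
    prelim_profile[j]   preliminary profile, used only for K_h and the weights
    sum_intensities[h]  summation integration intensity I_s(h)
    """
    total = sum(prelim_profile)
    # Equation (11.2.6.4): K_h = I_s(h) / sum_j P_j
    scales = [i_s / total for i_s in sum_intensities]
    profile = []
    for j in range(len(prelim_profile)):
        num = den = 0.0
        for h, k_h in enumerate(scales):
            # Equations (11.2.6.2) and (11.2.6.3): weight = 1 / expected counts.
            # The floor of 1e-9 is an assumption to avoid division by zero.
            variance = k_h * prelim_profile[j] + backgrounds[h][j]
            w = 1.0 / max(variance, 1e-9)
            num += w * k_h * spots[h][j]
            den += w * k_h * k_h
        # Value of P_j that minimizes R_2 in equation (11.2.6.1)
        profile.append(num / den)
    return profile
```

The returned profile is on the same scale as the preliminary profile, so it can be fed back in as `prelim_profile` if a second pass with improved weights is wanted.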

This method of deriving the standard profile is only appropriate for fully recorded reflections. However, in many cases there will be very few or no fully recorded reflections on each image. In such cases the profile is determined by simply adding together the background-corrected pixel counts from all contributing reflections. In the program MOSFLM (Leslie, 1992), the profiles are determined using reflections on, typically, ten or more successive images, so that partials will be summed to give the correct fully recorded profile for the majority of the contributing reflections. Tests carried out using standard profiles derived using only fully recorded reflections and equation (11.2.6.1), or using both fully recorded and partially recorded reflections and simple summation, give data of the same quality as judged by the merging statistics.

The reflection profile changes across the face of the detector, due to obliquity of incidence, changes in the projected diffracting volume and geometric factors. In the MOSFLM program, this variation is accommodated by determining several standard profiles (typically nine or 25) for different regions of the detector. When evaluating the profile-fitted intensity for a given reflection, a weighted sum of the nearest standard profiles is calculated to provide the best estimate of the true profile at that position on the detector. For the central regions of the detector there will be four contributing profiles, while at the edges there will be between one and three. The weights assigned to each profile vary linearly with the distance from the reflection to the centres of the regions used in determining the standard profiles. An alternative procedure used in DENZO (Otwinowski & Minor, 1997) is to evaluate a new profile for each reflection based on spots lying within a pre-specified radius.
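The weighted sum of neighbouring standard profiles can be sketched as follows. This is a simplified illustration that assumes the region centres lie on a regular rectangular grid and that the linear weights are applied independently in x and y (bilinear weighting). Under that assumption a reflection draws on four profiles in the interior, two along an edge and one beyond a corner, so it does not reproduce the region layout of MOSFLM, for which the text quotes between one and three contributors at the edges. All names are hypothetical.

```python
def linear_weights(pos, centres):
    """Linear weights along one axis for coordinate `pos`, given sorted region centres."""
    if pos <= centres[0]:
        return {0: 1.0}
    if pos >= centres[-1]:
        return {len(centres) - 1: 1.0}
    for i in range(len(centres) - 1):
        lo, hi = centres[i], centres[i + 1]
        if lo <= pos <= hi:
            t = (pos - lo) / (hi - lo)
            return {i: 1.0 - t, i + 1: t}


def blended_profile(x, y, x_centres, y_centres, profiles):
    """Weighted sum of the nearest standard profiles at detector position (x, y).

    profiles[iy][ix] is the standard profile (a flat list of pixel values)
    for the region centred at (x_centres[ix], y_centres[iy]).
    """
    wx = linear_weights(x, x_centres)
    wy = linear_weights(y, y_centres)
    n = len(profiles[0][0])
    out = [0.0] * n
    for iy, weight_y in wy.items():
        for ix, weight_x in wx.items():
            for j in range(n):
                out[j] += weight_x * weight_y * profiles[iy][ix][j]
    return out
```

Because the weights along each axis sum to one, the blended profile keeps the normalization of the contributing profiles.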

11.2.6.2. Evaluation of the profile-fitted intensity

Given an appropriate standard profile, the reflection intensity for fully recorded reflections is evaluated by determining the scale factor K and background plane constants a, b, c which minimize $[R_{3} = {\textstyle\sum} \ w_{i} \left(KP_{i} + ap_{i} + bq_{i} + c - \rho_{i} \right)^{2}, \eqno(11.2.6.5)]$ where the summation is over all valid pixels in the measurement box. As before, $[w_{i} = 1 / \sigma_{i}^{2} \eqno(11.2.6.6)]$ and $[\eqalignno{\sigma_{i}^{2} &= \hbox{expectation value of the counts at pixel } i\cr &= ap_{i} + bq_{i} + c + JP_{i}. &(11.2.6.7)}]$ In order to calculate the weights, the background plane constants and summation integration intensity $[I_{s}]$ are evaluated as described in Section 11.2.5, at the same time identifying any outliers in the background. The summation integration intensity is used to evaluate the scale factor J in equation (11.2.6.7) using $[I_{s} = J {\textstyle\sum\limits_{i}} P_{i}. \eqno(11.2.6.8)]$ In equation (11.2.6.5), the summation is over all valid pixels within the measurement box. This excludes pixels that are overlapped by neighbouring spots (if any) and any outliers identified in the background region.

Minimizing $[R_{3}]$ with respect to K, a, b and c leads to four linear equations from which K, a, b and c can be determined: $[{\left(\matrix{{\textstyle\sum}wP^{2} &{\textstyle\sum}wpP &{\textstyle\sum}wqP &{\textstyle\sum}wP\hfill\cr {\textstyle\sum}wpP &{\textstyle\sum}wp^{2} &{\textstyle\sum}wpq &{\textstyle\sum}wp\hfill\cr {\textstyle\sum}wqP &{\textstyle\sum}wpq &{\textstyle\sum}wq^{2} &{\textstyle\sum}wq\hfill\cr {\textstyle\sum}wP\hfill &{\textstyle\sum}wp\hfill &{\textstyle\sum}wq\hfill&{\textstyle\sum}w\hfill \cr}\right)\left(\matrix{K\hfill\cr a\hfill\cr b\hfill\cr c\hfill\cr}\right) = \left(\matrix{{\textstyle\sum}wP\rho\hfill\cr {\textstyle\sum}wp\rho\hfill\cr {\textstyle\sum}wq\rho\hfill\cr {\textstyle\sum}w\rho\hfill\cr}\right).}\eqno(11.2.6.9)]$ The profile-fitted intensity $[I_{p}]$ is then given by $[I_{p} = K {\textstyle\sum\limits_{i}} P_{i}. \eqno(11.2.6.10)]$ The standard deviation in the profile-fitted intensity is given by $[\eqalignno{\sigma_{I_{p}}^{2} &= \sigma_{K}^{2} \left({\textstyle\sum\limits_{i}} P_{i}\right)^{2} &(11.2.6.11)\cr &= \left({\textstyle\sum\limits_{i}^{N}} w_{i} \Delta_{i}^{2} \big/ (N - 4)\right)A_{KK}^{-1} \left({\textstyle\sum\limits_{i}} P_{i}\right)^{2}, &(11.2.6.12)}]$ where $[\Delta_{i} = (KP_{i} + ap_{i} + bq_{i} + c - \rho_{i}), \eqno(11.2.6.13)]$ N is the number of pixels in the summation and $[A_{KK}^{-1}]$ is the diagonal element for the scale factor K of the inverse normal matrix (used to minimize $[R_{3}]$ ).
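The normal equations (11.2.6.9) are small enough to solve directly. The sketch below builds the 4 × 4 matrix from the basis functions $[P_{i}]$, $[p_{i}]$, $[q_{i}]$ and 1, solves it by Gaussian elimination with partial pivoting, and returns the profile-fitted intensity of equation (11.2.6.10). The function names are assumptions for this example, the weights are taken as given, and the standard deviation of equation (11.2.6.12) is not computed.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def profile_fit(P, p, q, rho, w):
    """Return (K, a, b, c, I_p) from equations (11.2.6.9) and (11.2.6.10).

    P: standard profile, p and q: pixel coordinates, rho: observed counts,
    w: weights, all given as flat lists over the valid pixels of the box.
    """
    basis = [P, p, q, [1.0] * len(P)]
    # Left-hand matrix: weighted sums of products of pairs of basis functions
    A = [[sum(wi * u * v for wi, u, v in zip(w, bu, bv)) for bv in basis]
         for bu in basis]
    # Right-hand side: weighted sums of each basis function times rho
    y = [sum(wi * u * r for wi, u, r in zip(w, bu, rho)) for bu in basis]
    K, a, b, c = solve(A, y)
    return K, a, b, c, K * sum(P)
```

For example, a 3 × 3 box whose counts are exactly a scaled profile plus a plane returns the scale factor and plane constants that were used to build it.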

In the case of partially recorded reflections, it is no longer valid to fit the sum of the scaled standard profile and a background plane to all pixels in the measurement box. Partially recorded reflections can have a profile that differs significantly from the standard profile, with the result that the background plane constants take on physically unreasonable values in an attempt to compensate for this difference. Therefore, for partially recorded reflections, the summation in equation (11.2.6.5) is restricted to pixels in the peak region of the measurement box. Minimizing $[R_{3}]$ with respect to the scale factor K alone, with the background plane constants a, b and c treated as known, then gives $[\openup 6pt\eqalignno{I_{p} &= K {\textstyle\sum} P_{i} &(11.2.6.14)\cr &=\left({\textstyle\sum} w_{i}P_{i}\rho_{i} - a {\textstyle\sum} w_{i}P_{i}p_{i} - b{\textstyle\sum} w_{i}P_{i}q_{i} - c {\textstyle\sum} w_{i}P_{i}\right) {\textstyle\sum} P_{i} \big/ {\textstyle\sum} w_{i} P_{i}^{2}, &(11.2.6.15)}]$ where all summations are over the peak region only.
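Equation (11.2.6.15) is a one-parameter weighted fit and can be written in a few lines. The sketch below assumes that the background plane constants a, b and c have already been determined elsewhere and that all lists cover peak pixels only; the function name is hypothetical.

```python
def profile_fit_peak_only(P, p, q, rho, w, a, b, c):
    """Equation (11.2.6.15): profile-fitted intensity with a fixed background plane.

    All lists run over the peak pixels only. P is the standard profile,
    (p, q) are pixel coordinates, rho the observed counts, w the weights.
    """
    # Weighted sum of profile times background-corrected counts
    num = sum(wi * Pi * (ri - a * pi - b * qi - c)
              for wi, Pi, pi, qi, ri in zip(w, P, p, q, rho))
    den = sum(wi * Pi * Pi for wi, Pi in zip(w, P))
    # num / den is the scale factor K; multiplying by sum(P) gives I_p
    return num * sum(P) / den
```

If the counts are exactly a scaled profile plus the background plane, the function returns the scale factor times the profile sum, whatever weights are supplied.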

It is not possible to derive a standard deviation for partially recorded reflections based on the fit of the scaled standard profile (because partially recorded reflections have a different spot profile). For these reflections, the standard deviation can be calculated using equation (11.2.5.17).

11.2.6.3. Modifications for very close spots

In order to apply equation (11.2.6.5), it is necessary to exclude all pixels in the measurement box that are overlapped by a neighbouring spot. This applies not only to the pixels of the reflection being integrated, but also to the pixels of all the reflections used to form the standard profile. Consequently, a pixel should be excluded even if it is only overlapped by a neighbouring spot for one of the reflections used in forming the standard profile. When processing data from large unit cells, this can lead to a very high percentage of the background pixels being rejected and therefore a poor determination of the background plane parameters. In these circumstances, the background plane is determined using only background pixels and excluding only those pixels that are overlapped by neighbours for the reflection actually being integrated. The profile-fitted intensity for both fully recorded and partially recorded reflections is then evaluated in the way described for partially recorded reflections in Section 11.2.6.2, with the summation in equation (11.2.6.15) extending only over peak pixels. The standard deviation in the intensity for partially recorded reflections is derived from equation (11.2.5.17) as before. For fully recorded reflections, the standard deviation has two components: the first is based on the fit of the scaled standard profile to the reflection profile and the second on the contribution from the background: $[\eqalignno{\sigma_{I}^{2} &= \sigma_{\rm prof}^{2} + \sigma _{\rm bg}^{2} &(11.2.6.16)\cr &= \left[{\textstyle\sum\limits_{i = 1}^{m}} w_{i} \Delta_{i}^{2} \Big/ (m - 1)\right] \left[\left({\textstyle\sum\limits_{i = 1}^{m}} P_{i}\right)^{2} \bigg/ {\textstyle\sum\limits_{i = 1}^{m}} w_{i} P_{i}^{2}\right]\cr &\quad + (m/n) {\textstyle\sum\limits_{i = 1}^{n}} \left(\rho_{i} - ap_{i} - bq_{i} - c\right)^{2}, &(11.2.6.17)}]$ where m and n are the number of pixels in the peak and background, respectively.

11.2.6.4. Profile fitting very strong reflections

For very strong reflections, the background level is very small and equation (11.2.6.15) reduces to $[I_{p} \simeq {\textstyle\sum} w_{i} P_{i} \rho_{i} {\textstyle\sum} P_{i} \big/ {\textstyle\sum} w_{i} P_{i}^{2}, \eqno(11.2.6.18)]$ and the weights are given by $[w_{i} \simeq 1 \big/ JP_{i}. \eqno(11.2.6.19)]$ Substituting for $[w_{i}]$ in (11.2.6.18) gives $[I_{p} \simeq {\textstyle\sum} \rho_{i}. \eqno(11.2.6.20)]$ As pointed out by Z. Otwinowski (personal communication), this shows that for correctly weighted profile fitting, the profile-fitted intensity reduces to the summation integration intensity for very strong intensities.
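The reduction to the summation integration intensity can be checked numerically. In the sketch below the profile and the counts are made-up numbers chosen so that the spot does not match the standard profile exactly; with the weights of equation (11.2.6.19), equation (11.2.6.18) still returns the plain sum of the counts, whereas unit weights do not.

```python
def strong_spot_fit(P, rho, w):
    """Equation (11.2.6.18): profile-fitted intensity with the background neglected."""
    num = sum(wi * Pi * ri for wi, Pi, ri in zip(w, P, rho))
    den = sum(wi * Pi * Pi for wi, Pi in zip(w, P))
    return num * sum(P) / den


# Made-up standard profile and background-free counts for one strong spot
P = [1.0, 4.0, 9.0, 4.0, 1.0]
rho = [120.0, 390.0, 910.0, 405.0, 95.0]

# Equation (11.2.6.8) gives J; equation (11.2.6.19) gives the weights
J = sum(rho) / sum(P)
poisson_weights = [1.0 / (J * Pi) for Pi in P]

weighted = strong_spot_fit(P, rho, poisson_weights)   # equals sum(rho)
unweighted = strong_spot_fit(P, rho, [1.0] * len(P))  # differs from sum(rho)
```

The factor J cancels between numerator and denominator, so the result holds for any positive value of J.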

11.2.6.5. Profile fitting very weak reflections

For very weak reflections, all pixels will have very similar counts and therefore all the weights will be approximately the same. For simplicity, consider the case where the profile fit is evaluated only for the peak pixels. Equation (11.2.6.15) then reduces to $[I_{p} \simeq {\textstyle\sum} P_{i} \left(\rho_{i} - ap_{i} - bq_{i} - c\right) {\textstyle\sum} P_{i} \big/ {\textstyle\sum} P_{i}^{2}. \eqno(11.2.6.21)]$ The second and third summations in this equation depend only on the shape of the standard profile. This shows that the intensity is a weighted sum of the individual background-corrected pixel counts (rather than a simple unweighted sum, as is the case for summation integration). Because the values of $[P_{i}]$ are a maximum in the centre of the spot, this will place a higher weight on those pixels where the contribution of the Bragg diffraction is greatest, and a very low weight on the peripheral pixels where the Bragg diffraction is weakest. In this way, profile fitting improves the signal-to-noise ratio without the risk of introducing the systematic error that may result from simply reducing the size of the peak region for weak spots.

11.2.6.6. Improvement provided by profile fitting weak reflections

For very weak reflections, where all the weights $[w_{i}]$ are approximately the same, the variance in $[I_{p}]$ using equation (11.2.6.21) is given by $[\sigma_{I_{p}}^{2} = {\textstyle\sum} \hbox{Var}\left(\rho_{i} - ap_{i} - bq_{i} - c\right) P_{i}^{2} \left({\textstyle\sum} P_{i} \big/ {\textstyle\sum} P_{i}^{2}\right)^{2}. \eqno(11.2.6.22)]$ Assuming a flat background and very weak intensity, then from Poisson statistics $[\hbox{Var}\left(\rho_{i} - ap_{i} - bq_{i} - c\right) \simeq G\rho_{i}, \eqno(11.2.6.23)]$ and as $[\rho_{i}]$ has approximately the same value $[(\rho)]$ for all pixels, $[\eqalignno{\sigma_{I_{p}}^{2} &= G\rho {\textstyle\sum} P_{i}^{2} \left({\textstyle\sum} P_{i} \big/ {\textstyle\sum} P_{i}^{2}\right)^{2} &(11.2.6.24)\cr &= G\rho \left({\textstyle\sum} P_{i}\right)^{2} \big/ {\textstyle\sum} P_{i}^{2}. &(11.2.6.25)\cr}]$ The variance in the summation integration intensity is simply $[\sigma_{I_{s}}^{2} = Gm\rho. \eqno(11.2.6.26)]$ The ratio of the variances is thus $[\sigma_{I_{s}}^{2} \big/ \sigma_{I_{p}}^{2} = m{\textstyle\sum} P_{i}^{2} \big/ \left({\textstyle\sum} P_{i}\right)^{2}. \eqno(11.2.6.27)]$ For a typical spot profile, the right-hand side (which depends only on the shape of the standard profile) has a value of 2, showing that profile fitting can reduce the standard deviation in the integrated intensity by a factor of $[(2)^{1/2}]$ .
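The right-hand side of equation (11.2.6.27) is easy to evaluate for any tabulated profile. The sketch below does so for a hypothetical circular Gaussian spot; the Gaussian shape, its width and the size of the peak region are assumptions chosen only for illustration, and the value obtained depends on how tightly the peak region is drawn around the spot.

```python
import math


def variance_ratio(profile):
    """Right-hand side of equation (11.2.6.27): m * sum(P^2) / (sum P)^2."""
    m = len(profile)
    return m * sum(v * v for v in profile) / sum(profile) ** 2


def gaussian_profile(half_width, sigma):
    """Hypothetical circular Gaussian spot on a (2*half_width + 1)^2 pixel peak region."""
    r = range(-half_width, half_width + 1)
    return [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            for y in r for x in r]


# A perfectly flat profile gains nothing from profile fitting: the ratio is 1
flat_ratio = variance_ratio([1.0] * 25)

# A peaked profile gives a ratio above 1, i.e. a smaller variance for I_p
peaked_ratio = variance_ratio(gaussian_profile(4, 2.0))
```

By the Cauchy-Schwarz inequality the ratio can never fall below 1, so under the assumptions of this section profile fitting does not increase the variance of a very weak reflection.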

11.2.6.7. Other benefits of profile fitting

11.2.6.7.1. Incompletely resolved spots

If adjacent spots are not fully resolved, there will be a systematic error in the integrated intensity which will be largest for weak spots that are adjacent to very strong spots. However, the profile-fitted intensity will be affected less than the summation integration intensity, because the peripheral pixels (where the influence of neighbouring spots is greatest) are down-weighted relative to the central pixels (where the neighbours will have least influence).

Further steps can be taken to minimize the errors caused by overlapping spots. Firstly, when forming the standard profiles, reflections are only included if they are significantly stronger than their nearest neighbours. This will minimize the errors in the standard profiles. Secondly, when evaluating the profile-fitted intensity of a particular reflection, pixels can be omitted if they are adjacent to a pixel that is part of a neighbouring spot (rather than having to be part of that spot).

11.2.6.7.2. Elimination of peak pixel outliers

In the same way that outliers in the background region can be identified and rejected (see Section 11.2.5.1.1), it is possible in principle to identify outliers in the peak region of fully recorded reflections as those pixels whose deviation from the scaled standard profile is significantly greater than that expected from counting statistics. This approach works well if the feature that gives rise to the outliers affects only a small fraction of the peak pixels and produces large deviations. This is the case for some zingers or dead pixels, and for diffraction from small ice crystals when collecting data from cryo-cooled samples.

Another source of outliers is the encroachment of a strong neighbouring spot into the peak region, as discussed in Section 11.2.6.7.1. When dealing with peripheral pixels, the outlier test can be applied to both fully recorded and partially recorded reflections, but a high σ cutoff (e.g. 10–20) must be used to avoid rejecting pixels that do not fit the profile simply because they correspond to a partially recorded spot.
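A peak-pixel outlier test of this kind might look like the sketch below. It assumes that the expected variance at each pixel is simply the expected count (scaled profile plus background, with no detector gain factor) and that the scale factor K comes from a previous fit; both are simplifications, and the function name and the variance floor are hypothetical.

```python
import math


def peak_outliers(P, rho_corr, background, K, cutoff):
    """Indices of peak pixels that deviate from K*P by more than `cutoff` sigma.

    P           standard profile values for the peak pixels
    rho_corr    background-corrected observed counts
    background  background plane value at each pixel
    K           scale factor from the profile fit
    cutoff      rejection threshold in units of the expected standard deviation
    """
    flagged = []
    for i, (Pi, ri, bi) in enumerate(zip(P, rho_corr, background)):
        expected_counts = K * Pi + bi
        # Poisson assumption: variance = expected counts (floored at 1 count)
        sigma = math.sqrt(max(expected_counts, 1.0))
        if abs(ri - K * Pi) > cutoff * sigma:
            flagged.append(i)
    return flagged
```

For partially recorded reflections the same function could be called with a much larger `cutoff`, in line with the 10–20 suggested above.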

11.2.6.7.3. Estimation of overloaded reflections

Owing to the limited dynamic range of current detectors, it is common for many low-resolution spots to contain saturated pixels. Provided the saturation level of the detector is known, such pixels can simply be excluded from the profile fitting, allowing a reasonable estimate of the true intensity (except when the majority of the pixels are saturated). A knowledge of the strong intensities is essential for structure solution based on molecular replacement techniques, and so this is a very useful additional feature of profile fitting.
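One way to picture the treatment of overloads is the sketch below: the scale factor is fitted to the unsaturated peak pixels only and then multiplied by the sum of the standard profile over all peak pixels, saturated ones included. It assumes a known background at each pixel and, for simplicity, takes the weights from the observed rather than the expected counts; these choices and the function name are assumptions for illustration.

```python
def estimate_overloaded(P, raw, background, saturation):
    """Profile-fitted intensity of a spot that contains saturated pixels.

    P           standard profile over the peak pixels
    raw         observed (uncorrected) counts, clipped at the saturation level
    background  background plane value at each pixel
    saturation  count at or above which a pixel is treated as saturated
    """
    num = den = 0.0
    for Pi, ri, bi in zip(P, raw, background):
        if ri >= saturation:
            continue  # saturated pixel: true count unknown, leave it out
        w = 1.0 / max(ri, 1.0)  # Poisson weight from the observed count
        num += w * Pi * (ri - bi)
        den += w * Pi * Pi
    if den == 0.0:
        raise ValueError("no unsaturated pixels with a non-zero profile value")
    K = num / den
    # Sum the profile over ALL peak pixels to recover the full intensity
    return K * sum(P)
```

The estimate relies entirely on the standard profile being correct in the saturated centre of the spot, which is why it degrades when most of the peak pixels are saturated.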

11.2.6.8. Profile fitting partially recorded reflections

Greenhough & Suddath (1986) have shown that when profile fitting is applied to partially recorded reflections this leads to a systematic error in the individual intensities, but there is no systematic error in the total summed intensity. Although their analysis is strictly only applicable to the case of unweighted profile fitting, experience has shown that even when using weighted profile fitting there is no evidence of systematic errors in the summed profile-fitted intensities of partially recorded reflections. This is particularly important as many data sets collected from frozen crystals have few, if any, fully recorded reflections.

11.2.6.9. Systematic errors in profile-fitted intensities

The fundamental assumption in profile fitting is that the standard profiles accurately reflect the true profile of the reflection being integrated. Errors in the standard profile will result in systematic errors in the profile-fitted intensities. While these errors will often be small compared to the random (Poissonian) error for weak reflections, this is not necessarily the case for strong reflections, as the systematic error is typically a small percentage of the total intensity. Because the standard profiles are derived from the summation of many contributing reflections, small positional errors in spot prediction will lead to a broadening of the standard profile relative to the profile of an individual spot. The same broadening can occur because of the finite sampling interval in the image, which means that a predicted spot position can lie up to half a pixel away from the centre of the measurement box. This error can be minimized by interpolating the pixel values in the image onto a grid which is centred exactly on the predicted position, but the interpolation step itself will inevitably distort the reflection profile. In spite of these difficulties, provided adequate care is taken to determine the crystal and detector parameters accurately (as mentioned in Section 11.2.2), so that the spot positions are predicted to within a small fraction of the overall spot width, there is no evidence (from merging statistics at least) of significant systematic error, even in the stronger intensities.

References

Greenhough, T. J. & Suddath, F. L. (1986). Oscillation camera data processing. 4. Results and recommendations for the processing of synchrotron radiation data in macromolecular crystallography. J. Appl. Cryst. 19, 400–409.

Leslie, A. G. W. (1992). Recent changes to the MOSFLM package for processing film and image plate data. CCP4 and ESF-EACMB Newsletter on Protein Crystallography. Warrington: Daresbury Laboratory.

Otwinowski, Z. & Minor, W. (1997). Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307–326.

Rossmann, M. G. (1979). Processing oscillation diffraction data for very large unit cells with an automatic convolution technique and profile fitting. J. Appl. Cryst. 12, 225–238.