International Tables for Crystallography Volume F: Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F. ch. 21.2, pp. 510-517
Section 21.2.3.1. A systematic approach using the SFCHECK software
(a) Unité de Conformation de Macromolécules Biologiques, Université Libre de Bruxelles, avenue F. D. Roosevelt 50, CP160/16, B-1050 Bruxelles, Belgium, and EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England; (b) Unité de Conformation de Macromolécules Biologiques, Université Libre de Bruxelles, avenue F. D. Roosevelt 50, CP160/16, B-1050 Bruxelles, Belgium; and (c) Department of Chemistry, Rutgers University, 610 Taylor Road, Piscataway, NJ 08854-8087, USA
SFCHECK reads the structure-factor data in mmCIF format and then performs the following operations: reflections that are systematically absent, that have negative values or that carry a flagged σ value (99.9) are excluded; equivalent reflections are merged; and the amplitudes of missing reflections are approximated by the average value for the corresponding resolution shell.
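The three clean-up steps can be sketched in Python as follows. This is a minimal illustration, not SFCHECK's own code: the function name `preprocess`, the caller-supplied callables (for detecting systematic absences, mapping a reflection to its symmetry-unique index and assigning it to a resolution shell) and the unweighted merging of equivalents are all assumptions made for the example.

```python
import math
from collections import defaultdict

FLAGGED_SIGMA = 99.9  # sigma value that flags an unreliable measurement (from the text)

def preprocess(reflections, is_absent, canonical, shell_of, expected_hkls):
    """Hypothetical sketch: reflections is a list of (hkl, F, sigF) tuples.

    Returns dict hkl -> (F, sigF or None, approximated_flag).
    """
    # Step 1: drop systematic absences, negative amplitudes and flagged sigmas.
    kept = [(h, f, s) for h, f, s in reflections
            if not is_absent(h) and f >= 0.0 and s != FLAGGED_SIGMA]

    # Step 2: merge symmetry equivalents (unweighted mean; an assumption).
    groups = defaultdict(list)
    for h, f, s in kept:
        groups[canonical(h)].append((f, s))
    merged = {}
    for h, obs in groups.items():
        n = len(obs)
        f_mean = sum(f for f, _ in obs) / n
        s_mean = math.sqrt(sum(s * s for _, s in obs)) / n  # s.d. of the mean
        merged[h] = (f_mean, s_mean, False)  # False: genuinely observed

    # Step 3: fill missing reflections with the mean F of their resolution shell.
    shell_f = defaultdict(list)
    for h, (f, _, _) in merged.items():
        shell_f[shell_of(h)].append(f)
    shell_mean = {k: sum(v) / len(v) for k, v in shell_f.items()}
    for h in expected_hkls:
        h = canonical(h)
        if h not in merged and shell_of(h) in shell_mean:
            merged[h] = (shell_mean[shell_of(h)], None, True)  # True: approximated
    return merged
```

Filled-in reflections are tagged as approximated so that later statistics, such as the R factor, which excludes them, can skip them.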
From the model coordinates, read from the PDB (or mmCIF) atomic coordinates file, SFCHECK calculates structure factors and scales them to the observed structure factors. The scale factor, S, is computed using a smooth cutoff for the low-resolution data (Vaguine et al., 1999) (Table 21.2.3.1). This involves calculating the observed and calculated overall B factors from the standard deviations of the Gaussians fitted to the respective Patterson origin peaks [see Table 21.2.3.1 and Vaguine et al. (1999)]. SFCHECK also estimates the overall anisotropy of the data, following the approach of Sheriff & Hendrickson (1987), and applies anisotropic scaling after the Patterson scaling has been performed (Murshudov et al., 1998).
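The idea of a scale factor that down-weights low-resolution data smoothly, instead of cutting them off sharply, can be illustrated with a weighted least-squares scale. This is a sketch under stated assumptions: the weight function `smooth_lowres_weight` and its parameter `b_off` are hypothetical placeholders, and SFCHECK's actual expressions for S and for the cutoff (Table 21.2.3.1), which also involve the Patterson-derived B factors, are not reproduced here.

```python
import math

def smooth_lowres_weight(s, b_off=100.0):
    """Hypothetical smooth cutoff: 0 at s = 0, rising towards 1 at higher s.
    Placeholder only; not SFCHECK's function."""
    return 1.0 - math.exp(-b_off * s * s)

def scale_factor(f_obs, f_calc, s_values, weight=smooth_lowres_weight):
    """Least-squares S minimising sum w(s) * (F_obs - S * F_calc)**2,
    where s is the magnitude of the reciprocal-lattice vector (1/d)."""
    num = sum(weight(s) * fo * fc for fo, fc, s in zip(f_obs, f_calc, s_values))
    den = sum(weight(s) * fc * fc for fc, s in zip(f_calc, s_values))
    return num / den
```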
† Standard deviation of the Gaussian fitted to the Patterson origin peak.
‡ F is the structure-factor amplitude and σ(F) is the structure-factor standard deviation. The brackets denote averages. § Standard deviation of the spherical interference function, i.e. the Fourier transform of a sphere of radius 1/d_min, where d_min is the minimum d spacing. ¶ Increment added to the calculated overall B factor so as to make the width of the calculated Patterson origin peak equal to that of the observed one; s is the magnitude of the reciprocal-lattice vector. †† Here s and d_max are, respectively, the magnitude of the reciprocal-lattice vector and the maximum d spacing.
To assess the quality of the structure-factor data, the program computes four additional quantities (see Table 21.2.3.1 for details): the completeness of the data, the uncertainty of the structure-factor amplitudes, the optical resolution and the expected optical resolution. The last two quantities represent the expected minimum distance between two resolved atomic peaks in the electron-density map when the map is computed with the set of reflections specified by the authors and with all the reflections, respectively.
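Completeness and amplitude uncertainty are naturally tabulated per resolution shell, which is how plots such as those in Fig. 21.2.3.2 are built. The sketch below assumes, from the table footnote stating that brackets denote averages, that Rstand(F) = <σ(F)>/<F>; within one shell this equals Σσ(F)/ΣF. The function and argument names are hypothetical.

```python
from collections import defaultdict

def shell_statistics(observed, n_expected_per_shell, shell_of):
    """observed: dict hkl -> (F, sigF) for measured reflections only.
    n_expected_per_shell: dict shell -> number of unique reflections possible.
    Returns dict shell -> (completeness, Rstand(F))."""
    f_by_shell = defaultdict(list)
    s_by_shell = defaultdict(list)
    for hkl, (f, sig) in observed.items():
        k = shell_of(hkl)
        f_by_shell[k].append(f)
        s_by_shell[k].append(sig)
    stats = {}
    for k, n_exp in n_expected_per_shell.items():
        fs = f_by_shell.get(k, [])
        completeness = len(fs) / n_exp
        # <sigma(F)> / <F>: the 1/n factors cancel, leaving a ratio of sums
        rstand = sum(s_by_shell[k]) / sum(fs) if fs else float("nan")
        stats[k] = (completeness, rstand)
    return stats
```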
To evaluate the global agreement between the atomic model and the experimental data, the program computes three classical quality indicators: the R factor, R_free (Brünger, 1992b) and the correlation coefficient between the calculated and observed structure-factor amplitudes (Table 21.2.3.1). The R factor is computed using all the reflections considered (except those approximated by their average value in the corresponding resolution shell) and applying the same resolution and σ cutoffs as those reported by the authors. R_free is computed using the test subset of reflections specified by the authors. In addition, the R factor is evaluated using the 'non-free' subset of reflections (those not used to compute R_free). The correlation coefficient is computed using all reflections up to the reported high-resolution limit, applying the smooth low-resolution cutoff (see Table 21.2.3.1) but no σ cutoff.
The errors associated with the atomic positions are expressed as standard deviations (σ) of these positions. SFCHECK computes three different error measures. The first is the original error measure of Cruickshank (1949). The second is a modified version of this measure in which the difference between the observed and calculated structure factors is replaced by the error in the experimental structure factors. These two measures give the expected maximal and minimal errors, respectively. The third measure is the diffraction-component precision indicator (DPI). The mathematical expressions for these error measures are given in Table 21.2.3.2, and further details can be found in Vaguine et al. (1999).
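The Cruickshank-type estimate divides the standard deviation of the density slope by the curvature of the density at the atomic centre. A minimal sketch, assuming the slope term σ(slope) = (2π/aV)[Σ h² ΔF²]^(1/2), built from the quantities defined in the footnotes to Table 21.2.3.2 (a, h, V), with the curvature supplied by the caller because its spherical-peak expression is not reproduced here. Passing ΔF = |F_obs − F_calc| gives the maximal error; passing σ(F) in its place gives the minimal error. The function names are hypothetical, and normalization conventions (for example, whether the sum runs over the full sphere of reflections or half of it) should be checked against Table 21.2.3.2.

```python
import math

def sigma_slope(h_indices, delta_f, a, volume):
    """Assumed Cruickshank-type s.d. of the density slope along x:
    (2*pi / (a*V)) * sqrt(sum h^2 * dF^2) over the reflections supplied."""
    total = sum(h * h * d * d for h, d in zip(h_indices, delta_f))
    return 2.0 * math.pi / (a * volume) * math.sqrt(total)

def positional_error(h_indices, delta_f, a, volume, curvature):
    """sigma(x) = sigma(slope) / curvature; the curvature is caller-supplied."""
    return sigma_slope(h_indices, delta_f, a, volume) / curvature
```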
† σ(slope) is the standard deviation of the slope of the electron-density map and curvature is the curvature of the map at the atomic centre, both taken in the x direction for spherically symmetric peaks.
‡ a is the crystal unit-cell length, h is the Miller index and V_unit cell is the unit-cell volume. § σ(F) is the standard deviation of the structure-factor amplitude. ¶ c is the structure-factor data completeness expressed as a fraction (0–1), R is the conventional R factor, and the remaining quantities are the total number of atoms in the unit cell, the total number of observed reflections and the minimum d spacing, d_min.
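The footnote lists the ingredients of the DPI: the completeness c, the R factor, the number of atoms, the number of observed reflections and d_min. Assuming a simplified form without a parameter-count correction, DPI = (N_atoms/N_obs)^(1/2) c^(−1/3) R d_min (the exact expression is given in Table 21.2.3.2 and is not reproduced here), the calculation is straightforward. The parameter names are hypothetical.

```python
def dpi(n_atoms, n_obs, completeness, r_factor, d_min):
    """Assumed DPI form: sqrt(n_atoms / n_obs) * c**(-1/3) * R * d_min,
    with completeness c given as a fraction."""
    if not 0.0 < completeness <= 1.0:
        raise ValueError("completeness must be a fraction in (0, 1]")
    return (n_atoms / n_obs) ** 0.5 * completeness ** (-1.0 / 3.0) * r_factor * d_min
```

Lower completeness inflates the estimate through the c^(−1/3) term, so that, other things being equal, a data set that is only one-eighth complete doubles the DPI.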
In addition to the global structure quality measures, SFCHECK also determines the quality of the model in specific regions. Several quality estimators can be calculated for each residue in the macromolecule and, whenever appropriate, for solvent molecules and groups of atoms in ligand molecules. These estimators are the normalized atomic displacement (Shift), the correlation coefficient between the calculated and observed electron densities (Density correlation), the local electron-density level (Density index), the average B factor (B-factor) and the connectivity index (Connect), which measures the local electron-density level along the molecular backbone. These quantities are computed for individual atoms and averaged over those composing each residue or group of atoms [see Table 21.2.3.3 and Vaguine et al. (1999) for details].
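One plausible reading of the Shift measure, going by the quantities listed in the footnote to Table 21.2.3.3 (a map gradient, a model-map curvature, N and σ), is a Newton-step estimate of how far each atom sits from the peak of the density: for a spherically symmetric peak, the displacement magnitude is |gradient_i| / |curvature_i|. The sketch averages this over a group and also divides the group mean by the structure-wide standard deviation of the per-atom values. Both the exact normalization and the data layout are assumptions, not SFCHECK's documented formula.

```python
import math
from statistics import pstdev

def atom_shift(gradient, curvature):
    """Newton-step offset of an atom from its density peak:
    |grad rho| / |curvature| for a spherically symmetric peak."""
    return math.sqrt(sum(g * g for g in gradient)) / abs(curvature)

def group_shifts(atoms, groups):
    """atoms: dict name -> (gradient_xyz, curvature); groups: dict id -> names.
    Returns dict id -> (mean shift, mean shift / structure-wide s.d.)."""
    per_atom = {n: atom_shift(g, c) for n, (g, c) in atoms.items()}
    sd = pstdev(list(per_atom.values()))
    out = {}
    for gid, names in groups.items():
        mean_shift = sum(per_atom[n] for n in names) / len(names)
        out[gid] = (mean_shift, mean_shift / sd if sd > 0 else float("nan"))
    return out
```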
† gradient_i is the gradient of the map with respect to the atomic coordinates, curvature_i is the curvature of the model map computed at the atomic centre (see Agarwal, 1978), N is the number of atoms in the group considered and σ is the standard deviation of the values computed in the structure.
‡ The two densities compared are the electron densities at the atomic centre computed from the calculated and from the observed structure-factor amplitudes, respectively. The summation is performed over all the atoms in the group considered. For polymer residues, D_corr is computed separately for backbone and side-chain atoms. For the calculation of the electron density at the atomic centre, see Vaguine et al. (1999). § The numerator is the geometric mean of the electron density of the atom subset considered and the denominator is the average electron density of the atoms in the structure. For water molecules or ions, which are represented by a single atom, this expression reduces to the ratio of the atom's electron density to that average. ¶ Backbone atoms are N, C and Cα for proteins and P, O5′, C5′, C3′ and O3′ for nucleic acids.
Figs. 21.2.3.1–21.2.3.3 summarize the analysis carried out by SFCHECK on the protein rusticyanin from Thiobacillus ferrooxidans (PDB entry 1RCY) (Walter et al., 1996). Fig. 21.2.3.1 displays the numerical results from the analysis of the structure-factor data and from the evaluation of the global agreement between the model and the data. The R-factor and R_free values computed by SFCHECK (Model vs. Structure Factors panel), using the same reflection subset as that reported by the authors (Refinement panel), differ negligibly from the reported values: 0.175 versus 0.172 for the R factor and 0.25 versus 0.243 for R_free. The small R-factor difference may stem from the fact that SFCHECK considers a slightly different number of reflections (9144) than the authors (9098), although it uses the same d-spacing range and σ cutoff as those reported.
The information in Figs. 21.2.3.1 and 21.2.3.2 allows some judgement to be made about the quality of the structure-factor data for this protein. The relatively high resolution of this structure (1.9 Å) is accompanied by limited data completeness (82.1%). Furthermore, the Rstand(F) plot on the same graph shows a decrease in the quality of the high-resolution data (2.2–1.9 Å). The average radial completeness plot (bottom left-hand plot of Fig. 21.2.3.2) makes it possible to identify the regions of reciprocal space with incomplete data.
Fig. 21.2.3.3 presents the SFCHECK analysis of the local agreement of the model with the electron density for 1RCY. The shift plot shows that backbone and side-chain shifts are of comparable size, with several residues (1, 2, 16 and 25) displaying shifts as high as 0.16 Å. The density correlation is excellent throughout the molecule, except for residues 2, 16 and 29, which correlate less well; in particular, the side chains of these residues appear to be more poorly defined in the electron-density map. The backbone density index drops in a few regions, notably near the N-terminus (residues 5–7) and in the segments comprising residues 25–30 and 68–70. The side chains generally display a poorer density index than the backbone, with some regions (for example, residues 5–7, 23–30 and 58–60) displaying rather low density indices. The same segments also display higher backbone and side-chain B factors. The backbone Connect parameter, on the other hand, is quite good throughout, except for residues 5–7 and 28–29 (Fig. 21.2.3.3).
Water molecules (labeled w in the SFCHECK output) are also evaluated. The relevant plots for these molecules are those of the Shift, Density index and B factor parameters. The first 50 or so water molecules in the list (appearing sequentially along the plot from left to right) tend to display a higher density index and lower B factors (< 30 Å²) than the following molecules. They thus seem to be more reliably positioned than subsequent molecules, whose density indices sometimes drop to worryingly low values. A steady climb of the B factors is also apparent going down the list of water molecules. The analysis of the density indices and B factors of individual water molecules performed by SFCHECK could be a very useful guide in investigations of the properties of crystallographic water molecules and their interactions with protein atoms.
As with the evaluation of the geometric and stereochemical parameters of the model, surveying the same quality indicators across many structures is crucial. Such surveys make it possible to establish the expected range of values for each indicator and to identify structures with unexpected features, i.e. those for which one or more quality indicators fall outside their standard range.
The global quality indicators computed by SFCHECK are the nominal resolution (d spacing), the R factor, R_free, the minimal and maximal errors in atomic positions, the DPI and the correlation coefficient between the observed and calculated structure-factor amplitudes. Another type of global quality indicator can be obtained by averaging local quality measures over a given structure. This can be done for the per-residue (or per-group) atomic displacement, Density correlation and B factor, as well as for the Density index and Connect parameters.
Many of the geometric and stereochemical quality indicators vary as a function of resolution, some linearly and some not (Laskowski et al., 1993). This is also the case for most of the global quality indicators described here. Examples of this dependence are given in Fig. 21.2.3.4, which shows how the correlation coefficient, the maximal error, the average atomic displacement and the average density index vary as a function of resolution in the 104 nucleic acid structures surveyed. The variation is approximately linear for all four parameters: the density correlation and average density index decrease, whereas the maximal error and average atomic displacement increase, as the resolution gets poorer. In all four plots of Fig. 21.2.3.4, the scatter of the points increases with the d spacing, and at least three points, corresponding to the same three structures, appear as outliers in all plots. These structures also appear as outliers in the analysis of other parameters. Closer examination revealed that, in most cases, the abnormal behaviour of these structures could be traced back to problems with data formats or to errors that occurred during data deposition and entry processing.
As the number of structures with deposited structure-factor data becomes large enough, plots such as those of Fig. 21.2.3.4 could be used to define the expected range of values for a quality indicator in a structure determined at a given resolution or refined under given conditions. Structures yielding quality indicators outside this range could then be identified as unusual on a more solid statistical basis.
The main purpose of computing the four local quality measures, the B factor, the Density index, the atomic displacement (Shift) and the Density correlation (Table 21.2.3.3), is to identify problem regions in a model. To do this effectively, it is necessary to evaluate the degree of redundancy between these measures and to establish the standard ranges for their values. The latter task, in particular, is not straightforward, since it depends crucially on the quality of the experimental data and on biases introduced by the scaling procedure and refinement protocol. Several issues in this regard are still under investigation.
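Once a standard range is available, problem regions can be reported as runs of consecutive residues falling outside it, much as the low-density-index segments were read off the 1RCY plots. The sketch below assumes a fixed lower threshold and a minimum run length, both hypothetical choices, and treats a break in residue numbering as a break in the run.

```python
def low_segments(residue_numbers, values, threshold, min_length=2):
    """Return (first, last) residue ranges where values stay below threshold
    for at least min_length consecutive residue numbers."""
    segments = []
    run = []  # residue numbers in the current below-threshold run
    for num, val in zip(residue_numbers, values):
        if val < threshold and (not run or num == run[-1] + 1):
            run.append(num)
            continue
        # The run has ended (value recovered or numbering gap): record it.
        if len(run) >= min_length:
            segments.append((run[0], run[-1]))
        run = [num] if val < threshold else []
    if len(run) >= min_length:
        segments.append((run[0], run[-1]))
    return segments
```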
A preliminary investigation of the mutual relations between the above-mentioned local measures has been performed on several protein and nucleic acid structures taken individually. It shows that the B factor is strongly correlated with the density index, as illustrated in Fig. 21.2.3.5(a), and to a lesser extent with the atomic displacement (Fig. 21.2.3.5b). A weaker correlation was detected between these three measures and the residue density correlation (data not shown).
Analyses across structures could, in principle, be carried out for all four local measures computed by SFCHECK, provided these measures are not subject to systematic biases due to differences in scaling procedures and refinement practices. Such biases are, however, well known for the B factors of individual atoms or residues. This is illustrated in Fig. 21.2.3.6(a). This figure plots, side-by-side, the average residue B factors in 21 protein structures determined at different d spacings. It shows that for proteins determined at poorer resolution (d spacing above 2 Å), the B factors of different structures are systematically shifted relative to one another. Such systematic shifts are much smaller for structures determined at 2 Å resolution or better (Fig. 21.2.3.6a). This is not surprising, since in lower-resolution structures, is often too low (< 4) to yield meaningful values for the B factors.
Interestingly, the residue Density index, which measures the level of electron density at the atomic positions and is thus a very different parameter from the B factor, does not display the systematic shifts observed for the B factors (Fig. 21.2.3.6b), even though the two measures are rather strongly correlated in individual structures. An indicator such as this one, and ultimately the atomic s.u.'s themselves, should be better suited to analysing and comparing the trends in the quality of specific regions of the model across different structures.
References
Brünger, A. T. (1992b). Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature (London), 355, 472–474.
Cruickshank, D. W. J. (1949). The accuracy of electron-density maps in X-ray analysis with special reference to dibenzyl. Acta Cryst. 2, 65–82.
Laskowski, R. A., MacArthur, M. W., Moss, D. S. & Thornton, J. M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Cryst. 26, 283–291.
Murshudov, G. N., Davies, G. J., Isupov, M., Krzywda, S. & Dodson, E. J. (1998). The effect of overall anisotropic scaling in macromolecular refinement. Newsletter on protein crystallography, pp. 37–42. Warrington: Daresbury Laboratory.
Sheriff, S. & Hendrickson, W. A. (1987). Description of overall anisotropy in diffraction from macromolecular crystals. Acta Cryst. A43, 118–121.
Vaguine, A. A., Richelle, J. & Wodak, S. J. (1999). SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model. Acta Cryst. D55, 191–205.
Walter, R. L., Ealick, S. E., Friedman, A. M., Blake, R. C. II, Proctor, P. & Shoham, M. (1996). Multiple wavelength anomalous diffraction (MAD) crystal structure of rusticyanin: a highly oxidizing cupredoxin with extreme acid stability. J. Mol. Biol. 263, 730–749.