International Tables for Crystallography Volume F. Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F. ch. 21.3, pp. 521-525
Section 21.3.5. Examples: detection of errors in structures
(a) UCLA–DOE Laboratory of Structural Biology and Molecular Medicine, UCLA, Box 951570, Los Angeles, CA 90095-1570, USA, (b) UCLA–DOE Laboratory of Structural Biology and Molecular Medicine, Department of Chemistry & Biochemistry, Molecular Biology Institute and Department of Biological Chemistry, UCLA, Los Angeles, CA 90095-1570, USA, and (c) UCLA–DOE Laboratory of Structural Biology and Molecular Medicine, Department of Chemistry & Biochemistry and Molecular Biology Institute, UCLA, Los Angeles, CA 90095-1569, USA
Several examples are presented of errors in structural models determined by X-ray crystallography that can be detected using validation methods. One is that of the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), which was traced essentially backwards from a poor electron-density map (Chapman et al., 1988). The program ERRAT finds that approximately 40% of the residues in this mistraced model are outside the 95% confidence limit (Fig. 21.3.5.1a). This limit is the error value above which a given region can be judged to be erroneous with 95% certainty, so a reliable model should exceed this value over less than 5% of its length. The final model of RuBisCO (Curmi et al., 1992) shows only 2% of the residues outside ERRAT's 95% confidence limit. Similarly, the 3D profile calculated from VERIFY3D for the erroneous model (Fig. 21.3.5.1b) gives a total score of 15 when matched to the sequence of the small subunit of RuBisCO. This score is well below the expected value of 58 for the correct structure of this length. Indeed, the 3D profile of the correct model (Curmi et al., 1992) (Fig. 21.3.5.1b) of RuBisCO has a score of 55. PROCHECK and WHAT IF also identify stereochemistry problems in the original model, including deviant bond angles and bond lengths, many residues in the disallowed Ramachandran regions (Fig. 21.3.5.1c), bad peptide-bond planarity, and bad non-bonded interactions. In contrast, most amino-acid residues of the correct RuBisCO model are in the allowed regions of the Ramachandran plot (Fig. 21.3.5.1d) with good overall geometry.
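The 95% criterion described above reduces to simple arithmetic once a per-position error value is available. The following is a minimal sketch of that arithmetic only, not the ERRAT program itself: the function names, the input list and the numerical limit are hypothetical, and the values in the usage note are synthetic rather than taken from any RuBisCO model.

```python
from typing import Sequence


def percent_outside_limit(error_values: Sequence[float], limit: float) -> float:
    """Percentage of positions whose error value exceeds `limit`.

    `error_values` and `limit` are assumed inputs (for example, values
    read from the output of an error-detection program).
    """
    if not error_values:
        raise ValueError("error_values must not be empty")
    n_outside = sum(1 for value in error_values if value > limit)
    return 100.0 * n_outside / len(error_values)


def looks_reliable(error_values: Sequence[float], limit: float,
                   max_percent: float = 5.0) -> bool:
    """True if fewer than `max_percent` of positions exceed the limit.

    The 5% default mirrors the statement that a reliable model should
    exceed the 95% limit over less than 5% of its length.
    """
    return percent_outside_limit(error_values, limit) < max_percent
```

With ten synthetic values `[1, 2, ..., 10]` and a hypothetical limit of 6.0, four values exceed the limit, so `percent_outside_limit` returns 40.0 and `looks_reliable` returns `False`.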
The archive of obsolete PDB entries maintained by the San Diego Supercomputer group (http://pdbobs.sdsc.edu) includes old versions of protein structures that have been withdrawn and/or replaced by the depositor with a newer version. One example is that of a protein (3xia.coor) originally solved to 3 Å in the wrong space group and later to 1.8 Å in the correct space group (1xya.coor). The ERRAT program reveals problems in the original model, with 45% of the residues outside the 95% confidence limit (Fig. 21.3.5.2a). The more recent model has only 1.5% of the residues outside the 95% confidence limit. The problem in the original model is also illustrated by the VERIFY3D plot (Fig. 21.3.5.2b), in which the average score is often below 0.1 and dips below zero at four points. In contrast, the VERIFY3D plot of the revised model shows no dips below zero. Poor stereochemistry is also apparent in the Ramachandran plot of the original model (Fig. 21.3.5.2c). Only 38% of the backbone dihedral angles lie in the most favoured regions, compared with 93.8% in the revised model (Fig. 21.3.5.2d).
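Counting the dips of an averaged profile below zero, as in the comparison above, can be sketched as follows. This is an illustration under stated assumptions, not the VERIFY3D implementation: the per-residue scores are taken as given, the window length is a free parameter (the default of 21 residues is an assumption here), and the window is simply truncated at the chain ends.

```python
from typing import List, Sequence, Tuple


def windowed_average(scores: Sequence[float], window: int = 21) -> List[float]:
    """Average each score with its neighbours in a centred window.

    Near the chain termini the window is truncated to the residues
    that exist (an assumption of this sketch).
    """
    half = window // 2
    averaged = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        segment = scores[lo:hi]
        averaged.append(sum(segment) / len(segment))
    return averaged


def segments_below(profile: Sequence[float],
                   threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Contiguous runs of the profile below `threshold`.

    Returns inclusive, 0-based (start, end) index pairs; the number of
    pairs is the number of separate dips.
    """
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value < threshold:
            if start is None:
                start = i
        elif start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1))
    return segments
```

Calling `segments_below(windowed_average(scores))` on a list of per-residue scores gives the separate dips below zero; passing `threshold=0.1` instead lists the stretches below the 0.1 level mentioned above.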
The potential usefulness of error-detecting programs during model building is suggested by stages in the crystal structure determination of triacylglycerol lipase from Pseudomonas cepacia (Kim et al., 1997), which was solved by MIR. The authors kindly provided us with ten models (designated stages 1–10) taken from successive points in the course of model building and refinement. Regions where Cα positions shifted between the initial and final models correlated with regions where the error functions improved. For example, the program ERRAT flags specific regions (e.g. 18–35 and 135–165) that were originally built as polyalanine. When these were changed to the actual amino-acid sequence at the next stage of refinement, the regions behaved normally (Fig. 21.3.5.3a). This illustrates that ERRAT is able to highlight problem areas in a structure.
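The Cα shifts between two stages of the same model can be tabulated residue by residue. The sketch below assumes that both models are expressed in the same crystal frame, so that no superposition is needed, and that Cα coordinates have already been extracted into dictionaries keyed by residue number; the function name and data layout are hypothetical.

```python
import math
from typing import Dict, Tuple

Coordinate = Tuple[float, float, float]


def ca_shifts(initial: Dict[int, Coordinate],
              final: Dict[int, Coordinate]) -> Dict[int, float]:
    """Distance in Å between matching C-alpha atoms of two models.

    Only residues present in both models are compared; both models are
    assumed to share one coordinate frame.
    """
    shifts = {}
    for residue_number in sorted(initial.keys() & final.keys()):
        shifts[residue_number] = math.dist(initial[residue_number],
                                           final[residue_number])
    return shifts
```

Residues with large shifts can then be compared with the regions where an error function improved between the same two stages.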
VERIFY3D is sensitive to unusual environments in proteins. An illustration is offered by the structures of lipases, with and without their inhibitors. There are two general conformations, known as `closed' and `open'. In the `closed' structure, the catalytic triad is buried underneath a helical segment, called a `lid' (Brzozowski et al., 1991), so that hydrophobic residues tend to be buried, as observed in a `normal' 3D profile. In the `open' conformation, the lipid binding site becomes accessible to the solvent, and hydrophobic surfaces (residues 140–150 and 230–250) are exposed by the movement of the `lid'. These exposed hydrophobic regions stand out in the 3D profile of the `open' structure (Fig. 21.3.5.3b), in which both regions (140–150 and 230–250) have profile scores below zero. In the crystal, the exposed hydrophobic residues 140–150 of one molecule make van der Waals interactions with hydrophobic residues 230–250 of a symmetry-related molecule (Kim et al., 1997). These interactions are revealed as higher scores in those regions when the 3D profile is calculated for the two symmetry-related molecules together.
Another example of an unusual environment is that of diphtheria toxin (DT), which exists as a monomer as well as a dimer. Monomeric DT is a Y-shaped molecule with three domains, known as the catalytic (C), transmembrane (T) and receptor-binding (R) domains. Crystal structures have been determined for both the `closed' monomeric form and for a domain-swapped dimeric form (Bennett et al., 1994). Upon dimerization, a massive conformational rearrangement occurs and the entire R domain from each monomer of the dimer is interchanged with that of the other monomer. This involves breaking the noncovalent interactions between the R domain and the C and T domains and rotating the R domain by 180°, with atomic movements of up to 65 Å, to produce the `open' conformation. After rearrangement, each R domain re-forms the same noncovalent interactions as it had in the monomer, but with the C and T domains of the other monomer. The existence of both open and closed forms of DT requires that large conformational changes occur in residues 379–387 (the hinge loop). The 3D profile of the `open' form (Fig. 21.3.5.4a) shows low scores for these residues compared to the closed monomer or dimer (Fig. 21.3.5.4b). The higher scores of the closed monomer are consistent with the greater stability of the monomer in the closed rather than the open conformation.
The past two decades have seen a surge of development in the experimental techniques of crystal structure determination. As a consequence, many structures originally solved at low resolution were later determined at higher resolution, often starting with improved phases. The archive of obsolete PDB entries maintained by the San Diego Supercomputer group (http://pdbobs.sdsc.edu) served as a benchmark for evaluating the ERRAT program. For testing, 17 pairs of protein models were selected. Each pair comprised an obsolete entry and the revised model that replaced it. Using ERRAT, the overall quality of each model was expressed as a single number according to the fraction of the structure falling below the 95% confidence limit for rejection. The overall scores are significantly better for the revised structures, most of which were analysed at improved resolution (Fig. 21.3.5.5a). This result further demonstrates the utility of ERRAT for monitoring the model-building process. Furthermore, a strong correlation is found between the percentage of residues within the 95% confidence limit given by ERRAT and the percentage of residues in the most favoured regions of the Ramachandran plot of PROCHECK (Fig. 21.3.5.5b). In general, the problematic regions detected by the two programs agree with each other.
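A correlation of this kind, between two per-model percentages, can be quantified with a correlation coefficient. The text does not state which measure was used for Fig. 21.3.5.5(b); the sketch below assumes the Pearson coefficient, and the function name and inputs are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series.

    Here `xs` might hold, per model, the percentage of residues within
    the 95% confidence limit, and `ys` the percentage of residues in the
    most favoured Ramachandran regions (both assumed inputs).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

A value close to +1 would indicate that models scoring well by one criterion also score well by the other; the percentages themselves would have to be taken from the output of the two programs for each model.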
References
Bennett, M. J., Choe, S. & Eisenberg, D. (1994). Domain swapping: entangling alliances between proteins. Proc. Natl Acad. Sci. USA, 91, 3127–3131.
Brzozowski, A. M., Derewenda, U., Derewenda, Z. S., Dodson, G. G., Lawson, D. M., Turkenburg, J. P., Bjorkling, F., Huge-Jensen, B., Patkar, S. A. & Thim, L. (1991). A model for interfacial activation in lipases from the structure of fungal lipase-inhibitor complex. Nature (London), 351, 491–494.
Chapman, M. S., Suh, S. W., Curmi, P. M., Cascio, D., Smith, W. W. & Eisenberg, D. (1988). Tertiary structure of plant RuBisCO: domains and their contacts. Science, 241, 71–74.
Curmi, P. M. G., Cascio, D., Sweet, R. M., Eisenberg, D. & Schreuder, H. (1992). Crystal structure of the unactivated form of ribulose-1,5 bisphosphate carboxylase/oxygenase from tobacco refined at 2.0 Å resolution. J. Biol. Chem. 267, 16980–16989.
Kim, K. K., Song, H. K., Shin, D. H., Hwang, K. Y. & Suh, S. W. (1997). The crystal structure of a triacylglycerol lipase from Pseudomonas cepacia reveals a highly open conformation in the absence of a bound inhibitor. Structure, 5, 173–185.