International
Tables for
Crystallography
Volume F
Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F. ch. 21.3, pp. 521-525

Section 21.3.5. Examples: detection of errors in structures

O. Dym (a), D. Eisenberg (b)* and T. O. Yeates (c)

(a) UCLA–DOE Laboratory of Structural Biology and Molecular Medicine, UCLA, Box 951570, Los Angeles, CA 90095-1570, USA
(b) UCLA–DOE Laboratory of Structural Biology and Molecular Medicine, Department of Chemistry & Biochemistry, Molecular Biology Institute and Department of Biological Chemistry, UCLA, Los Angeles, CA 90095-1570, USA
(c) UCLA–DOE Laboratory of Structural Biology and Molecular Medicine, Department of Chemistry & Biochemistry and Molecular Biology Institute, UCLA, Los Angeles, CA 90095-1569, USA
* Correspondence e-mail: david@mbi.ucla.edu

21.3.5. Examples: detection of errors in structures

21.3.5.1. Specific examples

The following examples illustrate errors in X-ray crystallographic models that can be detected by validation methods. One is the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), which was traced essentially backwards from a poor electron-density map (Chapman et al., 1988). The program ERRAT finds that approximately 40% of the residues in this mistraced model lie outside the 95% confidence limit (Fig. 21.3.5.1a). This limit is the error value above which a region can be judged erroneous with 95% certainty, so a reliable model should exceed it over less than 5% of its length. The final model of RuBisCO (Curmi et al., 1992) has only 2% of its residues outside ERRAT's 95% confidence limit. Similarly, the 3D profile calculated by VERIFY3D for the erroneous model (Fig. 21.3.5.1b) gives a total score of 15 when matched to the sequence of the small subunit of RuBisCO, well below the value of 58 expected for a correct structure of this length. Indeed, the 3D profile of the correct model (Curmi et al., 1992) (Fig. 21.3.5.1b) scores 55. PROCHECK and WHAT IF also identify stereochemical problems in the original model, including deviant bond angles and bond lengths, many residues in disallowed regions of the Ramachandran plot (Fig. 21.3.5.1c), poor peptide-bond planarity and bad non-bonded interactions. In contrast, most residues of the correct RuBisCO model lie in the allowed regions of the Ramachandran plot (Fig. 21.3.5.1d), and its overall geometry is good.
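Judging a VERIFY3D total score means comparing it with the score expected for a correct structure of the same length. The sketch below is a minimal illustration, not VERIFY3D itself. The length dependence S_exp = exp(-0.83 + 1.008 ln L) is an assumption taken from the 3D-profile literature and should be checked before use, and the function names are hypothetical.

```python
import math


def expected_profile_score(length: int) -> float:
    """Total 3D-1D score expected for a correct model of `length` residues.

    ASSUMPTION: empirical fit S = exp(-0.83 + 1.008 * ln L); verify the
    constants against the VERIFY3D literature before relying on them.
    """
    if length <= 0:
        raise ValueError("length must be a positive number of residues")
    return math.exp(-0.83 + 1.008 * math.log(length))


def score_fraction(observed: float, expected: float) -> float:
    """Observed total score expressed as a fraction of the expected score."""
    if expected <= 0:
        raise ValueError("expected score must be positive")
    return observed / expected


# Total scores quoted in the text for the RuBisCO small subunit (expected value 58).
for label, observed in [("mistraced model", 15.0), ("revised model", 55.0)]:
    print(f"{label}: {score_fraction(observed, 58.0):.2f} of expected")
```

With the values quoted above, the mistraced model reaches about 0.26 of the expected score and the revised model about 0.95. For a hypothetical 128-residue chain, the assumed fit gives an expected score of about 58.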

[Figure 21.3.5.1]

Figure 21.3.5.1

Detection of errors in the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO). (a) ERRAT plot of the error function in a nine-residue sliding window, the centre of which is at the sequence position indicated on the horizontal axis. The solid bold line represents the revised structure and the dashed line the original structure. The thin solid lines indicate the 95% and 99% confidence limits for rejection. A region above the 95% line can be judged incorrect with 95% certainty. (b) VERIFY3D profile-window plots for the revised (bold) and original (dashed) models. The vertical axis gives the average 3D–1D score for residues within a 21-residue sliding window. Regions that score below zero are suspect. (c) Ramachandran diagram from PROCHECK of the initial structure of RuBisCO. The main-chain dihedral angle φ (N—Cα bond) is plotted versus ψ (Cα—C bond). All non-glycine residues outside the allowed regions are marked. (d) Ramachandran plot for the refined RuBisCO structure.
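Panel (b) plots a per-residue 3D–1D score averaged over a 21-residue window centred on each position. ERRAT's nine-residue window works differently: its error value is computed from the atomic contacts within the window, not by averaging per-residue values. Below is a minimal sketch of a centred sliding-window average and of flagging windows that score below zero. Leaving chain-end residues unscored is an assumption of this sketch, and the programs themselves may treat the ends differently.

```python
from typing import List, Optional, Sequence


def centred_window_average(scores: Sequence[float], window: int = 21) -> List[Optional[float]]:
    """Average `scores` over an odd-length window centred on each residue.

    Residues within window // 2 of either chain end get None because no full
    window can be centred on them (a convention assumed for this sketch).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    averages: List[Optional[float]] = [None] * len(scores)
    for i in range(half, len(scores) - half):
        averages[i] = sum(scores[i - half:i + half + 1]) / window
    return averages


def suspect_positions(averages: Sequence[Optional[float]], threshold: float = 0.0) -> List[int]:
    """Zero-based indices whose windowed score falls below `threshold`."""
    return [i for i, a in enumerate(averages) if a is not None and a < threshold]


print(centred_window_average([1, 2, 3, 4, 5], window=3))  # [None, 2.0, 3.0, 4.0, None]
```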

The archive of obsolete PDB entries maintained by the San Diego Supercomputer group (http://pdbobs.sdsc.edu) includes old versions of protein structures that have been withdrawn and/or replaced by the depositor with a newer version. One example is a protein (3xia.coor) originally solved to 3 Å resolution in the wrong space group and later to 1.8 Å in the correct space group (1xya.coor). ERRAT reveals problems in the original model, with 45% of its residues outside the 95% confidence limit (Fig. 21.3.5.2a), whereas the revised model has only 1.5% of its residues outside that limit. The problem in the original model is also evident in the VERIFY3D plot (Fig. 21.3.5.2b), in which the average score is often below 0.1 and dips below zero at four points. In contrast, the VERIFY3D plot of the revised model never dips below zero. Poor stereochemistry is also apparent in the Ramachandran plot of the original model (Fig. 21.3.5.2c): only 38% of its residues have backbone dihedral angles in the most favoured regions, compared with 93.8% in the revised model (Fig. 21.3.5.2d).
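Single-number ERRAT summaries such as 45% versus 1.5% of residues outside the 95% limit amount to counting residues whose window error value exceeds the limit. The sketch below is minimal and assumes that per-residue error values have already been computed (for example, read from ERRAT output) and that the caller supplies the limit. The function names are hypothetical. The 5% cut-off is the rule of thumb given for the RuBisCO example.

```python
from typing import Sequence


def percent_over_limit(errors: Sequence[float], limit: float) -> float:
    """Percentage of scored residues whose window error value exceeds `limit`."""
    if not errors:
        raise ValueError("no scored residues")
    n_over = sum(1 for e in errors if e > limit)
    return 100.0 * n_over / len(errors)


def passes_errat_rule(errors: Sequence[float], limit: float, max_percent: float = 5.0) -> bool:
    """True if fewer than `max_percent` per cent of residues exceed the confidence limit."""
    return percent_over_limit(errors, limit) < max_percent
```

The limit is kept as a parameter rather than hard-coded, so the same helper can be applied at the 95% or the 99% level.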

[Figure 21.3.5.2]

Figure 21.3.5.2

The detection of model errors due to refinement in an incorrect space group: an example (3xia.coor) from the archive of obsolete PDB entries. (a) ERRAT plot of the error function in a nine-residue sliding window. The solid bold line represents the revised structure and the dashed line represents the original structure. The thin solid lines indicate the 95% and 99% confidence limits for rejection. (b) VERIFY3D profile-window plots for the revised (bold) and original (dashed) models. The vertical axis gives the average 3D–1D score for residues within a 21-residue sliding window. (c) Ramachandran diagram of the original structure. All non-glycine residues outside the allowed regions are marked. (d) Ramachandran diagram for the revised structure.

The potential usefulness of error-detecting programs during model building is suggested by successive stages in the crystal structure determination of triacylglycerol lipase from Pseudomonas cepacia (Kim et al., 1997), which was solved by multiple isomorphous replacement (MIR). The authors kindly provided us with ten models (numbered as stages 1–10) from the course of model building and refinement. Regions where Cα positions shifted between the initial and final models correlated with regions where the error functions improved. For example, ERRAT flags regions such as residues 18–35 and 135–165, which were originally built as polyalanine. Once these regions were rebuilt with the actual amino-acid sequence at the next stage of refinement, their error values returned to normal levels (Fig. 21.3.5.3a). This illustrates that ERRAT can pinpoint problem areas in a structure.

[Figure 21.3.5.3]

Figure 21.3.5.3

The usefulness of validation programs during model building is suggested by the example of the triacylglycerol lipase from Pseudomonas cepacia at different stages of atomic refinement. (a) Plot from ERRAT at the initial and final stages of refinement. (b) VERIFY3D profile-window plots of the final model. The dashed line represents symmetry molecule 1 (residues 1–320) and symmetry molecule 2 (residues 321–640) when not in contact with each other; the solid line represents molecules 1 and 2 in contact. The plot illustrates that the oligomeric state can affect the 3D profile, so the profile carries information about oligomerization. See Bennett et al. (1994) for more details.

VERIFY3D is sensitive to unusual environments in proteins. Lipase structures determined with and without bound inhibitors offer an illustration. Lipases adopt two general conformations, known as 'closed' and 'open'. In the 'closed' structure, the catalytic triad is buried beneath a helical segment called the 'lid' (Brzozowski et al., 1991), so hydrophobic residues tend to be buried, as in a 'normal' 3D profile. In the 'open' conformation, movement of the lid makes the lipid-binding site accessible to solvent and exposes hydrophobic surfaces (residues 140–150 and 230–250). These exposed hydrophobic regions stand out strikingly in the 3D profile of the 'open' structure (Fig. 21.3.5.3b), which has profile scores below zero in both regions (140–150 and 230–250). The exposed hydrophobic residues 140–150 of one molecule make van der Waals interactions with hydrophobic residues 230–250 of a symmetry-related molecule (Kim et al., 1997). When the 3D profiles are computed with the two symmetry-related molecules in contact, these interactions appear as higher scores in those regions.
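Regions such as residues 140–150 and 230–250 are read off a profile as contiguous runs of below-zero scores. The sketch below shows that step in minimal form. It assumes a windowed profile with None at unscored chain ends and residues numbered consecutively from `first_residue`. The function name and the `min_length` filter are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def low_scoring_segments(profile: Sequence[Optional[float]], first_residue: int = 1,
                         threshold: float = 0.0,
                         min_length: int = 1) -> List[Tuple[int, int]]:
    """Inclusive (start, end) residue ranges where the profile is below `threshold`.

    None entries (unscored positions) end a segment, as does any score at or
    above the threshold; runs shorter than `min_length` are dropped.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        low = value is not None and value < threshold
        if low and start is None:
            start = i  # a low-scoring run begins here
        elif not low and start is not None:
            if i - start >= min_length:
                segments.append((start + first_residue, i - 1 + first_residue))
            start = None
    if start is not None and len(profile) - start >= min_length:
        segments.append((start + first_residue, len(profile) - 1 + first_residue))
    return segments


print(low_scoring_segments([0.2, -0.1, -0.3, 0.1, -0.2]))  # [(2, 3), (5, 5)]
```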

Another example of an unusual environment is diphtheria toxin (DT), which exists both as a monomer and as a dimer. Monomeric DT is a Y-shaped molecule with three domains: the catalytic (C), transmembrane (T) and receptor-binding (R) domains. Crystal structures have been determined for both the 'closed' monomeric form and a domain-swapped dimeric form (Bennett et al., 1994). On dimerization, a massive conformational rearrangement occurs in which the entire R domain of each monomer is exchanged with that of the other monomer. This involves breaking the noncovalent interactions between the R domain and the C and T domains and rotating the R domain by 180°, with atomic movements of up to 65 Å, to produce the 'open' conformation. After rearrangement, each R domain re-forms the same noncovalent interactions that it made in the monomer, but with the C and T domains of the other monomer. The existence of both open and closed forms of DT therefore implies large conformational changes in residues 379–387 (the hinge loop). The 3D profile of the 'open' form (Fig. 21.3.5.4a) shows low scores for these residues compared with the closed monomer or the dimer (Fig. 21.3.5.4b). The higher scores of the closed monomer are consistent with the greater stability of the monomer in the closed rather than the open conformation.
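Comparing the hinge loop across the DT forms amounts to averaging each form's profile over the same residue range. The sketch below is minimal. It assumes each profile is a list of windowed scores (None where unscored) numbered consecutively from `first_residue`. The function names are hypothetical, and the profile values in the usage lines are invented for illustration, not DT data.

```python
from typing import Dict, Optional, Sequence

Profile = Sequence[Optional[float]]


def segment_mean(profile: Profile, first_residue: int, start: int, end: int) -> float:
    """Mean profile score over residues start..end (inclusive, sequence numbering)."""
    values = []
    for residue in range(start, end + 1):
        idx = residue - first_residue
        if 0 <= idx < len(profile) and profile[idx] is not None:
            values.append(profile[idx])
    if not values:
        raise ValueError("no scored residues in the requested range")
    return sum(values) / len(values)


def compare_forms(profiles: Dict[str, Profile], first_residue: int,
                  start: int, end: int) -> Dict[str, float]:
    """Segment mean for each named structure (e.g. open monomer, closed monomer, dimer)."""
    return {name: segment_mean(p, first_residue, start, end) for name, p in profiles.items()}


# Invented toy profiles numbered from residue 379; these are not real DT scores.
toy = {"open monomer": [-0.1, -0.2, 0.0], "closed monomer": [0.3, 0.2, 0.4]}
print(compare_forms(toy, first_residue=379, start=379, end=381))
```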

[Figure 21.3.5.4]

Figure 21.3.5.4

VERIFY3D profile plots of diphtheria toxin (DT) in three forms: open and closed monomers and the dimer. (a) DT open monomer (dashed), DT dimer (solid line). (b) DT closed monomer (dashed) and dimer (solid line). Notice that the hinge loop (residues 379–387) in the open monomer has a low profile score, and this structure is known to be unstable. The score is raised in the stable closed monomer and in the dimer.

21.3.5.2. Survey of old and revised structures

The past two decades have seen a surge in the development of experimental techniques for crystal structure determination. As a consequence, many structures originally solved at low resolution were later redetermined at higher resolution, often starting from improved phases. The archive of obsolete PDB entries maintained by the San Diego Supercomputer group (http://pdbobs.sdsc.edu) served as a benchmark for evaluating ERRAT. For testing, 17 pairs of protein models were selected, each comprising an obsolete entry and the revised model that replaced it. ERRAT was used to express the overall quality of each model as a single number: the percentage of the structure falling below the 95% confidence limit for rejection. These overall scores are markedly better for the revised structures, most of which were determined at higher resolution (Fig. 21.3.5.5a). This result further supports the use of ERRAT for monitoring the model-building process. Furthermore, the percentage of residues within ERRAT's 95% confidence limit correlates strongly with the percentage of residues in the most favoured regions of the PROCHECK Ramachandran plot (Fig. 21.3.5.5b). In general, the problematic regions detected by the two programs agree with each other.

[Figure 21.3.5.5]

Figure 21.3.5.5

Evaluation of old and revised models in a database survey by ERRAT. (a) Percentage of residues within the 95% confidence limit given by ERRAT as a function of the resolution to which the structure was determined. Each arrow represents an obsolete structure (arrow tail) and the revised structure that replaced it (arrow head). The revised structures (typically analysed at finer resolution) show markedly improved ERRAT scores. (b) Correlation between the percentage of a structure within the 95% confidence limit according to ERRAT, and the percentage of residues in the most favoured regions of a Ramachandran diagram according to PROCHECK.

References

Bennett, M. J., Choe, S. & Eisenberg, D. (1994). Domain swapping: entangling alliances between proteins. Proc. Natl Acad. Sci. USA, 91, 3127–3131.
Brzozowski, A. M., Derewenda, U., Derewenda, Z. S., Dodson, G. G., Lawson, D. M., Turkenburg, J. P., Bjorkling, F., Huge-Jensen, B., Patkar, S. A. & Thim, L. (1991). A model for interfacial activation in lipases from the structure of fungal lipase-inhibitor complex. Nature (London), 351, 491–494.
Chapman, M. S., Suh, S. W., Curmi, P. M., Cascio, D., Smith, W. W. & Eisenberg, D. (1988). Tertiary structure of plant RuBisCO: domains and their contacts. Science, 241, 71–74.
Curmi, P. M. G., Cascio, D., Sweet, R. M., Eisenberg, D. & Schreuder, H. (1992). Crystal structure of the unactivated form of ribulose-1,5 bisphosphate carboxylase/oxygenase from tobacco refined at 2.0 Å resolution. J. Biol. Chem. 267, 16980–16989.
Kim, K. K., Song, H. K., Shin, D. H., Hwang, K. Y. & Suh, S. W. (1997). The crystal structure of a triacylglycerol lipase from Pseudomonas cepacia reveals a highly open conformation in the absence of a bound inhibitor. Structure, 5, 173–185.