International Tables for Crystallography
Volume F
Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F. ch. 21.2, pp. 507-508

Section 21.2.2.1. Comparisons against standard values derived from crystals of small molecules

S. J. Wodak,(a)* A. A. Vagin,(b) J. Richelle,(b) U. Das,(b) J. Pontius(b) and H. M. Berman(c)

(a) Unité de Conformation de Macromolécules Biologiques, Université Libre de Bruxelles, avenue F. D. Roosevelt 50, CP160/16, B-1050 Bruxelles, Belgium, and EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England, (b) Unité de Conformation de Macromolécules Biologiques, Université Libre de Bruxelles, avenue F. D. Roosevelt 50, CP160/16, B-1050 Bruxelles, Belgium, and (c) Department of Chemistry, Rutgers University, 610 Taylor Road, Piscataway, NJ 08854-8087, USA
Correspondence e-mail: shosh@ucmb.ulb.ac.be

This section concerns the validation of the covalent geometry of the atomic model. It involves comparing the bond distances and angles of the macromolecule against standard values and their associated uncertainties, which are derived from crystal structures of small organic molecules available in the Cambridge Structural Database (CSD) (Allen et al., 1979, 1983).

The standard values derived in this way are also used as restraints in crystallographic refinement programs, such as X-PLOR (Brünger, 1992a) or the CCP4 suite of programs (Collaborative Computational Project, Number 4, 1994). As a result, the bond distances and angles of the final model usually agree well with their standard values, and the degree of scatter merely reflects the relative weight imposed on the various terms of the target function during refinement.
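
The dependence of the scatter on the relative weights can be made concrete with a schematic restrained-refinement target. The form below is a generic illustration and is not taken from any particular program; the symbols ($w_X$ for the weight on the diffraction term, $d_i^{0}$ and $\theta_j^{0}$ for the standard values, $\sigma$ for their uncertainties) are assumptions introduced here.

```latex
T \;=\; w_{X}\,T_{\text{X-ray}}
  \;+\; \sum_{i\,\in\,\text{bonds}} \frac{\bigl(d_i - d_i^{0}\bigr)^{2}}{\sigma_{d,i}^{2}}
  \;+\; \sum_{j\,\in\,\text{angles}} \frac{\bigl(\theta_j - \theta_j^{0}\bigr)^{2}}{\sigma_{\theta,j}^{2}}
```

In a target of this kind, lowering $w_X$ pulls every $d_i$ and $\theta_j$ towards its standard value, so a small spread in the final model says more about the chosen weighting than about the quality of the data.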

For proteins, the most commonly used standard values for the bond distances and angles are those compiled by Engh & Huber (1991) from molecular fragments in the CSD that most closely resemble chemical groups in amino acids. These parameters were shown to yield an improved description over that provided by the param19x.pro parameter file used in X-PLOR, especially for the covalent geometry of aromatic rings in side-chain groups. It is noteworthy that these CSD-derived bond distances and angles can differ significantly from those used in molecular dynamics force fields, such as that of a recent version of CHARMM (MacKerell et al., 1998). In these force fields, covalent-geometry parameters are obtained by a different strategy: they are optimized together with non-bonded parameters against a large body of available energy and structural data for a limited set of compounds representing amino-acid building blocks.

Protein-structure validation packages, such as PROCHECK (Laskowski et al., 1993) and WHAT IF (Hooft, Vriend et al., 1996), flag all bond distances and angles that deviate significantly from the database-derived reference values. This includes analysis of the deviations from planarity in aromatic rings and planar side-chain groups.
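
The kind of check described above can be sketched as a comparison of each observed bond distance or angle with a reference mean and standard deviation. The following is a minimal sketch and not the method of PROCHECK or WHAT IF: the function names, the 4-sigma cutoff and the small reference table are assumptions made for illustration, and the numerical values should be verified against the published Engh & Huber tables before any real use.

```python
import math

# ASSUMPTION: illustrative (mean, sigma) pairs for a peptide backbone.
# Bond lengths are in angstroms, the angle is in degrees. Verify against
# the published tables; they are here only to make the example run.
REFERENCE = {
    ("N", "CA"): (1.458, 0.019),
    ("CA", "C"): (1.525, 0.021),
    ("C", "O"): (1.231, 0.020),
    ("N", "CA", "C"): (111.2, 2.8),
}


def bond_length(a, b):
    """Distance between two points given as (x, y, z) tuples."""
    return math.dist(a, b)


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_angle = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against rounding just outside the domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def z_score(observed, key):
    """Deviation from the reference mean in units of the reference sigma."""
    mean, sigma = REFERENCE[key]
    return (observed - mean) / sigma


def flag_outliers(atoms, cutoff=4.0):
    """Return (key, observed, z) for every term with |z| above the cutoff.

    atoms maps atom names to (x, y, z) coordinates of one residue.
    The cutoff of 4 sigma is a hypothetical default, not a standard.
    """
    flagged = []
    for key in REFERENCE:
        coords = [atoms[name] for name in key]
        if len(key) == 2:
            observed = bond_length(*coords)
        else:
            observed = bond_angle(*coords)
        z = z_score(observed, key)
        if abs(z) > cutoff:
            flagged.append((key, round(observed, 3), round(z, 1)))
    return flagged
```

Called with a dictionary of backbone coordinates for one residue, `flag_outliers` returns only the terms whose deviation exceeds the cutoff. Expressing deviations in units of sigma lets bonds and angles be judged on a common scale.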

Similar checks are performed for the covalent geometry of atomic models of RNA or DNA oligo- and polynucleotides. Here, standard ranges for bond distances and angles are derived from crystal structures of nucleic acid bases, mononucleosides and mononucleotides in the CSD (Clowney et al., 1996; Gelbin et al., 1996). These values are used in validation procedures developed by the Nucleic Acid Database (NDB) (Berman et al., 1992) and in crystallographic refinement programs. For higher-resolution structures (better than 2.4 Å), a standard geometry, dependent on the sugar pucker conformation (C2′-endo or C3′-endo) (Parkinson et al., 1996), is used.
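
The text does not say how the sugar pucker is assigned when a pucker-dependent geometry is selected. One common way, used here purely as an assumption, is the pseudorotation phase angle computed from the five endocyclic torsion angles of the sugar ring. In the sketch below, the function names, the broad north/south split and the "generic" label returned for lower-resolution structures are all hypothetical.

```python
import math

# Constant factor in the pseudorotation formula: 2 (sin 36 + sin 72).
_SCALE = 2.0 * (math.sin(math.radians(36.0)) + math.sin(math.radians(72.0)))


def pseudorotation_phase(nu0, nu1, nu2, nu3, nu4):
    """Phase angle P in degrees, in [0, 360), from torsions in degrees.

    nu0: C4'-O4'-C1'-C2', nu1: O4'-C1'-C2'-C3', nu2: C1'-C2'-C3'-C4',
    nu3: C2'-C3'-C4'-O4', nu4: C3'-C4'-O4'-C1'.
    atan2 places P in the correct quadrant when nu2 is negative.
    """
    p = math.degrees(math.atan2((nu4 + nu1) - (nu3 + nu0), nu2 * _SCALE))
    return p % 360.0


def pucker_class(phase):
    """ASSUMPTION: broad split into north (C3'-endo-like) and south
    (C2'-endo-like) halves of the pseudorotation cycle."""
    if phase < 90.0 or phase >= 270.0:
        return "C3'-endo"
    return "C2'-endo"


def choose_geometry(resolution, torsions, cutoff=2.4):
    """Pick a pucker-specific geometry only when the resolution is
    better (numerically smaller) than the cutoff in angstroms."""
    if resolution >= cutoff:
        return "generic"
    return pucker_class(pseudorotation_phase(*torsions))
```

For example, `choose_geometry(1.8, torsions)` returns a pucker label for a 1.8 Å structure, whereas a 3.0 Å structure falls back to the generic label regardless of the torsion angles.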

Validation of the covalent geometry of the so-called 'hetero groups' (chemically modified monomer groups or small molecules that bind to macromolecules) is much more difficult. It therefore tends not to be routinely performed, and, as a result, the quality of the hetero groups in the models deposited in the Protein Data Bank (PDB) (Bernstein et al., 1977; Berman et al., 2000) varies widely.

The variety of the chemical structures of these molecules (the current release of the PDB contains about 2700 chemically distinct compounds) makes it difficult to archive them consistently, let alone to compile the dictionaries containing the required reference values in advance. Proper handling and verification of such groups require a comprehensive and rigorous description of the chemical components, as well as flexible means of deriving the appropriate reference geometries.

The development of systematic procedures for checking bond lengths and various torsion angles of hetero groups (Kleywegt & Jones, 1998) is a step in the right direction. Further progress should come, thanks in part to the recently adopted macromolecular Crystallographic Information File (mmCIF) format (Bourne et al., 1997), which provides the necessary framework for a much more comprehensive and rigorous description of the molecular components. Using this description as the basis, automated tools for building 'customized' dictionaries of geometrical standards have been developed. One such tool is A LigAnd and Monomer Object Data Environment (A LA MODE) (Clowney et al., 1999). It starts from a minimal topological description of a ligand or monomer component and performs the tasks required to construct the mmCIF component description. This includes querying the CSD, integration and book-keeping of database survey results, analysis and comparison of covalent geometry and stereochemistry, and the assembly of complex model structures from the results of multiple database surveys. Tools such as this considerably simplify the handling of small molecules at the refinement, validation and archiving stages.
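
The step of turning database survey results into a 'customized' dictionary can be illustrated with a small sketch. It is not the A LA MODE implementation: the function name, the input layout (a mapping from a bond label to the lengths observed in small-molecule structures) and the output keys are hypothetical, and no real CSD query is made.

```python
from statistics import mean, stdev


def build_dictionary(survey):
    """Summarize survey hits as reference values.

    survey maps a bond label (for example "C1-O1") to a list of bond
    lengths in angstroms observed in small-molecule structures.
    Returns label -> {"target", "esd", "n"}; labels with fewer than two
    observations are skipped because no spread can be estimated.
    """
    entries = {}
    for label, values in survey.items():
        if len(values) < 2:
            continue
        entries[label] = {
            "target": round(mean(values), 3),  # reference bond length
            "esd": round(stdev(values), 3),    # sample standard deviation
            "n": len(values),                  # number of observations
        }
    return entries
```

With hypothetical survey data such as `{"C1-O1": [1.42, 1.43, 1.41, 1.44]}`, the function returns one entry holding the mean, the sample standard deviation and the number of observations. A real tool would also need to track which fragments each value came from and to reconcile overlapping surveys, as the text notes.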

References

Collaborative Computational Project, Number 4 (1994). The CCP4 suite: programs for protein crystallography. Acta Cryst. D50, 760–763.
Allen, F. H., Bellard, S., Brice, M. D., Cartwright, B. A., Doubleday, A., Higgs, H., Hummelink, T., Hummelink-Peters, B. G., Kennard, O., Motherwell, W. D. S., Rodgers, J. R. & Watson, D. G. (1979). The Cambridge Crystallographic Data Centre: computer-based search, retrieval, analysis and display of information. Acta Cryst. B35, 2331–2339.
Allen, F. H., Kennard, O. & Taylor, R. (1983). Systematic analysis of structural data as a research technique in organic chemistry. Acc. Chem. Res. 16, 146–153.
Berman, H. M., Olson, W. K., Beveridge, D. L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh, S.-H., Srinivasan, A. R. & Schneider, B. (1992). The Nucleic Acid Database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J. 63, 751–759.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). The Protein Data Bank. Nucleic Acids Res. 28, 235–242.
Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F. Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.
Bourne, P. E., Berman, H. M., McMahon, B., Watenpaugh, K., Westbrook, J. & Fitzgerald, P. M. D. (1997). Macromolecular Crystallographic Information File. Methods Enzymol. 277, 571–590.
Brünger, A. T. (1992a). X-PLOR manual. Version 3.1. New Haven: Yale University Press.
Clowney, L., Jain, S. C., Srinivasan, A. R., Westbrook, J., Olson, W. K. & Berman, H. M. (1996). Geometric parameters in nucleic acids: nitrogenous bases. J. Am. Chem. Soc. 118, 509–518.
Clowney, L., Westbrook, J. D. & Berman, H. M. (1999). CIF applications. XI. A La Mode: a ligand and monomer object data environment. I. Automated construction of mmCIF monomer and ligand models. J. Appl. Cryst. 32, 125–133.
Engh, R. A. & Huber, R. (1991). Accurate bond and angle parameters for X-ray protein structure refinement. Acta Cryst. A47, 392–400.
Gelbin, A., Schneider, B., Clowney, L., Hsieh, S.-H., Olson, W. K. & Berman, H. M. (1996). Geometric parameters in nucleic acids: sugar and phosphate constituents. J. Am. Chem. Soc. 118, 519–529.
Hooft, R. W. W., Vriend, G., Sander, C. & Abola, E. E. (1996). Errors in protein structures. Nature (London), 381, 272.
Kleywegt, G. J. & Jones, T. A. (1998). Databases in protein crystallography. Acta Cryst. D54, 1119–1131.
Laskowski, R. A., MacArthur, M. W., Moss, D. S. & Thornton, J. M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Cryst. 26, 283–291.
MacKerell, A. D. Jr, Bashford, D., Bellott, M., Dunbrack, R. L. Jr, Evanseck, J. D., Field, M. J., Fischer, S., Gao, J., Guo, H., Ha, S., Joseph-McCarthy, D., Kuchnir, L., Kuczera, K., Lau, F. T. K., Mattos, C., Michnick, S., Ngo, T., Nguyen, D. T., Prodhom, B., Reiher, W. E. III, Roux, B., Schlenkrich, M., Smith, J. C., Stote, R., Straub, J., Watanabe, M., Wiórkiewicz-Kuczera, J., Yin, D. & Karplus, M. (1998). All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B, 102, 3586–3616.
Parkinson, G., Vojtechovsky, J., Clowney, L., Brünger, A. T. & Berman, H. M. (1996). New parameters for the refinement of nucleic acid-containing structures. Acta Cryst. D52, 57–64.