International Tables for Crystallography
Volume F: Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F, ch. 22.4, pp. 560-561

Section 22.4.4.2. Conformational information

F. H. Allen,* J. C. Cole and M. L. Verdonk

Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, England
*Correspondence e-mail: allen@ccdc.cam.ac.uk

Torsion angles are the natural measures of conformational relationships within molecules. If we specify a chemical substructure involving a central bond of interest, then the CSD system will display the distribution of torsion angles about that bond, computed from the tens, hundreds, or even thousands of instances located in the database. Examination of these univariate distributions will reveal any conformational preferences that may exist in small-molecule crystal structures. This approach is illustrated by the histogram of Fig. 22.4.4.1, which shows the torsional distribution about S—S bridge bonds in C(sp3)—S—S—C(sp3) substructures located in the CSD. Clearly, there is a preference for a perpendicular conformation (torsion angle near 90°) in the C—S—S—C unit. This corresponds well with values observed for cysteine bridges in protein structures, and with theoretical calculations on small model compounds.

Figure 22.4.4.1. Distribution of torsion angles in C(sp3)—S—S—C(sp3) substructures located in the CSD.
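
As an illustration of the quantity being tabulated, the following minimal Python sketch computes a signed torsion angle from four atomic positions and bins absolute torsion values into a histogram. It is not the CSD system's implementation; the function names, the Cartesian-coordinate input and the default 30° bin width are assumptions made for this example.

```python
import math

def torsion_angle(p1, p2, p3, p4):
    """Signed torsion angle in degrees about the p2-p3 bond.

    Each argument is an (x, y, z) tuple of Cartesian coordinates.
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)   # normals to the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def torsion_histogram(angles, bin_width=30):
    """Count |torsion| values (degrees) in bins covering 0 to 180."""
    n_bins = 180 // bin_width
    counts = [0] * n_bins
    for a in angles:
        # A value of exactly 180 is placed in the last bin.
        idx = min(int(abs(a) // bin_width), n_bins - 1)
        counts[idx] += 1
    return counts
```

Applied to every C—S—S—C instance returned by a substructure search, `torsion_histogram` would give the kind of univariate distribution shown in the figure; a preference for the perpendicular conformation appears as high counts in the bins on either side of 90°.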

The interrelationship between two torsion angles can be visualized by plotting them against each other on a conventional 2D scattergram. In the small-molecule area, the distribution of data points in these scattergrams can reveal conformational interconversion pathways (Rappoport et al., 1990) or show areas of high data density corresponding to conformational preferences (Schweizer & Dunitz, 1982). The best known bivariate distribution is the Ramachandran plot of peptidic φ–ψ angles, which is universally used to assess the quality of protein structures and to identify structural features. Ashida et al. (1987) performed an extensive analysis of peptide conformations available in the CSD and presented torsional histograms, a Ramachandran plot, and a variety of other visual and descriptive statistics that summarize this data set.
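
The idea of locating areas of high data density in a two-torsion scattergram can be sketched by counting points on a coarse grid. The code below is an illustrative sketch only: the function names and the 60° default bin width are assumptions, and the bin width is assumed to divide 360 exactly.

```python
def torsion_grid(pairs, bin_width=60):
    """Count (tau1, tau2) pairs, in degrees, on a grid spanning -180..180.

    bin_width is assumed to divide 360 exactly. An angle of +180 is
    wrapped into the first bin, since +180 and -180 are equivalent.
    """
    n_bins = 360 // bin_width
    grid = [[0] * n_bins for _ in range(n_bins)]
    for t1, t2 in pairs:
        i = int((t1 + 180) // bin_width) % n_bins
        j = int((t2 + 180) // bin_width) % n_bins
        grid[i][j] += 1
    return grid

def densest_cell(grid, bin_width=60):
    """Return the (tau1 range, tau2 range, count) of the fullest cell."""
    count, i, j = max((c, i, j)
                      for i, row in enumerate(grid)
                      for j, c in enumerate(row))
    lo1 = -180 + i * bin_width
    lo2 = -180 + j * bin_width
    return (lo1, lo1 + bin_width), (lo2, lo2 + bin_width), count
```

A finer grid, or a plot of the raw points, would normally be used in practice; the coarse grid is chosen here only to keep the example short.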

It is often necessary to use three or more torsion angles to define the conformation of, e.g., a side chain or flexible ring. Here, multivariate statistical techniques (Chatfield & Collins, 1980; Taylor, 1986) have proved valuable for extracting information from the matrix T(N, k) that contains the k torsion angles computed for each of the N examples of the substructure in the CSD. Two methods, both available within the CSD system software described in Chapter 24.3, are commonly used to visualize the k-dimensional data set and to locate natural sub-groupings of data points within it.

Principal component analysis (PCA) (Murray-Rust & Motherwell, 1978; Allen, Doyle & Auf der Heyde, 1991; Allen, Howard & Pitchford, 1996) is a dimension-reduction technique that analyses the variance in T(N, k) in terms of a new set of uncorrelated, orthogonal variables: the principal components, or PCs. The PCs are generated in decreasing order of the percentage of the variance that each explains. The hope is that the number of PCs, p, needed to explain most of the variance in the data set satisfies p ≪ k, so that a few pairwise scatter plots with respect to the new PC axes will provide useful visualizations of the complete data set. For cyclic fragments, PCA results are closely related to those obtained using the ring-puckering methodology of Cremer & Pople (1975).

Cluster analysis (CA) (Everitt, 1980; Allen, Doyle & Taylor, 1991) is a purely numerical method that attempts to locate discrete groupings of data points within a multivariate data set. CA uses 'distances' or 'dissimilarities' between pairs of points in a k-dimensional space as its working basis, and a very large number of clustering algorithms exist. The mathematical basis of both of these techniques, the modifications that are needed to account for topological symmetry in the search fragment, and examples of their application have been reviewed by Taylor & Allen (1994).
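
The PCA step can be illustrated with a small, self-contained Python sketch that extracts the leading principal components of a torsion matrix T(N, k) and reports the fraction of the variance each explains. This is not the CSD system's implementation. It uses power iteration with deflation purely to avoid external libraries, it assumes that the torsion angles in each column do not wrap across ±180°, and it ignores the symmetry modifications discussed in the references; the function name and parameters are hypothetical.

```python
import math

def pca_power_iteration(T, n_components=2, n_iter=200):
    """Return (components, explained_fractions) for a torsion matrix.

    T is a list of N rows, each holding k torsion angles in degrees.
    Assumption: angles in a column do not wrap across +/-180 degrees.
    """
    n, k = len(T), len(T[0])
    means = [sum(row[j] for row in T) / n for j in range(k)]
    X = [[row[j] - means[j] for j in range(k)] for row in T]
    # k x k sample covariance matrix of the mean-centred torsion angles
    C = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(k)]
         for a in range(k)]
    total_var = sum(C[j][j] for j in range(k))
    if total_var == 0:
        raise ValueError("all rows are identical: no variance to analyse")

    def matvec(M, v):
        return [sum(M[a][b] * v[b] for b in range(k)) for a in range(k)]

    components, fractions = [], []
    for _ in range(n_components):
        v = [1.0 / (j + 1) for j in range(k)]      # arbitrary start vector
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _step in range(n_iter):
            w = matvec(C, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:                       # no variance left
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(C, v)))  # eigenvalue
        components.append(v)
        fractions.append(lam / total_var)
        # Deflate: remove the variance explained by this component.
        for a in range(k):
            for b in range(k):
                C[a][b] -= lam * v[a] * v[b]
    return components, fractions
```

If the first one or two entries of the returned fractions sum to nearly 1, the condition p ≪ k holds and a scatter plot on those PC axes summarizes the data set. Power iteration converges slowly when eigenvalues are nearly equal, so a library eigensolver would normally be preferred.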

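The distance-based grouping that underlies cluster analysis can be sketched as follows. The example defines a dissimilarity between two conformations as the mean circular difference of their torsion angles and then merges any two conformations closer than a threshold (single-linkage clustering). This is only one of the many possible algorithms and is not claimed to be the one used in the CSD system; the function names, the choice of metric and the threshold are assumptions made for illustration.

```python
def circular_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def dissimilarity(t1, t2):
    """Mean circular difference between two k-torsion conformations."""
    return sum(circular_diff(a, b) for a, b in zip(t1, t2)) / len(t1)

def single_linkage(rows, threshold):
    """Label each row so that rows linked by a chain of pairwise
    dissimilarities <= threshold share the same label."""
    labels = list(range(len(rows)))        # start: one cluster per row
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if dissimilarity(rows[i], rows[j]) <= threshold:
                old, new = labels[j], labels[i]
                # Merge the two clusters by relabelling every member.
                labels = [new if lab == old else lab for lab in labels]
    return labels
```

The circular difference matters because torsion angles of 179° and -178° differ by only 3°, not 357°; a plain Euclidean distance on the raw values would wrongly separate such conformations.
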
Preliminary work using the concepts of machine learning (Carbonell, 1989) for knowledge discovery and classification has also been carried out using the CSD (see e.g. Allen et al., 1990; Fortier et al., 1993). In particular, conceptual clustering methods have been applied to a number of substructures (Conklin et al., 1996) and the results compared with those obtained by the statistical and numerical methods described above. Similar techniques are also being used for the classification of protein structures (see e.g. Blundell et al., 1987).

References

Allen, F. H., Doyle, M. J. & Auf der Heyde, T. P. E. (1991). Automated conformational analysis from crystallographic data. 6. Principal-component analysis for n-membered carbocyclic rings (n = 4, 5, 6): symmetry considerations and correlations with ring-puckering parameters. Acta Cryst. B47, 412–424.
Allen, F. H., Doyle, M. J. & Taylor, R. (1991). Automated conformational analysis from crystallographic data. 3. Three-dimensional pattern recognition within the Cambridge Structural Database system: implementation and practical examples. Acta Cryst. B47, 50–61.
Allen, F. H., Howard, J. A. K. & Pitchford, N. A. (1996). Symmetry-modified conformational mapping and classification of the medium rings from crystallographic data. IV. Cyclooctane and related eight-membered rings. Acta Cryst. B52, 882–891.
Allen, F. H., Rowland, R. S., Fortier, S. & Glasgow, J. I. (1990). Knowledge acquisition from crystallographic databases: towards a knowledge-based approach to molecular scene analysis. Tetrahedron Comput. Methodol. 3, 757–774.
Ashida, T., Tsunogae, Y., Tanaka, I. & Yamane, T. (1987). Peptide chain structure parameters, bond angles and conformation angles from the Cambridge Structural Database. Acta Cryst. B43, 212–218.
Blundell, T. L., Sibanda, B. L., Sternberg, M. J. E. & Thornton, J. M. (1987). Knowledge-based prediction of protein structures and the design of novel molecules. Nature (London), 326, 347–352.
Carbonell, J. (1989). Editor. Machine learning – paradigms and methods. Amsterdam: Elsevier.
Chatfield, C. & Collins, A. J. (1980). Introduction to multivariate analysis. London: Chapman & Hall.
Conklin, D., Fortier, S., Glasgow, J. I. & Allen, F. H. (1996). Conformational analysis from crystallographic data using conceptual clustering. Acta Cryst. B52, 535–549.
Cremer, D. & Pople, J. A. (1975). A general definition of ring puckering coordinates. J. Am. Chem. Soc. 97, 1354–1358.
Everitt, B. (1980). Cluster analysis. New York: Wiley.
Fortier, S., Castleden, I., Glasgow, J., Conklin, D., Walmsley, C., Leherte, L. & Allen, F. H. (1993). Molecular scene analysis: the integration of direct-methods and artificial-intelligence strategies for solving protein crystal structures. Acta Cryst. D49, 168–178.
Murray-Rust, P. & Bland, R. (1978). Computer retrieval and analysis of molecular geometry. II. Variance and its interpretation. Acta Cryst. B34, 2527–2533.
Rappoport, Z., Biali, S. E. & Kaftory, M. (1990). Application of the structure correlation method to ring-flip processes in benzophenone. J. Am. Chem. Soc. 112, 7742–7750.
Schweizer, W. B. & Dunitz, J. D. (1982). Structural characteristics of the carboxylic acid ester group. Helv. Chim. Acta, 65, 1547–1552.
Taylor, R. (1986). The Cambridge Structural Database in molecular graphics: techniques for the rapid identification of conformational minima. J. Mol. Graphics, 4, 123–131.
Taylor, R. & Allen, F. H. (1994). Statistical and numerical methods of data analysis. In Structure correlation, edited by H.-B. Bürgi & J. D. Dunitz. Weinheim: VCH Publishers.