International Tables for Crystallography
Volume F
Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F, ch. 24.3, pp. 667-668

Section 24.3.4. Knowledge engineering from the CSD

F. H. Allen^a* and V. J. Hoy^a

^a Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, England
Correspondence e-mail:

Databases versus knowledge bases

As illustrated in earlier sections, the CSD represents a collection of primary data resulting from diffraction experiments on crystals of small molecules – in particular the fractional coordinates, space group and cell dimensions that define the 3D crystal and molecular structure. However, the user of the system is usually interested in structural knowledge – in the form of bond lengths, angles, intermolecular contact distances and other parameters – that can be synthesized from the raw data by the use of the CSD system software. Thus, each detailed analysis carried out using the CSD system represents an experiment in data mining, and considerable operational and intellectual effort is employed in performing such analyses.
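As a rough illustration of how such parameters are derived from the primary data, the sketch below converts fractional coordinates to Cartesian coordinates using the cell dimensions and then computes an interatomic distance and a valence angle. It is not part of the CSD system software: the function names are hypothetical, the axis convention (a along x, b in the xy plane) is one common choice among several, the cell and coordinates in the usage lines are made up, and space-group symmetry operations are ignored.

```python
import math


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Matrix taking fractional to Cartesian coordinates (same length unit as a, b, c).

    Assumed convention: a lies along x and b lies in the xy plane.
    Cell angles are given in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit-cell volume divided by a*b*c
    v = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * v / sg],
    ]


def to_cartesian(m, frac):
    """Apply the orthogonalization matrix to one fractional coordinate triple."""
    return tuple(sum(m[i][j] * frac[j] for j in range(3)) for i in range(3))


def distance(m, f1, f2):
    """Distance between two atoms given in fractional coordinates."""
    return math.dist(to_cartesian(m, f1), to_cartesian(m, f2))


def angle(m, f1, f2, f3):
    """Angle f1-f2-f3 in degrees, with f2 at the vertex."""
    p1, p2, p3 = (to_cartesian(m, f) for f in (f1, f2, f3))
    u = [p1[i] - p2[i] for i in range(3)]
    w = [p3[i] - p2[i] for i in range(3)]
    cos_t = sum(x * y for x, y in zip(u, w)) / (math.hypot(*u) * math.hypot(*w))
    # Clamp to guard against rounding slightly outside [-1, 1]
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


# Usage with a made-up monoclinic cell and made-up atom positions
cell = orthogonalization_matrix(5.0, 6.0, 7.0, 90.0, 110.0, 90.0)
print(distance(cell, (0.10, 0.20, 0.30), (0.25, 0.22, 0.31)))
print(angle(cell, (0.10, 0.20, 0.30), (0.25, 0.22, 0.31), (0.40, 0.10, 0.35)))
```

Repeating such calculations over many database entries, with proper handling of symmetry, is the kind of operation that the CSD system software carries out for the user.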

At the present state of development of the field, three facts are apparent: (a) many data-mining activities centre around a set of standard geometrical data types that are essential for major applications, particularly in structural chemistry, molecular modelling and rational drug design; (b) data-mining experiments require considerable expertise and can take a long time; and (c) the CSD is growing rapidly, so any compilation of structural knowledge must be updated regularly, and the growing database size makes this updating very time consuming for individual users.

These considerations indicate that access to CSD information should be at two levels, the raw-data level and the structural-knowledge level. Accordingly, since 1995 the CCDC has been deriving libraries of structural knowledge from the raw data content of the CSD. The first of these libraries, IsoStar, a library of information on intermolecular interactions (Bruno et al., 1997), is briefly summarized below. A second library, Mogul, containing bond-length, valence-angle and torsional distributions, is under development at the time of writing (mid-2000). Such a knowledge base has obvious applications in crystallography, structural chemistry and molecular biology, not least in providing precise geometrical parameters that can be used in 3D model building, structure refinement and reality checking of developing and refined structures. The scientific applications of non-bonded contact geometries and conformational (torsional) information are more fully discussed in Chapter 22.4.

IsoStar: a library of knowledge about intermolecular interactions

IsoStar (Bruno et al., 1997) is based on experimental data, not only from the CSD but also from the PDB, and contains some theoretical results calculated using the ab initio intermolecular perturbation theory (IMPT) method of Hayes & Stone (1984). The experimental data in the CSD and the PDB have been used to display interaction geometries involving central groups (A) and contact groups (B). CSD search results of the type exemplified above are transformed into an easily visualized form by overlaying the A moieties. This results in a 3D scatterplot showing the experimental distribution of B around A. A web-browser front end permits rapid access to these scatterplots, which can be viewed in RasMol (Sayle, 1996), interrogated interactively, converted into contoured surfaces and so on.
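The overlay step can be pictured with a small sketch. IsoStar's own superposition procedure is not described in this section, so the code below substitutes a deliberately simple technique: each central group A is assumed to be rigid and to provide three non-collinear reference atoms, a local orthonormal frame is built from them, and the contact atom B is expressed in that frame. All function names and coordinates are hypothetical, and inputs are assumed to be Cartesian already.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(p, q):
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def _unit(p):
    n = math.sqrt(_dot(p, p))
    return (p[0] / n, p[1] / n, p[2] / n)


def local_frame(origin, on_x, in_plane):
    """Right-handed orthonormal axes defined by three atoms of group A.

    x points from origin to on_x, z is normal to the plane of the three
    atoms, and y completes the right-handed set.
    """
    ex = _unit(_sub(on_x, origin))
    ez = _unit(_cross(ex, _sub(in_plane, origin)))
    ey = _cross(ez, ex)
    return ex, ey, ez


def overlay(fragments):
    """Express each contact atom B in the local frame of its own group A.

    fragments: iterable of (origin, on_x, in_plane, contact) tuples.
    Returns a list of (x, y, z) points forming the scatterplot of B around A.
    """
    points = []
    for origin, on_x, in_plane, contact in fragments:
        ex, ey, ez = local_frame(origin, on_x, in_plane)
        d = _sub(contact, origin)
        points.append((_dot(d, ex), _dot(d, ey), _dot(d, ez)))
    return points


# Two made-up hits: the second is the first rotated by 90 degrees about z
# and shifted, so both contacts should land on the same overlaid point.
hits = [
    ((0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (-0.5, 1.0, 0.0), (3.0, 0.5, 0.2)),
    ((5.0, 5.0, 5.0), (5.0, 6.2, 5.0), (4.0, 4.5, 5.0), (4.5, 8.0, 5.2)),
]
for point in overlay(hits):
    print(point)
```

A least-squares superposition of all atoms in A would be more robust for groups that are not perfectly rigid; the three-atom frame is used here only because it keeps the example short and free of external libraries.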

Version 1.1 of IsoStar, released in October 1998, contains information on non-bonded interactions formed between 310 central groups and 45 contact groups. It contains over 12 000 scatterplots: 9 000 from the CSD and 3 000 from the PDB. IsoStar also reports results for 867 theoretical potential-energy minima calculated using the IMPT procedure. The library will be updated on a regular basis using automated software procedures developed at the CCDC.

Chapter 22.4 contains illustrative examples from IsoStar, together with a more complete description of the knowledge base and its applications.


Bruno, I. J., Cole, J. C., Lommerse, J. P. M., Rowland, R. S., Taylor, R. & Verdonk, M. L. (1997). IsoStar: a library of information about nonbonded interactions. J. Comput.-Aided Mol. Des. 11, 525–537.
Hayes, I. C. & Stone, A. J. (1984). An intermolecular perturbation theory for the region of moderate overlap. J. Mol. Phys. 53, 83–105.
Sayle, R. (1996). The RASMOL visualiser. Glaxo Wellcome Research, Stevenage, Hertfordshire, England.
