International Tables for Crystallography
Volume F
Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F. ch. 24.3, pp. 667-668

Section 24.3.4.1. Databases versus knowledge bases

F. H. Allen(a)* and V. J. Hoy(a)

(a) Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, England
Correspondence e-mail: allen@ccdc.cam.ac.uk

24.3.4.1. Databases versus knowledge bases


As illustrated in earlier sections, the Cambridge Structural Database (CSD) is a collection of primary data resulting from diffraction experiments on crystals of small molecules, in particular the fractional coordinates, space group and cell dimensions that define the 3D crystal and molecular structure. However, users of the system are usually interested in structural knowledge, in the form of bond lengths, angles, intermolecular contact distances and other parameters, which must be synthesized from the raw data using the CSD system software. Thus, each detailed analysis carried out using the CSD system is an experiment in data mining, and performing such analyses requires considerable operational and intellectual effort.
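As a minimal illustration of how structural knowledge is synthesized from the raw data, the following Python sketch computes an interatomic distance and a valence angle directly from fractional coordinates and the six cell parameters, using the metric tensor G of the unit cell. The function names are hypothetical and are not part of the CSD system software. The sketch also assumes that all atoms involved are already expressed in one consistent set of fractional coordinates, so it ignores the space-group symmetry operations and lattice translations that a real intermolecular contact search must apply.

```python
import math


def metric_tensor(a, b, c, alpha, beta, gamma):
    """Return the 3x3 metric tensor G of a unit cell.

    Cell edges a, b, c are in angstroms; angles alpha, beta, gamma in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return [
        [a * a, a * b * cg, a * c * cb],
        [a * b * cg, b * b, b * c * ca],
        [a * c * cb, b * c * ca, c * c],
    ]


def dot_g(u, v, g):
    """Scalar product u^T G v of two vectors given in fractional coordinates."""
    return sum(u[i] * g[i][j] * v[j] for i in range(3) for j in range(3))


def distance(p, q, g):
    """Distance (angstroms) between fractional positions p and q."""
    d = [qi - pi for pi, qi in zip(p, q)]
    return math.sqrt(dot_g(d, d, g))


def angle(p, q, r, g):
    """Valence angle p-q-r in degrees, with q as the central atom."""
    u = [pi - qi for pi, qi in zip(p, q)]
    v = [ri - qi for ri, qi in zip(r, q)]
    cos_t = dot_g(u, v, g) / math.sqrt(dot_g(u, u, g) * dot_g(v, v, g))
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
```

Working with G rather than first converting to Cartesian coordinates keeps the calculation valid for any cell geometry, because G encodes all six cell parameters. For example, in a hexagonal cell with a = b = 5 Å and γ = 120°, the distance between fractional positions (0, 0, 0) and (1, 1, 0) comes out as 5 Å, even though the fractional displacement looks like a face diagonal.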

At the present state of development of the field, three facts are apparent: (a) many data-mining activities centre on a set of standard geometrical data types that are essential for major applications, particularly in structural chemistry, molecular modelling and rational drug design; (b) carrying out data-mining experiments requires considerable expertise and can take a long time; and (c) the CSD is growing rapidly and any compilation of structural knowledge should be updated on a regular basis, but the increasing database size makes this updating very time-consuming for individual users.

These considerations indicate that access to CSD information should be provided at two levels, the raw-data level and the structural-knowledge level. Since 1995, the Cambridge Crystallographic Data Centre (CCDC) has been deriving libraries of structural knowledge from the raw data content of the CSD. The first of these libraries, IsoStar, a library of information on intermolecular interactions (Bruno et al., 1997), is briefly summarized below. A second library, Mogul, containing bond lengths, valence angles and torsional distributions, was under development at the time of writing (mid-2000). Such knowledge bases have obvious applications in crystallography, structural chemistry and molecular biology, not least in providing precise geometrical parameters for 3D model building, structure refinement and the reality checking of developing and refined structures. The scientific applications of non-bonded contact geometries and conformational (torsional) information are discussed more fully in Chapter 22.4.
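A torsional distribution of the kind a library such as Mogul would hold is built from torsion angles computed over many structures. The Python sketch that follows is a hypothetical illustration, not the CCDC's method: it computes a torsion angle from four Cartesian atom positions (for instance, positions already converted from fractional coordinates) using the standard atan2 formulation, under the usual convention that a clockwise rotation of the front bond onto the back bond, viewed along the central bond, is positive. It then bins a set of such angles into a simple histogram. The function names and the 30° default bin width are assumptions chosen for illustration.

```python
import math


def _sub(u, v):
    """Vector difference u - v."""
    return [a - b for a, b in zip(u, v)]


def _dot(u, v):
    """Scalar product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    """Vector product u x v of two 3-vectors."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def torsion(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in [-180, 180], from Cartesian positions."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals to the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def torsion_histogram(angles, bin_width=30):
    """Count torsion angles (assumed in [-180, 180]) in bins starting at -180.

    bin_width (degrees) is assumed to divide 360 exactly.
    """
    n_bins = 360 // bin_width
    counts = [0] * n_bins
    for t in angles:
        i = int((t + 180) // bin_width)
        counts[min(i, n_bins - 1)] += 1  # an angle of exactly +180 goes in the last bin
    return counts
```

Because torsion angles are periodic, a simple arithmetic mean is misleading for distributions that straddle ±180°; binning the angles, as here, avoids that problem and directly shows preferred conformations such as gauche and anti.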

References

Bruno, I. J., Cole, J. C., Lommerse, J. P. M., Rowland, R. S., Taylor, R. & Verdonk, M. L. (1997). IsoStar: a library of information about nonbonded interactions. J. Comput.-Aided Mol. Des. 11, 525–537.