International Tables for Crystallography
Volume F
Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F, ch. 24.3, p. 663

Section 24.3.1. Introduction and historical perspective

F. H. Allen* and V. J. Hoy

Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, England
Correspondence e-mail:



The Cambridge Structural Database (CSD: Allen et al., 1991; Kennard & Allen, 1993) is a fully retrospective computerized archive of bibliographic, chemical and numerical data from X-ray and neutron diffraction studies of small organic and metallo-organic molecules. Here, 'small' means an upper limit of about 500 non-H atoms. The CSD was established in 1965, when only a few hundred small-molecule crystal structures were published each year, so it was possible to assimilate the earlier literature rapidly. However, increasingly powerful computers and the associated advances in data collection and structure solution techniques have led to an almost exponential increase in the number of crystal structures being reported (see the figure below).


Figure. Growth of the CSD since 1970, expressed in terms of the number of structures published per annum.

Since the mid-1980s, the number of CSD entries has increased by very close to 10% per year on average. In 1999, around 18 000 structures were added to the database, and in mid-2000 the total archive contained over 220 000 structure determinations. This makes the CSD one of the largest numerical data resources currently available in chemistry (see the table below). At present, about 48% of the CSD comprises metallo-organic structures, 42% are pure organics, and the remaining 10% are compounds of the main-group elements. The doubling period of the CSD is approximately 7.5 years and, if account is taken of recent advances in diffractometer technology, it is possible to project a total of at least 500 000 database entries by the year 2010.
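The growth figures quoted in the preceding paragraph can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming constant exponential growth; the function names and the round starting value of 220 000 entries are illustrative choices, not part of the CSD system.

```python
import math

def doubling_time(annual_growth):
    """Years needed to double at a constant fractional annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

def implied_growth(doubling_years):
    """Constant fractional annual growth rate implied by a doubling period."""
    return 2 ** (1 / doubling_years) - 1

def project(entries_now, years_ahead, doubling_years):
    """Extrapolate a count forward, assuming a constant doubling period."""
    return entries_now * 2 ** (years_ahead / doubling_years)

# Figures quoted in the text (the starting count is rounded down: "over 220 000").
growth = 0.10           # about 10% year-on-year increase
entries_2000 = 220_000  # mid-2000 archive size
doubling = 7.5          # quoted doubling period in years

t2 = doubling_time(growth)
rate = implied_growth(doubling)
projected_2010 = project(entries_2000, 10, doubling)

print(f"doubling time at 10% per year: {t2:.1f} years")
print(f"growth rate implied by a 7.5-year doubling period: {rate:.1%}")
print(f"projected entries in 2010: {projected_2010:,.0f}")
```

Under these assumptions, 10% annual growth corresponds to a doubling time of about 7.3 years, a 7.5-year doubling period corresponds to roughly 9.7% annual growth, and the ten-year extrapolation gives around 550 000 entries, which is consistent with the projection of at least 500 000 by 2010.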

Table. CSD statistics (August 2000)

Statistic                                           Value
No. of entries                                      224 945
No. of compounds                                    202 669
No. of entries with 3D coordinates                  198 136
No. of entries with error-free coordinates          194 784
No. of atoms having 3D coordinates in the CSD       12 906 283
No. of entries in the CSD-Use database              792

In contrast with the Protein Data Bank (PDB: Bernstein et al., 1972; Abola et al., 1997; RCSB, 2000), which has always received its data through direct electronic depositions, the CSD reflects the published literature. Until recently, much of the raw input has been re-keyboarded from hard-copy documents. Thus, in the early years, Cambridge Crystallographic Data Centre (CCDC) software development concentrated on data-validation techniques designed to eliminate keyboarding, typographical and scientific errors so as to ensure the accuracy of the master archive. Validation software has recently been upgraded to take advantage of modern computing methods, particularly the rapid developments in high-resolution graphics systems.

Nevertheless, the massive growth of the database has meant that the development of fast and efficient applications software for database search, data retrieval, numerical analysis and visual display has always been a high priority. The first of these software systems became available towards the end of the 1970s and constant updates ensure that the code continues to develop in response to user needs. The CSD system (CSDS), comprising the database and its applications software, is now distributed to about 1000 academic and industrial institutions worldwide.

Users of the CSD span the scientific spectrum, reflecting the wide range of research applications of the data it contains. Over the past two decades, the CSDS has provided the essential basis for research projects in structural chemistry, structure correlation and the rational design of novel bioactive molecules of pharmaceutical or agrochemical interest. A variety of statistical, numerical and computational methodologies have been applied to the CSD, giving rise to the concept of knowledge acquisition, or data mining, from the ever-increasing reservoir of precise experimental results. To date, nearly 700 papers of this type exist in the literature. This activity has, in turn, raised the possibility of the generation of knowledge-based libraries of structural information from the CSD. IsoStar (Bruno et al., 1997), a library of information on non-bonded interactions which was first released in 1997, is the CCDC's first knowledge-based product. A companion knowledge base of intramolecular geometry, Mogul, is now under development, and will contain bond-length, valence-angle and torsional-angle information.


Abola, E. E., Sussman, J. L., Prilusky, J. & Manning, N. O. (1997). Protein Data Bank archives of three-dimensional macromolecular structures. Methods Enzymol. 277, 556–571.
Allen, F. H., Davies, J. E., Galloy, J. J., Johnson, O., Kennard, O., Macrae, C. F., Mitchell, E. M., Mitchell, G. F., Smith, J. M. & Watson, D. G. (1991). The development of versions 3 and 4 of the Cambridge Structural Database system. J. Chem. Inf. Comput. Sci. 31, 187–204.
Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1972). The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–547.
Bruno, I. J., Cole, J. C., Lommerse, J. P. M., Rowland, R. S., Taylor, R. & Verdonk, M. L. (1997). IsoStar: a library of information about nonbonded interactions. J. Comput.-Aided Mol. Des. 11, 525–537.
Kennard, O. & Allen, F. H. (1993). 3D search and research using the Cambridge Structural Database. Chem. Des. Autom. News, 8, 1, 31–37.
RCSB (2000). The Protein Data Bank. Research Collaboratory for Structural Bioinformatics, Department of Chemistry, Rutgers University, Piscataway, NJ, USA.
