The Cambridge Structural Database (CSD)

Allen, F. H.; Hoy, V. J.

doi:10.1107/97809553602060000720

International Tables for Crystallography
Volume F
Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F. ch. 24.3, pp. 663-668
https://doi.org/10.1107/97809553602060000720

Chapter 24.3. The Cambridge Structural Database (CSD)

F. H. Allen^a ^* and V. J. Hoy^a

^a Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, England
Correspondence e-mail: allen@ccdc.cam.ac.uk

The Cambridge Structural Database System (CSDS), comprising the Cambridge Structural Database (CSD) and its associated software, is described as available in mid-2000. The validated information content, comprehensive literature coverage and statistical data for the CSD are summarized. CSDS software for search, retrieval, analysis and display of database content is also reviewed, and typical research applications of the CSDS in structural chemistry and biology are briefly indicated. A new software component of the CSDS, IsoStar (a knowledge base of intermolecular interactions derived from the CSD and the PDB), is also summarized.

Keywords: ConQuest; Pluto; PreQuest; Quest3D; Vista; CSD-Use database; Cambridge Structural Database; databases; intermolecular interactions; IsoStar.

24.3.1. Introduction and historical perspective

The Cambridge Structural Database (CSD: Allen et al., 1991; Kennard & Allen, 1993; http://www.ccdc.cam.ac.uk) is a fully retrospective computerized archive of bibliographic, chemical and numerical data from X-ray and neutron diffraction studies of small organic and metallo-organic molecules. Here, 'small' means an upper limit of about 500 non-H atoms. The CSD was established in 1965, when the number of small-molecule crystal structures published each year was just a few hundred and, for this reason, it was possible to assimilate the earlier literature rapidly. However, the advent of increasingly powerful computers and associated advances in data collection and structure solution techniques has led to an almost exponential increase in the number of crystal structures being reported (Fig. 24.3.1.1).

Figure 24.3.1.1. Growth of the CSD since 1970, expressed in terms of the number of structures published per annum.

Since the mid-1980s, there has been an average year-on-year increase in the number of CSD entries of very close to 10%. In 1999, around 18 000 structures were added to the database, and in mid-2000 the total archive contained over 220 000 structure determinations. This makes the CSD one of the largest numerical data resources currently available in chemistry (Table 24.3.1.1). At present, about 48% of the CSD comprises metallo-organic structures, 42% are pure organics, and the remaining 10% are compounds of the main-group elements. The doubling period of the CSD is approximately 7.5 years and, if account is taken of recent advances in diffractometer technology, it is possible to project a total of at least 500 000 database entries by the year 2010.
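The relationship between the quoted growth rate, the doubling period and the 2010 projection can be checked with a short calculation. The sketch below assumes a constant 10% compound annual growth from the August 2000 entry count; the function names are illustrative and not part of any CSD software.

```python
import math


def doubling_period(annual_growth: float) -> float:
    """Years needed for a quantity to double at a constant fractional growth per year."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


def project_entries(current: float, annual_growth: float, years: float) -> float:
    """Compound the current count forward by the given number of years."""
    return current * (1.0 + annual_growth) ** years


# 10% per year is the average increase quoted in the text;
# 224 945 is the entry count from Table 24.3.1.1 (August 2000).
period = doubling_period(0.10)
projected_2010 = project_entries(224_945, 0.10, 10)

print(f"doubling period: {period:.1f} years")
print(f"projected entries after 10 years: {projected_2010:,.0f}")
```

Under this simple model the doubling period comes out a little above 7 years, close to the figure quoted above, and the ten-year projection exceeds the stated lower bound of 500 000 entries.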

Table 24.3.1.1. CSD statistics (August 2000)

No. of entries	224 945
No. of compounds	202 669
No. of entries with 3D coordinates	198 136
No. of entries with error-free coordinates	194 784
No. of atoms having 3D coordinates in the CSD	12 906 283
No. of entries in the CSD-Use database	792

In contrast with the Protein Data Bank (PDB: Bernstein et al., 1977; Abola et al., 1997; RCSB, 2000), which has always received its data through direct electronic depositions, the CSD reflects the published literature. Until recently, much of the raw input has been re-keyboarded from hard-copy documents. Thus, in the early years, Cambridge Crystallographic Data Centre (CCDC) software development concentrated on data-validation techniques designed to eliminate keyboarding, typographical and scientific errors so as to ensure the accuracy of the master archive. Validation software has recently been upgraded to take advantage of modern computing methods, particularly the rapid developments in high-resolution graphics systems.

Nevertheless, the massive growth of the database has meant that the development of fast and efficient applications software for database search, data retrieval, numerical analysis and visual display has always been a high priority. The first of these software systems became available towards the end of the 1970s and constant updates ensure that the code continues to develop in response to user needs. The CSD system (CSDS), comprising the database and its applications software, is now distributed to about 1000 academic and industrial institutions worldwide.

Users of the CSD span the scientific spectrum, reflecting the wide range of research applications of the data it contains. Over the past two decades, the CSDS has provided the essential basis for research projects in structural chemistry, structure correlation and the rational design of novel bioactive molecules of pharmaceutical or agrochemical interest. A variety of statistical, numerical and computational methodologies have been applied to the CSD, giving rise to the concept of knowledge acquisition, or data mining, from the ever-increasing reservoir of precise experimental results. To date, nearly 700 papers of this type exist in the literature. This activity has, in turn, raised the possibility of the generation of knowledge-based libraries of structural information from the CSD. IsoStar (Bruno et al., 1997), a library of information on non-bonded interactions which was first released in 1997, is the CCDC's first knowledge-based product. A companion knowledge base of intramolecular geometry, Mogul, is now under development, and will contain bond-length, valence-angle and torsional-angle information.

24.3.2. Information content of the CSD

24.3.2.1. Acquisition of information

Almost all of the information contained in the CSD has been abstracted from the published literature. Over 800 primary literature sources are cited and the earliest reference is from 1930. Much of the data has been re-keyboarded from the original literature and from hard-copy supplementary deposition documents. The CSD now acts as the official depository for some 40 major international journals. Today, an increasing proportion (around 75% in mid-2000) of the numerical information is received directly in electronic form. The switch from hard-copy input to electronic deposition has been catalysed by the development of the exchange format for crystallographic data, the crystallographic information file or CIF (Hall et al., 1991). The CIF has been adopted as the standard for the subject by the International Union of Crystallography, and is now output by nearly all of the major software packages for structure determination and refinement. Development of the CIF has also led to an increase in direct private depositions of structural data to the CSD, data that, for various reasons, are unlikely to be published through formal mechanisms.

24.3.2.2. Data organization

Each individual structure in the CSD is referred to as an entry and each entry is identified by a reference code (refcode) containing six alphabetic characters, which characterize a specific chemical compound, and a further two numeric characters which trace the publication history of the structure. The information content of a typical CSD entry is illustrated schematically in Fig. 24.3.2.1. Individual data items can be categorized into three different groupings which are most conveniently described in terms of their dimensionality.
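The refcode layout described above can be illustrated with a small parser. This is a minimal sketch: it assumes that the two trailing digits may be absent for the first determination of a compound, and the names `Refcode` and `parse_refcode` and the example codes are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Six letters identify the compound; two optional digits trace publication history.
# Treating the digits as optional is an assumption made for this sketch.
REFCODE_PATTERN = re.compile(r"^([A-Z]{6})(\d{2})?$")


class Refcode(NamedTuple):
    family: str             # six-letter compound identifier
    history: Optional[int]  # two-digit history number, or None if absent


def parse_refcode(text: str) -> Refcode:
    """Split a refcode string into its compound and history components."""
    match = REFCODE_PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a valid refcode: {text!r}")
    family, digits = match.groups()
    return Refcode(family, int(digits) if digits is not None else None)


# Hypothetical refcodes, used only to exercise the parser.
first = parse_refcode("ABCDEF")
later = parse_refcode("ABCDEF01")
print(first, later, first.family == later.family)
```

Grouping entries by the six-letter component is then a simple way to collect all determinations of the same compound.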

Figure 24.3.2.1. Information content of the Cambridge Structural Database (CSD).

24.3.2.3. 1D bibliographic and chemical data

The one-dimensional data for each entry comprise chemical and bibliographic text strings, together with certain individual numerical items, viz. the chemical compound name and any common synonym(s), the chemical formula, the authors' names, the journal name and literature citation, and a text comment reflecting any special experimental details (non-room-temperature study, absolute configuration determined, neutron study etc.). The cell parameters, crystal data, space group and precision indicators also fall into this category.

24.3.2.4. 2D chemical connectivity data

The formal two-dimensional chemical structural diagram for each entry (Fig. 24.3.2.2) is encoded in the form of a compact connection table. Chemical connectivity is recorded in terms of a set of atom and bond properties. The atom properties recorded are: atom number, element type, number of connected non-H atoms, number of terminal H atoms and the formal atomic charge. Bond properties are encoded as a pair of atom numbers and the formal chemical bond type that connects those atoms. Bond types employed in the CSD connectivity descriptions are: single, double, triple, quadruple (metal–metal), aromatic, delocalized double and π bonds. Bond types are (automatically) coded negative if the bond forms part of a cyclic system.
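A connection table of this kind can be sketched as two small record types. The example below is illustrative only: the integer bond-type codes, the class names and the ring-detection routine are assumptions, not the CSD's internal encoding. It marks a bond as cyclic (negative type) when its two atoms remain connected after the bond itself is removed.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Atom:
    number: int             # atom number within the entry
    element: str            # element type
    n_nonh_neighbours: int  # number of connected non-H atoms
    n_terminal_h: int       # number of terminal H atoms
    charge: int = 0         # formal atomic charge


@dataclass
class Bond:
    atom_i: int
    atom_j: int
    bond_type: int  # hypothetical codes: 1 single, 2 double; negative if cyclic


def _connected_without(bonds: List[Bond], skip: int, start: int, goal: int) -> bool:
    """Depth-first search from start to goal, ignoring the bond at index skip."""
    adjacency: Dict[int, Set[int]] = {}
    for index, bond in enumerate(bonds):
        if index == skip:
            continue
        adjacency.setdefault(bond.atom_i, set()).add(bond.atom_j)
        adjacency.setdefault(bond.atom_j, set()).add(bond.atom_i)
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        if current == goal:
            return True
        for neighbour in adjacency.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return False


def flag_cyclic_bonds(bonds: List[Bond]) -> List[Bond]:
    """Return a copy of the bond list with ring bonds given negative types."""
    flagged = []
    for index, bond in enumerate(bonds):
        cyclic = _connected_without(bonds, index, bond.atom_i, bond.atom_j)
        magnitude = abs(bond.bond_type)
        flagged.append(Bond(bond.atom_i, bond.atom_j, -magnitude if cyclic else magnitude))
    return flagged


# Cyclopropanone: a three-membered carbon ring with an exocyclic C=O.
atoms = [Atom(1, "C", 3, 0), Atom(2, "C", 2, 2), Atom(3, "C", 2, 2), Atom(4, "O", 1, 0)]
bonds = [Bond(1, 2, 1), Bond(2, 3, 1), Bond(1, 3, 1), Bond(1, 4, 2)]
flagged = flag_cyclic_bonds(bonds)
print([b.bond_type for b in flagged])
```

The three ring bonds come out negative while the exocyclic double bond keeps its positive code.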

Figure 24.3.2.2. 2D chemical connectivity data for a simple organic molecule.

24.3.2.5. 3D crystal structure data

The three-dimensional data consist of the fractional coordinates and symmetry operators for each entry. This information, together with the cell dimensions, is used to establish a crystallographic connectivity using standard covalent radii. The chemical and crystallographic connectivities are then mapped onto one another, using graph-theoretic algorithms, so that the chemical atom and bond properties are associated with the three-dimensional structure for search purposes. The CSD always records coordinates for complete molecules. Thus, if a molecule adopts a special position in the assigned space group, i.e. the asymmetric unit is some fraction of the total number of atoms in the molecule, then the CSD system also records those symmetry-generated atoms that complete the chemical entity. This speeds up the search process and also makes the data more accessible to non-crystallographers.
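The idea of deriving a crystallographic connectivity from fractional coordinates, cell dimensions and covalent radii can be sketched as follows. This is a simplified illustration rather than the CCDC's procedure: the radii and the 0.4 Å tolerance are assumed values, symmetry operators are not applied, and the coordinates are invented.

```python
import math
from itertools import combinations

# Assumed covalent radii (angstroms) and bonding tolerance for this sketch.
COVALENT_RADII = {"H": 0.23, "C": 0.68, "N": 0.68, "O": 0.68}
TOLERANCE = 0.4


def cell_vectors(a, b, c, alpha, beta, gamma):
    """Cartesian cell vectors with a along x and b in the xy plane (angles in degrees)."""
    al, be, ga = (math.radians(x) for x in (alpha, beta, gamma))
    va = (a, 0.0, 0.0)
    vb = (b * math.cos(ga), b * math.sin(ga), 0.0)
    cx = c * math.cos(be)
    cy = c * (math.cos(al) - math.cos(be) * math.cos(ga)) / math.sin(ga)
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    return va, vb, (cx, cy, cz)


def to_cartesian(frac, vectors):
    """Convert fractional coordinates to Cartesian coordinates."""
    return tuple(sum(frac[k] * vectors[k][axis] for k in range(3)) for axis in range(3))


def find_bonds(atoms, cell):
    """Return index pairs whose separation is within the summed radii plus tolerance."""
    vectors = cell_vectors(*cell)
    cart = [to_cartesian(frac, vectors) for _, frac in atoms]
    bonds = []
    for i, j in combinations(range(len(atoms)), 2):
        limit = COVALENT_RADII[atoms[i][0]] + COVALENT_RADII[atoms[j][0]] + TOLERANCE
        if math.dist(cart[i], cart[j]) <= limit:
            bonds.append((i, j))
    return bonds


# Invented three-atom fragment in a 10 x 10 x 10 angstrom orthogonal cell.
cell = (10.0, 10.0, 10.0, 90.0, 90.0, 90.0)
atoms = [("C", (0.10, 0.10, 0.10)), ("O", (0.10, 0.10, 0.222)), ("C", (0.25, 0.10, 0.05))]
bonds = find_bonds(atoms, cell)
print(bonds)
```

A full implementation would also generate symmetry-related atoms before the distance test, which is how molecules lying on special positions are completed.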

24.3.2.6. Derived data and bit-encoded information

Derived data are calculated directly from the evaluated raw data and stored in the master archive for search purposes. Z′, the number of chemical entities in the asymmetric unit, is a typical (real) numerical data item in this category. However, by far the most useful of the derived data items are a set of 682 individual pieces of yes/no information which are encoded as a bitmap, referred to as the screen record. The first 155 of these bits record information about (a) the elemental constitution of the compound, (b) results of the data-validation procedure and (c) summary information about the data content of the entry. These bits can be accessed directly by the user as search keys. The most important parts of the bitmap contain codified yes/no information about the presence/absence of specific features in the complete 2D or 3D structures held in the CSD. When a chemical substructure is entered as a query, its constitution is analysed in the same way to produce a bitmap for the query. Logical comparison of the query bitmap with the bitmap stored for each full CSD entry is computationally rapid, and quickly eliminates those entries that do not contain the requested features. Only those entries that pass this initial screening process need enter the detailed and computationally intensive atom-by-atom, bond-by-bond connectivity mapping that finally confirms (or not) the presence of the required query substructure.
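The screening step can be illustrated with integer bitmaps. In this minimal sketch the feature names, bit positions and entry labels are hypothetical; the real screen record has 682 bits whose definitions are not given here. An entry passes the screen only if every bit set in the query is also set in the entry.

```python
# Hypothetical feature list; each feature is assigned one bit position.
FEATURES = [
    "contains_N",
    "contains_O",
    "has_carbonyl",
    "has_hydroxyl",
    "has_six_membered_ring",
]
BIT = {name: 1 << position for position, name in enumerate(FEATURES)}


def make_bitmap(features):
    """Set one bit for every feature present."""
    bitmap = 0
    for name in features:
        bitmap |= BIT[name]
    return bitmap


def passes_screen(entry_bitmap: int, query_bitmap: int) -> bool:
    """True if the entry has every feature required by the query."""
    return (entry_bitmap & query_bitmap) == query_bitmap


# Invented archive of three entries with precomputed screen records.
archive = {
    "ENTRYA": make_bitmap(["contains_O", "has_carbonyl"]),
    "ENTRYB": make_bitmap(["contains_O", "has_carbonyl", "has_hydroxyl"]),
    "ENTRYC": make_bitmap(["contains_N", "has_six_membered_ring"]),
}

query_bitmap = make_bitmap(["has_carbonyl", "has_hydroxyl"])
candidates = sorted(
    name for name, bits in archive.items() if passes_screen(bits, query_bitmap)
)
print(candidates)
```

Only the surviving candidates would then be passed to the expensive atom-by-atom, bond-by-bond matching; the screen can produce false positives but never rejects an entry that contains the query features.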

24.3.2.7. Data validation

All data entering the CSD are subject to stringent check and evaluation procedures. Some of these are visual, but the majority are automated within the CSD program PreQuest. The checks ensure that the 1D and 2D information fields abstracted by CCDC staff are accurately encoded, and that the 3D crystallographic coordinates are consistent with both the chemical description of the structure and with the geometrical description supplied by the authors. Most typographical errors in original papers can be corrected by the CCDC but, in the case of serious discrepancies, the original authors are consulted.

24.3.2.8. The CSD-Use database

CSD-Use is a database of scientific research papers in which the CSD was used as the principal or sole source of experimental information. The database comprises more than 700 literature citations classified according to the type of systematic study undertaken. Each CSD-Use entry also contains a short summary of the major findings of the research. The database is growing rapidly over time, and is expected to be a valuable resource in the future, since it contains a fully retrospective overview of the data-mining methods and research applications of the CSD.

24.3.3. The CSD software system

24.3.3.1. Overview

The CSD is supplied with a suite of fully interactive graphical software modules which provides users with facilities to: (a) interrogate all of the 1D, 2D and 3D information fields; (b) display entries graphically in a variety of styles; (c) retrieve relevant data for search hits, including geometrical parameters derived from the stored coordinates; and (d) display the derived numerical information, e.g. as histograms, scattergrams etc., generate descriptive statistics and perform more complex numerical analyses. More recently, software has been added that permits users to transform their own in-house structural data to CSD formats for inclusion in these processes. A summary of the overall CSD software system is given in Fig. 24.3.3.1 which shows the functional relationships between the four major applications programs.

Figure 24.3.3.1. Summary of the software components of the Cambridge Structural Database system (CSDS).

24.3.3.2. PreQuest

PreQuest is a data-validation and data-conversion program which is used to create high-quality structural data files in CSD format from, e.g., raw input data from a CIF. PreQuest is used routinely by CCDC's scientific editors to create and validate entries for inclusion in the master CSD archive, hence the program is constantly being maintained and upgraded. The released version enables users to build a private CSD-format database of their own structures which can then be searched independently of, or in conjunction with, the master CSD files using the database access programs described below.

24.3.3.3. Searching the CSD: Quest3D and ConQuest

Quest3D has been the main search engine and information-retrieval program for the CSD since the late 1980s. Its main features are summarized below. However, since 1997, the CCDC has been developing its successor, the ConQuest program, which was first released as part of the CSD system in April 2000. During an interim period, perhaps two years, ConQuest and Quest3D will both form part of the released CSD system on certain computing platforms while the functionality of the new program is being fully developed. Further details of ConQuest are provided in Section 24.3.3.5, indicating in particular how it differs from, and improves upon, the facilities available in Quest3D.

24.3.3.4. Quest3D

Quest3D is the main search engine and information-retrieval program for the CSD. It permits interrogation of all information fields: (a) 19 text fields, (b) 38 individual numerical fields, (c) element symbols and element counts, (d) full or partial molecular formulae, (e) direct access to over 150 bit screens, (f) extensive 2D chemical substructure search capabilities, and (g) 3D substructure searching at the molecular level or at the extended crystal-structure level. A search of a specific information field is termed a test of that field, and is constructed graphically via the menu system; menu components correspond to the categories of searches identified above. A complete query is then constructed by combining a number of separate test components using Boolean logic.

Substructure searching is the most important and frequently used facility. At the molecular level, the substructure (chemical fragment) query is entered graphically and is defined using the formal covalent bond types present in the 2D chemical connectivity tables of the CSD. The process can be extended to locate non-bonded contacts in the complete crystal structure. Here, the individual atoms or chemical groups involved in the contact must be specified, and a limiting non-bonded contact distance must be provided, along with any other geometrical criteria required to define the contact more precisely.

All substructure searches begin with the user drawing the required chemical unit(s) via the BUILD menu. Chemical variability and precision are controlled through (a) the PERIODIC TABLE sub-menu, which allows for specification of variable element types at specific atomic sites, (b) the 2D-CONSTRAIN menu, which allows further chemical restrictions to be specified, such as cyclicity/acyclicity of bonds, exact hydrogen-atom counts, total coordination numbers for atoms etc., and (c) the 3D-CONSTRAIN menu, which permits the user to specify a list of geometrical parameters to be calculated by the program for each instance of the fragment located in the CSD; any of these geometrical parameters may be used as criteria to limit the scope of the search, especially at the intermolecular level. A file of calculated geometrical information is output by Quest3D and may be read by Vista, or by external data analysis software. Other Quest3D output files allow CSD search results to be communicated rapidly to proprietary modelling software.

24.3.3.5. ConQuest

The overall aim of the ConQuest project is to replace Quest3D with graphical search software that makes best use of modern computing environments. The primary objective has been to create an interface that is both simple and intuitive to use, so as to encourage use of the CSD by a broader spectrum of scientists. Thus, ConQuest provides: (a) text and numeric searches via pop-up windows, (b) a new sketcher window within which to encode 2D and 3D substructure searches, pharmacophore searches, and searches for non-bonded contacts in crystal structures, and (c) the immediate viewing of hits with facilities for backward and forward scrolling within hit lists. ConQuest is provided with full documentation and tutorials, both online and in printed form, and with context-dependent help facilities. Version 1.0, released in April 2000, contains most of the functionality available within the Quest3D program, and it is expected that Quest3D capabilities will soon be exceeded by the new program.

A most important feature of ConQuest is its availability on PC-Windows platforms, as well as its implementation under Unix/Linux. Initially, ConQuest and the CSD will be the only parts of the full CSD system available under PC-Windows, but Vista (or Vista-like facilities, see Section 24.3.3.6), a new visualizer and provision of the CCDC's knowledge bases (IsoStar and Mogul) will follow as planned developments in the PC area.

24.3.3.6. Vista

Vista reads geometrical table(s) generated by Quest3D and provides extensive facilities for the graphical representation and statistical analysis of the numerical data. Graphical facilities include histograms and scattergrams referred to Cartesian or polar axes, with a hyperlink back to the original CSD entries to permit immediate investigation of, e.g., outlying observations. The contents of plots can be edited interactively, and all illustrations can be output in PostScript format for inclusion in reports and publications. Additionally, Vista will generate descriptive statistics for a distribution, carry out simple linear regressions and perform principal-component analyses.
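The kinds of analysis listed above can be approximated with standard-library Python for readers who export a geometrical table to external software. This sketch is not Vista's implementation; the function names are illustrative and the numbers are invented purely to exercise the code.

```python
import statistics


def describe(values):
    """Descriptive statistics for one column of a geometrical table."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def histogram(values, start, width, n_bins):
    """Count values falling in n_bins equal-width bins beginning at start."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


def linear_fit(xs, ys):
    """Least-squares slope and intercept of y against x."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented distances (angstroms); not taken from the CSD.
distances = [1.85, 1.95, 2.05, 2.45]
summary = describe(distances)
counts = histogram(distances, start=1.8, width=0.2, n_bins=4)
slope, intercept = linear_fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(summary["mean"], counts, slope, intercept)
```

Restricting the input list to a chosen range before calling `describe` mirrors the interactive include/exclude operations that a spreadsheet-style analysis tool provides.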

24.3.3.7. Pluto

Pluto is used to visualize crystal and molecular structures in a variety of styles, including stick diagrams and ball-and-spoke and space-filling representations of individual molecules or extended crystal structures.

24.3.3.8. Use of the CSD software system: an example

The preceding sections can only give a flavour of the extensive search, analysis and visualization capabilities of Quest3D, ConQuest, Vista and Pluto, which are fully documented in manuals available online via the web address given below, or in printed form from the CCDC.

In this section, we illustrate the application of the CSD system to one specific example: a CSD-based analysis to examine the O—H···O hydrogen-bonding ability of the keto oxygen of Fig. 24.3.3.2. This example illustrates a number of key features of the software system. The example is constructed in terms of Quest3D terminology, but identical facilities are available in the ConQuest program.

(1) Draw the two component substructures: the keto group and the O—H donor group. Constrain the total coordination number of C₁, C₃ (Fig. 24.3.3.2) to be 4, thus defining them as C(sp³) atoms.

Figure 24.3.3.2. The keto···hydroxyl fragment described in the example of CSDS usage (see Section 24.3.3.8), illustrating the parameters DOH, AH, THETA and PHI used to describe the hydrogen-bonded system.

(2) Define a non-bonded contact between keto O₁ and hydroxy donor H₁. Require that this contact (DOH) is less than 2.62 Å, the sum of van der Waals radii, after normalization of the H-atom position to correspond to a standard O—H bond length as determined by neutron diffraction [X-ray location of H atoms is imprecise – X—H distances are usually foreshortened – so the system will reposition H atoms along the X—H vector and at an X—H distance that corresponds to the mean value from neutron diffraction experiments (Allen et al., 1987)].
(3) Define the geometrical parameters shown in Fig. 24.3.3.2, comprising the H···O distance (DOH), the O—H···O angle (AH), and the angles THETA and PHI that describe the angle of approach of H to the putative lone-pair plane of the keto oxygen atom. THETA is the angle of approach of the donor H atom to the plane of the keto group, and PHI is the angle of rotation of the projection of the O···H vector in that plane; THETA = 0°, PHI = ±120° would correspond to H-atom approach along an O-atom lone-pair direction. The search is further constrained so that hits are only accepted if AH > 90°.
(4) At this stage, the 3D-CONSTRAIN menu will show a graphic which closely resembles Fig. 24.3.3.2. Test 1 is now defined.
(5) Since there will be large numbers of examples of keto-O···H—O hydrogen bonds in the CSD, a secondary constraint based on the crystallographic R factor is applied so that examples are only located in the more precise structure determinations. To do this, we access the NUMERIC search menu to define RFACT < 0.075 as test 2.
(6) Enter the QUEST menu, which summarizes all current tests, select the organic structures only bit screen, and complete the full query by combining test 1 and test 2 via a Boolean .AND. operator.
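The geometrical part of steps (2), (3) and (5) can be sketched numerically. The code below is an illustration under stated assumptions and not the Quest3D algorithm: the neutron-normalized O—H distance of 0.983 Å is an assumed value, PHI is taken as the unsigned angle between the in-plane projection of the O···H vector and the O→C direction (so that a lone-pair approach gives 120°), and all coordinates are invented.

```python
import math

NEUTRON_O_H = 0.983  # assumed mean neutron O-H distance (angstroms)


def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def unit(a): return scale(a, 1.0 / norm(a))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def angle_deg(u, v):
    """Angle between two vectors in degrees, with the cosine clamped to [-1, 1]."""
    cosine = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def normalize_h(donor_o, h, length=NEUTRON_O_H):
    """Move H along the O-H vector so that the O-H distance equals length."""
    return add(donor_o, scale(unit(sub(h, donor_o)), length))


def hbond_parameters(c1, c2, c3, o1, donor_o, h):
    """DOH, AH, THETA and PHI for a keto O1 (bonded to C2) and a hydroxyl donor."""
    h_norm = normalize_h(donor_o, h)
    o_to_h = sub(h_norm, o1)
    doh = norm(o_to_h)
    ah = angle_deg(sub(donor_o, h_norm), sub(o1, h_norm))
    normal = unit(cross(sub(c1, c2), sub(c3, c2)))  # normal to the keto plane
    out_of_plane = dot(o_to_h, normal)
    theta = math.degrees(math.asin(min(1.0, abs(out_of_plane) / doh)))
    in_plane = sub(o_to_h, scale(normal, out_of_plane))
    # PHI is undefined if H lies exactly on the plane normal through O1 (not handled).
    phi = angle_deg(in_plane, sub(c2, o1))
    return {"DOH": doh, "AH": ah, "THETA": theta, "PHI": phi}


def accept(params, r_factor):
    """Apply the distance, angle and R-factor limits used in the example."""
    return params["DOH"] < 2.62 and params["AH"] > 90.0 and r_factor < 0.075


# Invented geometry: keto group in the xy plane, donor along a lone-pair direction.
direction = (0.5, math.sqrt(3.0) / 2.0, 0.0)
c2, o1 = (0.0, 0.0, 0.0), (1.22, 0.0, 0.0)
c1, c3 = (-0.75, 1.30, 0.0), (-0.75, -1.30, 0.0)
donor_o = add(o1, scale(direction, 2.883))
h_xray = add(o1, scale(direction, 2.033))  # foreshortened O-H of 0.85 angstroms
params = hbond_parameters(c1, c2, c3, o1, donor_o, h_xray)
print(params, accept(params, r_factor=0.05))
```

For this constructed geometry the normalized contact is linear and lies in the keto plane along a lone-pair direction, so it passes all three acceptance criteria.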

Searches can be performed interactively or allowed to run to completion without further intervention from the user. In interactive mode, Quest3D presents each hit as it is located, as illustrated in Fig. 24.3.3.3, and can then display the 1D bibliographic information, a 2D structural diagram, the 3D molecular structure, or a 3D packing diagram by toggling between display options. For an intermolecular search, as exemplified here, the non-bonded contact that triggered the hit is clearly identified. For the example described above, a file of the four user-defined geometrical parameters (DOH, AH, THETA, PHI) for each hit is created for use by Vista.

Figure 24.3.3.3. A typical Quest3D graphics screen showing how search hits are visualized and manipulated.

Vista displays the geometrical parameters in the form of an interactive spreadsheet; the user may include or exclude specific substructures on the basis of numerical criteria during the data analysis, e.g. to focus on a specific range of DOH values, exclude outlying observations etc. Hyperlinking between Vista and the master CSD file means that all of the database information of Fig. 24.3.2.1 is immediately available during a Vista session, either by clicking on a particular fragment in the spreadsheet or on a particular data point in a histogram or scattergram. Use of Vista is illustrated for the >C=O···H—O example in Figs. 24.3.3.4, 24.3.3.5 and 24.3.3.6.

Figure 24.3.3.4. A Vista histogram of the hydrogen-bond distance, DOH, showing a sharp peak in the range 1.8–2.2 Å, well below the sum of van der Waals radii (2.62 Å). This peak can be isolated in Vista to obtain an estimate of the mean O···H separation in >C=O···H—O systems.

Figure 24.3.3.5. A Vista scatterplot of the hydrogen-bond length (DOH) versus the O—H···O angle (AH). The plot shows a major clustering of observations having short DOH values and hydrogen-bond linearity (AH = 180°): stronger hydrogen bonds prefer to be linear.

Figure 24.3.3.6. A Vista polar scatterplot of THETA versus PHI, the angles that define the direction of approach of the donor H atom to the >C=O plane. There are clear indications of lone-pair directionality: H prefers an in-plane approach to O (THETA = 0°), with preferred PHI values in the range 120–135°.

24.3.4. Knowledge engineering from the CSD

24.3.4.1. Databases versus knowledge bases

As illustrated in earlier sections, the CSD represents a collection of primary data resulting from diffraction experiments on crystals of small molecules – in particular the fractional coordinates, space group and cell dimensions that define the 3D crystal and molecular structure. However, the user of the system is usually interested in structural knowledge – in the form of bond lengths, angles, intermolecular contact distances and other parameters – that can be synthesized from the raw data by the use of the CSD system software. Thus, each detailed analysis carried out using the CSD system represents an experiment in data mining, and considerable operational and intellectual effort is employed in performing such analyses.

At the present state of development of the field, three facts are apparent: (a) many data-mining activities centre around a set of standard geometrical data types that are essential for major applications, particularly in structural chemistry, molecular modelling and rational drug design; (b) the expertise required to carry out data-mining experiments is not inconsiderable and the time required can be lengthy; and (c) because the CSD is growing rapidly, any compilation of structural knowledge must be updated on a regular basis, and the increasing database size makes this operation very time consuming for individual users.

These considerations indicate that access to CSD information should be at two levels, the raw-data level and the structural-knowledge level, and, since 1995, the CCDC has been deriving libraries of structural knowledge from the raw data content of the CSD. The first of these libraries, IsoStar, a library of information on intermolecular interactions (Bruno et al., 1997), is briefly summarized below. A second library, Mogul, containing bond lengths, valence angles and torsional distributions, is under development in mid-2000. Such a knowledge base has obvious applications in crystallography, structural chemistry and molecular biology, not least in providing precise geometrical parameters which can be used in 3D model building, structure refinement and reality checking of developing and refined structures. The scientific applications of non-bonded contact geometries and conformational (torsional) information are more fully discussed in Chapter 22.4.

24.3.4.2. IsoStar: a library of knowledge about intermolecular interactions

IsoStar (Bruno et al., 1997) is based on experimental data, not only from the CSD but also from the PDB, and contains some theoretical results calculated using the ab initio intermolecular perturbation theory (IMPT) method of Hayes & Stone (1984). The experimental data in the CSD and the PDB have been used to display interaction geometries involving central groups (A) and contact groups (B). CSD search results of the type exemplified above are transformed into an easily visualized form by overlaying the A moieties. This results in a 3D distribution (scatterplot) showing the experimental distribution of B around A. A web-browser front end permits rapid access to these scatterplots, which can be viewed in RasMol (Sayle, 1996), interrogated interactively, converted into contoured surfaces etc.
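The overlay of central groups can be pictured as expressing each contact atom in a coordinate frame fixed to its own central group. The sketch below shows one simple way to do this with a frame built from three atoms of the central group; it is an assumption-laden illustration rather than the IsoStar procedure (which may, for example, use least-squares superposition), and the coordinates are invented.

```python
import math


def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def unit(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)


def local_coordinates(origin_atom, axis_atom, plane_atom, contact_atom):
    """Express a contact atom in a frame fixed to three atoms of the central group.

    x points from origin_atom to axis_atom, z is normal to the plane of the
    three atoms and y completes a right-handed set.
    """
    x_axis = unit(sub(axis_atom, origin_atom))
    z_axis = unit(cross(x_axis, sub(plane_atom, origin_atom)))
    y_axis = cross(z_axis, x_axis)
    offset = sub(contact_atom, origin_atom)
    return (dot(offset, x_axis), dot(offset, y_axis), dot(offset, z_axis))


# Two invented occurrences of the same central group with the same contact geometry;
# the second is the first rotated by 90 degrees about z and then translated.
first = local_coordinates((0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (-0.7, 1.3, 0.0), (2.0, 1.5, 0.3))
second = local_coordinates((5.0, -2.0, 1.0), (5.0, -0.8, 1.0), (3.7, -2.7, 1.0), (3.5, 0.0, 1.3))
print(first, second)
```

Because both occurrences map to the same local coordinates, plotting the transformed contact atoms from many search hits in one frame yields the scatterplot of B around A described above.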

Version 1.1 of IsoStar, released in October 1998, contains information on non-bonded interactions formed between 310 central groups and 45 contact groups. Version 1.1 contains over 12 000 scatterplots: 9 000 from the CSD and 3 000 from the PDB. IsoStar also reports results for 867 theoretical potential-energy minima calculated using the IMPT procedure. The library will be updated on a regular basis using automated software procedures developed at the CCDC.

Chapter 22.4 contains illustrative examples from IsoStar, together with a more complete description of the knowledge base and its applications.

24.3.5. Accessing the CSD system and IsoStar

24.3.5.1. Release mechanisms

The CSD system, comprising the CSD and CSD-Use databases, and all of the applications software described above, is available on CD-ROM for Unix and DEC-VMS platforms and for PCs operating under Linux. At the time of writing (mid-2000), ConQuest alone is available for PC-Windows platforms, but the full availability of other components of the CSD system is currently being addressed. The CSD is released twice yearly, in April and October, as an indexed sequential binary file, with full installation instructions contained within the CD. Versions of the CSD have been reformatted for use with proprietary software systems: the MACCS3D/Isis system from Molecular Design Limited, and the Sybyl-UNITY system from Tripos Associates.

Subscribers in academic and other not-for-profit institutions may obtain the CSD system through their local National Affiliated Centre (NAC). The names, addresses and other coordinates of these centres are contained in the CCDC's web pages (see below). Users in countries not covered by NAC arrangements, or users from for-profit companies and organizations, should contact the CCDC directly.

The IsoStar library, for Unix systems only, is released after each library update, currently planned to occur on an annual basis. IsoStar forms part of the distributed CSD system, and CDs are available through the same mechanisms as the main system.

24.3.5.2. Information about the CCDC

The CCDC maintains an extensive set of information on its website, http://www.ccdc.cam.ac.uk. The site describes the CSD system, IsoStar, and the associated research and development activities of the CCDC. These pages also provide access to CSD system documentation, provide lists of contact details for the National Affiliated Centres that service not-for-profit users worldwide, and give up-to-date information on how to contact the CCDC directly.

24.3.6. Conclusion

This chapter has provided an overview of the Cambridge Structural Database, its associated software system, other databases and IsoStar, the first library of structural knowledge to be derived from the CSD. This information is accurate at the time of writing (May 1998, but with some revision in mid-2000). However, the CSD itself, the knowledge bases derived from it and a wide variety of applications software are under continuous development and improvement, and articles such as this can only provide a snapshot of progress at any particular time. Readers of this article are therefore encouraged to visit the CCDC's website (http://www.ccdc.cam.ac.uk) to obtain the latest information on available products and services.

References

Abola, E. E., Sussman, J. L., Prilusky, J. & Manning, N. O. (1997). Protein Data Bank archives of three-dimensional macromolecular structures. Methods Enzymol. 277, 556–571.

Allen, F. H., Davies, J. E., Galloy, J. J., Johnson, O., Kennard, O., Macrae, C. F., Mitchell, E. M., Mitchell, G. F., Smith, J. M. & Watson, D. G. (1991). The development of versions 3 and 4 of the Cambridge Structural Database system. J. Chem. Inf. Comput. Sci. 31, 187–204.

Allen, F. H., Kennard, O., Watson, D. G., Brammer, L., Orpen, A. G. & Taylor, R. (1987). Tables of bond lengths determined by X-ray and neutron diffraction. Part 1. Bond lengths in organic compounds. J. Chem. Soc. Perkin Trans. 2, pp. S1–S19.

Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.

Bruno, I. J., Cole, J. C., Lommerse, J. P. M., Rowland, R. S., Taylor, R. & Verdonk, M. L. (1997). IsoStar: a library of information about nonbonded interactions. J. Comput.-Aided Mol. Des. 11, 525–537.

Hall, S. R., Allen, F. H. & Brown, I. D. (1991). The crystallographic information file (CIF): a new standard archive file for crystallography. Acta Cryst. A47, 655–685.

Hayes, I. C. & Stone, A. J. (1984). An intermolecular perturbation theory for the region of moderate overlap. Mol. Phys. 53, 83–105.

Kennard, O. & Allen, F. H. (1993). 3D search and research using the Cambridge Structural Database. Chem. Des. Autom. News, 8, 1, 31–37.

RCSB (2000). The Protein Data Bank. Research Collaboratory for Structural Bioinformatics, Department of Chemistry, Rutgers University, Piscataway, NJ, USA (http://www.rcsb.org).

Sayle, R. (1996). The RASMOL visualiser. Glaxo Wellcome Research, Stevenage, Hertfordshire, England.
