International Tables for Crystallography
Volume F
Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F. ch. 24.3, pp. 665-666

Section 24.3.3. The CSD software system

F. H. Allenᵃ* and V. J. Hoyᵃ

ᵃ Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, England
Correspondence e-mail: allen@ccdc.cam.ac.uk

24.3.3. The CSD software system

24.3.3.1. Overview

The CSD is supplied with a suite of fully interactive graphical software modules which provides users with facilities to: (a) interrogate all of the 1D, 2D and 3D information fields; (b) display entries graphically in a variety of styles; (c) retrieve relevant data for search hits, including geometrical parameters derived from the stored coordinates; and (d) display the derived numerical information, e.g. as histograms, scattergrams etc., generate descriptive statistics and perform more complex numerical analyses. More recently, software has been added that permits users to transform their own in-house structural data to CSD formats for inclusion in these processes. A summary of the overall CSD software system is given in Fig. 24.3.3.1, which shows the functional relationships between the four major application programs.

[Figure 24.3.3.1]

Figure 24.3.3.1

Summary of the software components of the Cambridge Structural Database system (CSDS).

24.3.3.2. PreQuest

PreQuest is a data-validation and data-conversion program which is used to create high-quality structural data files in CSD format from, e.g., raw input data from a CIF. PreQuest is used routinely by CCDC's scientific editors to create and validate entries for inclusion in the master CSD archive, hence the program is constantly being maintained and upgraded. The released version enables users to build a private CSD-format database of their own structures which can then be searched independently of, or in conjunction with, the master CSD files using the database access programs described below.

24.3.3.3. Searching the CSD: Quest3D and ConQuest

Quest3D has been the main search engine and information-retrieval program for the CSD since the late 1980s. Its main features are summarized below. However, since 1997, the CCDC has been developing its successor, the ConQuest program, which was first released as part of the CSD system in April 2000. During an interim period, perhaps two years, ConQuest and Quest3D will both form part of the released CSD system on certain computing platforms while the functionality of the new program is being fully developed. Further details of ConQuest are provided in Section 24.3.3.5, indicating in particular how it differs from, and improves upon, the facilities available in Quest3D.

24.3.3.4. Quest3D

Quest3D is the main search engine and information-retrieval program for the CSD. It permits interrogation of all information fields: (a) 19 text fields, (b) 38 individual numerical fields, (c) element symbols and element counts, (d) full or partial molecular formulae, (e) direct access to over 150 bit screens, (f) extensive 2D chemical substructure search capabilities, and (g) 3D substructure searching at the molecular level or at the extended crystal-structure level. A search of a specific information field is termed a test of that field, and is constructed graphically via the menu system; menu components correspond to the categories of searches identified above. A complete query is then constructed by combining a number of separate test components using Boolean logic.
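The idea of building a query from separate tests joined by Boolean logic can be sketched in a few lines of Python. This is an illustration only and is not the Quest3D implementation: the `Entry` record, its field names, the entry labels and the helper functions `t_and`, `t_or` and `t_not` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical, heavily simplified database entry (not a real CSD record layout).
@dataclass(frozen=True)
class Entry:
    label: str            # made-up identifier, not a real CSD reference code
    r_factor: float       # crystallographic R factor
    elements: frozenset   # element symbols present in the entry
    is_organic: bool      # stands in for an "organic structures only" bit screen

Test = Callable[[Entry], bool]

def t_and(*tests: Test) -> Test:
    """Entry must pass every test."""
    return lambda e: all(t(e) for t in tests)

def t_or(*tests: Test) -> Test:
    """Entry must pass at least one test."""
    return lambda e: any(t(e) for t in tests)

def t_not(test: Test) -> Test:
    """Entry must fail the test."""
    return lambda e: not test(e)

# Synthetic entries, invented purely for this illustration.
entries = [
    Entry("EX0001", 0.091, frozenset({"C", "H", "O"}), True),
    Entry("EX0002", 0.042, frozenset({"C", "H", "O"}), True),
    Entry("EX0003", 0.038, frozenset({"C", "H", "N"}), True),
    Entry("EX0004", 0.051, frozenset({"C", "H", "O", "Fe"}), False),
]

contains_oxygen: Test = lambda e: "O" in e.elements   # an element test
precise: Test = lambda e: e.r_factor < 0.075          # a numeric test
organic: Test = lambda e: e.is_organic                # a bit-screen test

# Complete query: all three tests combined with AND.
query = t_and(contains_oxygen, precise, organic)
hits = [e.label for e in entries if query(e)]

# A second query using NOT: precise structures that contain no oxygen.
no_oxygen_hits = [e.label for e in entries if t_and(precise, t_not(contains_oxygen))(e)]
```

Representing each test as a predicate keeps the individual tests independent, so the same test can be reused in several queries.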

Substructure searching is the most important and frequently used facility. At the molecular level, the substructure (chemical fragment) query is entered graphically and is defined using the formal covalent bond types present in the 2D chemical connectivity tables of the CSD. The process can be extended to locate non-bonded contacts in the complete crystal structure. Here, the individual atoms or chemical groups involved in the contact must be specified, and a limiting non-bonded contact distance must be provided, along with any other geometrical criteria required to define the contact more precisely.

All substructure searches begin with the user drawing the required chemical unit(s) via the BUILD menu. Chemical variability and precision are controlled through (a) the PERIODIC TABLE sub-menu, which allows for specification of variable element types at specific atomic sites, (b) the 2D-CONSTRAIN menu, which allows further chemical restrictions to be specified, such as cyclicity/acyclicity of bonds, exact hydrogen-atom counts, total coordination numbers for atoms etc., and (c) the 3D-CONSTRAIN menu, which permits the user to specify a list of geometrical parameters to be calculated by the program for each instance of the fragment located in the CSD; any of these geometrical parameters may be used as criteria to limit the scope of the search, especially at the intermolecular level. A file of calculated geometrical information is output by Quest3D and may be read by Vista, or by external data analysis software. Other Quest3D output files allow CSD search results to be communicated rapidly to proprietary modelling software.

24.3.3.5. ConQuest

The overall aim of the ConQuest project is to replace Quest3D with graphical search software that makes best use of modern computing environments. The primary objective has been to create an interface that is both simple and intuitive to use, so as to encourage use of the CSD by a broader spectrum of scientists. Thus, ConQuest provides: (a) text and numeric searches via pop-up windows, (b) a new sketcher window within which to encode 2D and 3D substructure searches, pharmacophore searches, and searches for non-bonded contacts in crystal structures, and (c) the immediate viewing of hits with facilities for backward and forward scrolling within hit lists. ConQuest is provided with full documentation and tutorials, both online and in printed form, and with context-dependent help facilities. Version 1.0, released in April 2000, contains most of the functionality available within the Quest3D program, and it is expected that Quest3D capabilities will soon be exceeded by the new program.

A most important feature of ConQuest is its availability on PC-Windows platforms, as well as its implementation under Unix/Linux. Initially, ConQuest and the CSD will be the only parts of the full CSD system available under PC-Windows, but Vista (or Vista-like facilities, see Section 24.3.3.6), a new visualizer and provision of the CCDC's knowledge bases (IsoStar and Mogul) will follow as planned developments in the PC area.

24.3.3.6. Vista

Vista reads geometrical table(s) generated by Quest3D and provides extensive facilities for the graphical representation and statistical analysis of the numerical data. Graphical facilities include histograms and scattergrams referred to Cartesian or polar axes, with a hyperlink back to the original CSD entries to permit immediate investigation of, e.g., outlying observations. The contents of plots can be edited interactively, and all illustrations can be output in PostScript format for inclusion in reports and publications. Additionally, Vista will generate descriptive statistics for a distribution, carry out simple linear regressions and perform principal-component analyses.
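As a rough illustration of the simpler numerical facilities mentioned above, the following Python sketch computes descriptive statistics for one column of a geometrical table and fits a least-squares straight line to two columns. It is not Vista code: the function names `describe` and `linear_fit` and the numerical values are invented for the example, and principal-component analysis is omitted.

```python
import statistics

def describe(values):
    """Basic descriptive statistics for one column of numbers."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),   # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic columns chosen so that y = 2x + 1 exactly (not real CSD data).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

summary = describe(xs)
slope, intercept = linear_fit(xs, ys)
```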

24.3.3.7. Pluto

Pluto is used to visualize crystal and molecular structures in a variety of styles, including stick diagrams and ball-and-spoke and space-filling representations of individual molecules or extended crystal structures.

24.3.3.8. Use of the CSD software system: an example

The preceding sections can only give a flavour of the extensive search, analysis and visualization capabilities of Quest3D, ConQuest, Vista and Pluto, which are fully documented in manuals available online via the web address given below, or in printed form from the CCDC.

In this section, we illustrate the application of the CSD system to one specific example: a CSD-based analysis to examine the O—H···O hydrogen-bonding ability of the keto oxygen of Fig. 24.3.3.2. This example illustrates a number of key features of the software system. The example is constructed in terms of Quest3D terminology, but identical facilities are available in the ConQuest program.

  • (1) Draw the two component substructures: the keto group and the O—H donor group. Constrain the total coordination number of C1 and C3 (Fig. 24.3.3.2) to be 4, thus defining them as C(sp³) atoms.

    [Figure 24.3.3.2]

    Figure 24.3.3.2

    The keto···hydroxyl fragment described in the example of CSDS usage (see Section 24.3.3.8), illustrating the parameters DOH, AH, THETA and PHI used to describe the hydrogen-bonded system.

  • (2) Define a non-bonded contact between keto O1 and hydroxy donor H1. Require that this contact (DOH) is less than 2.62 Å, the sum of van der Waals radii, after normalization of the H-atom position to correspond to a standard O—H bond length as determined by neutron diffraction [X-ray location of H atoms is imprecise (X—H distances are usually foreshortened), so the system repositions H atoms along the X—H vector at an X—H distance that corresponds to the mean value from neutron diffraction experiments (Allen et al., 1987)].

  • (3) Define the geometrical parameters shown in Fig. 24.3.3.2, comprising the H···O distance (DOH), the O—H···O angle (AH), and the angles THETA and PHI that describe the angle of approach of H to the putative lone-pair plane of the keto oxygen atom. THETA is the angle of approach of the donor H atom to the plane of the keto group, and PHI is the angle of rotation of the projection of the O···H vector in that plane; THETA = 0°, PHI = ±120° would correspond to H-atom approach along an O-atom lone-pair direction. The search is further constrained so that hits are only accepted if AH > 90°.

  • (4) At this stage, the 3D-CONSTRAIN menu will show a graphic which closely resembles Fig. 24.3.3.2. Test 1 is now defined.

  • (5) Since there will be large numbers of examples of keto-O···H—O hydrogen bonds in the CSD, a secondary constraint based on the crystallographic R factor is applied so that examples are only located in the more precise structure determinations. To do this, we access the NUMERIC search menu to define RFACT < 0.075 as test 2.

  • (6) Enter the QUEST menu, which summarizes all current tests, select the organic structures only bit screen, and complete the full query by combining test 1 and test 2 via a Boolean .AND. operator.
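The geometry behind steps (2) and (3) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Quest3D algorithm. The atom labels C2 (carbonyl carbon) and O2 (donor oxygen) are assumed here, since the text names only C1, C3, O1 and H1. The neutron-normalized hydroxyl bond length of 0.983 Å is assumed and should be checked against Allen et al. (1987). PHI is measured from the O1 to C2 direction, a convention chosen so that PHI = ±120° corresponds to a lone-pair direction as stated in step (3). The coordinates are synthetic.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def unit(a): return scale(a, 1.0 / norm(a))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])
def clamp(x): return max(-1.0, min(1.0, x))

NEUTRON_OH = 0.983   # assumed mean neutron O-H distance in angstroms
VDW_LIMIT = 2.62     # O...H van der Waals sum quoted in step (2)

def normalize_h(donor, h, target=NEUTRON_OH):
    """Move H along the donor-to-H vector to the target bond length."""
    return add(donor, scale(unit(sub(h, donor)), target))

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with vertex at b."""
    return math.degrees(math.acos(clamp(dot(unit(sub(a, b)), unit(sub(c, b))))))

def contact_geometry(c1, c2, c3, o1, o2, h1):
    """Return (DOH, AH, THETA, PHI) after H-atom normalization."""
    h = normalize_h(o2, h1)
    v = sub(h, o1)                                  # O1 -> H vector
    doh = norm(v)
    ah = angle_deg(o2, h, o1)                       # O2-H...O1 angle
    n = unit(cross(sub(c1, c2), sub(c3, c2)))       # normal to the keto plane
    x = unit(sub(c2, o1))                           # in-plane axis, O1 -> C2
    y = cross(n, x)                                 # second in-plane axis
    theta = math.degrees(math.asin(clamp(dot(v, n) / doh)))   # elevation from plane
    phi = math.degrees(math.atan2(dot(v, y), dot(v, x)))      # in-plane rotation
    return doh, ah, theta, phi

def accept(doh, ah):
    """Criteria of steps (2) and (3): short contact and AH > 90 degrees."""
    return doh < VDW_LIMIT and ah > 90.0

# Synthetic fragment lying in the z = 0 plane (coordinates in angstroms).
c2, o1 = (0.0, 0.0, 0.0), (1.22, 0.0, 0.0)
c1, c3 = (-0.75, 1.30, 0.0), (-0.75, -1.30, 0.0)
d = (0.5, -math.sin(math.radians(120.0)), 0.0)     # in-plane, 120 degrees from O1 -> C2
h1 = add(o1, scale(d, 1.90))                        # foreshortened "X-ray" H position
o2 = add(o1, scale(d, 2.75))                        # donor oxygen, 0.85 from h1

# By construction: DOH = 2.75 - 0.983 = 1.767, AH = 180, THETA = 0, PHI = 120.
doh, ah, theta, phi = contact_geometry(c1, c2, c3, o1, o2, h1)
is_hit = accept(doh, ah)
```

Normalizing the H position before measuring DOH matters here: in this synthetic case the foreshortened position would give a contact of 1.90 Å rather than 1.767 Å.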

Searches can be performed interactively or allowed to run to completion without further intervention from the user. In interactive mode, Quest3D presents each hit as it is located, as illustrated in Fig. 24.3.3.3, and can then display the 1D bibliographic information, a 2D structural diagram, the 3D molecular structure, or a 3D packing diagram by toggling between display options. For an intermolecular search, as exemplified here, the non-bonded contact that triggered the hit is clearly identified. For the example described above, a file of the four user-defined geometrical parameters (DOH, AH, THETA, PHI) for each hit is created for use by Vista.

[Figure 24.3.3.3]

Figure 24.3.3.3

A typical Quest3D graphics screen showing how search hits are visualized and manipulated.

Vista displays the geometrical parameters in the form of an interactive spreadsheet; the user may include or exclude specific substructures on the basis of numerical criteria during the data analysis, e.g. to focus on a specific range of DOH values, exclude outlying observations etc. Hyperlinking between Vista and the master CSD file means that all of the database information of Fig. 24.3.1.1 is immediately available during a Vista session, either by clicking on a particular fragment in the spreadsheet or on a particular data point in a histogram or scattergram. Use of Vista is illustrated for the >C=O···H—O example in Figs. 24.3.3.4, 24.3.3.5 and 24.3.3.6.
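Selecting rows of the geometrical table by numerical criteria, and then summarizing the selection, might look like the following in Python. This is a sketch rather than a description of how Vista works internally: the row layout, the hit labels, the `select` helper and all numerical values are invented for illustration.

```python
import statistics

# Synthetic rows of a geometrical table: one row per located fragment.
rows = [
    {"hit": "hit1", "DOH": 1.85, "AH": 170.0},
    {"hit": "hit2", "DOH": 2.05, "AH": 160.0},
    {"hit": "hit3", "DOH": 2.55, "AH": 110.0},
    {"hit": "hit4", "DOH": 1.95, "AH": 175.0},
]

def select(table, column, low, high):
    """Keep rows whose value in `column` lies within [low, high]."""
    return [row for row in table if low <= row[column] <= high]

# Isolate a chosen DOH range and summarize it.
peak = select(rows, "DOH", 1.8, 2.2)
peak_labels = [row["hit"] for row in peak]
mean_doh = statistics.fmean(row["DOH"] for row in peak)
```

Because each row keeps its hit label, a selected or outlying value can always be traced back to the fragment it came from.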

[Figure 24.3.3.4]

Figure 24.3.3.4

A Vista histogram of the hydrogen-bond distance, DOH, showing a sharp peak in the range 1.8–2.2 Å, well below the sum of van der Waals radii (2.62 Å). This peak can be isolated in Vista to obtain an estimate of the mean O···H separation in >C=O···H—O systems.

[Figure 24.3.3.5]

Figure 24.3.3.5

A Vista scatterplot of the hydrogen-bond length (DOH) versus the O—H···O angle (AH). The plot shows a major clustering of observations having short DOH values and hydrogen-bond linearity (AH = 180°): stronger hydrogen bonds prefer to be linear.

[Figure 24.3.3.6]

Figure 24.3.3.6

A Vista polar scatterplot of THETA versus PHI, the angles that define the direction of approach of the donor H atom to the >C=O plane. There are clear indications of lone-pair directionality: H prefers an in-plane approach to O (THETA = 0°), with preferred PHI values in the range 120–135°.

References

Allen, F. H., Kennard, O., Watson, D. G., Brammer, L., Orpen, A. G. & Taylor, R. (1987). Tables of bond lengths determined by X-ray and neutron diffraction. Part 1. Bond lengths in organic compounds. J. Chem. Soc. Perkin Trans. 2, pp. S1–S19.