International
Tables for
Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G. ch. 3.2, pp. 106-108

Section 3.2.4.2. Chemical identification and connectivity information

S. R. Hall,^a* P. M. D. Fitzgerald^b and B. McMahon^c

^a School of Biomedical and Chemical Sciences, University of Western Australia, Crawley, 6009, Australia, ^b Merck Research Laboratories, Rahway, New Jersey, USA, and ^c International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
Correspondence e-mail: syd@crystal.uwa.edu.au

3.2.4.2. Chemical identification and connectivity information

The categories describing chemical identity and connectivity are as follows:

CHEMICAL group
  Chemical identification (§3.2.4.2.1)
    CHEMICAL
    CHEMICAL_FORMULA
  Chemical connectivity (§3.2.4.2.2)
    CHEMICAL_CONN_ATOM
    CHEMICAL_CONN_BOND

As indicated in Section 3.2.4.1.1, the chemical interpretation of the coordinate list of regions of significant electron density is not always easy. Occupational and compositional disorder, symmetry-equivalent locations, and unrefined atom sites all contribute to the difficulties, but it is usually possible in modern studies to construct a sensible chemical model. The CHEMICAL category group provides the data names needed to describe the chemical identity and properties of the material characterized in the structural study.

3.2.4.2.1. Chemical identification

The data items in these categories are as follows:

(a) CHEMICAL [Scheme scheme44]

(b) CHEMICAL_FORMULA [Scheme scheme45]

The CHEMICAL category itself deals with the large-scale chemical properties of the compound from which the crystal under study was formed: its various formal and common names, its source, melting point, decomposition and sublimation temperatures (as experimentally determined values, or as upper or lower possible values if not measured directly), its biological or physical properties, and where applicable the absolute configuration and optical rotation.

The optical rotation in solution may be reported using the data name _chemical_optical_rotation by an expression of the form

\[ [\alpha]^T_W = \pm \frac{100\alpha}{lc} \qquad (c = \mathrm{CONC},\ \mathrm{SOLV}), \]

where \([\alpha]^T_W\) is the signed specific rotation at temperature T and wavelength labelled by code W, \(\alpha\) is the optical rotation in degrees measured in an optical cell of length l, CONC is the concentration of the solution (given as the mass of the substance in g in a standard 100 ml of solution), and SOLV is the chemical formula of the solvent. Within the constraints of the ASCII character set to which CIF is restricted, this can be marked up as [\a]^25^~D~ = +108 (c = 3.42, CHCl~3~), where the measurement is taken using the D line of the atomic spectrum of sodium.
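As a rough illustration of the expression above, the following Python sketch computes the signed value 100α/(lc) and assembles the ASCII string. The function names, the default temperature and wavelength code, the rounding to a whole number, and the assumption that the cell length is supplied in decimetres (the usual polarimetry convention) are illustrative choices and are not taken from the text.

```python
def specific_rotation(alpha_deg, length_dm, conc_g_per_100ml):
    """Return 100*alpha/(l*c) for a measured rotation alpha in degrees.

    Assumption: the cell length l is in decimetres and c is the mass of
    substance in g per 100 ml of solution.
    """
    return 100.0 * alpha_deg / (length_dm * conc_g_per_100ml)


def format_optical_rotation(alpha_deg, length_dm, conc_g_per_100ml,
                            solvent, temp_c=25, wave="D"):
    """Build a string in the style [\\a]^TEMP^~WAVE~ = SORT (c = CONC, SOLV).

    The explicit sign and zero decimal places are a presentation choice
    made for this sketch only.
    """
    sort = specific_rotation(alpha_deg, length_dm, conc_g_per_100ml)
    return (f"[\\a]^{temp_c}^~{wave}~ = {sort:+.0f} "
            f"(c = {conc_g_per_100ml:g}, {solvent})")


# A hypothetical measurement chosen to reproduce the string quoted above.
print(format_optical_rotation(3.6936, 1.0, 3.42, "CHCl~3~"))
```

The sign is always written out, so a laevorotatory sample yields a leading `-` in the same position.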

Data items in the CHEMICAL_FORMULA category describe the chemical formula and formula mass of the compound under study. The quoted formula must reflect the overall stoichiometry of the crystal under study, and must, when multiplied by the Z value _cell_formula_units_Z, account for the total contents of the unit cell.

A number of data names are provided to account for different conventions in the presentation of chemical formulae. _chemical_formula_analytical is appropriate for a gross formula determined by standard chemical analysis, including all trace elements identified in the sample. Standard uncertainties on the proportions of elements present are acceptable, e.g.

[Scheme scheme46]

_chemical_formula_sum is another aggregate formula, in which all discrete bonded residues and ions are summed over the constituent elements. Where appropriate, the formulae of separate residues of a complex may be described by _chemical_formula_moiety, in which the formula for each moiety is supplied as a sum of the individual elements within the moiety, or by _chemical_formula_structural, in which sub-components within individual moieties are further identified, so that the overall expression permits the identification of particular bonded groups. Within these formula expressions, certain rules must be observed to allow parsing by software. The final data item relating to the chemical formula, _chemical_formula_iupac, is for formulae that are constructed according to the rules of the International Union of Pure and Applied Chemistry.

The ordering and notation rules are explained in detail in the dictionary, but are repeated here for convenience. Within each group of atoms for which a formula is present:

(i) only recognized element symbols may be used;

(ii) each element symbol is followed by a `count' number (`1' is implicit and may be omitted);

(iii) a space or parenthesis must separate each cluster of (element symbol + count);

(iv) where a group of elements is enclosed in parentheses, the multiplier for the group must follow the closing parenthesis. That is, all element and group multipliers are assumed to be printed as subscripted numbers. (An exception to this rule exists for _chemical_formula_moiety, where pre- and post-multipliers are permitted for molecular units.)

(v) Unless the elements are ordered in a manner that corresponds to their chemical structure, as in _chemical_formula_structural, the order of the elements within any group or moiety depends on whether or not carbon is present. If carbon is present, the order should be: C, then H, then the other elements in alphabetical order of their symbol. If carbon is not present, the elements are listed purely in alphabetical order of their symbol. This is the `Hill' system used by Chemical Abstracts. This ordering is used in _chemical_formula_moiety and _chemical_formula_sum.
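Rules (i), (ii), (iii) and (v) can be sketched in Python for the simple case of a formula without parentheses, such as a value of _chemical_formula_sum. The function names and the example formula are hypothetical, and the element list is simply the periodic table as commonly tabulated rather than a list taken from the dictionary.

```python
import re

# Rule (i): recognized element symbols (assumption: standard periodic table).
ELEMENTS = set("""
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni
Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe
Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au
Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf
Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og
""".split())

# Rule (ii): one element symbol followed by an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula_sum(text):
    """Parse a space-separated formula (rule iii) into {symbol: count}."""
    counts = {}
    for token in text.split():
        match = _TOKEN.fullmatch(token)
        if match is None or match.group(1) not in ELEMENTS:
            raise ValueError(f"not an (element symbol + count) cluster: {token!r}")
        symbol, count = match.groups()
        # A missing count means an implicit 1.
        counts[symbol] = counts.get(symbol, 0) + (float(count) if count else 1.0)
    return counts


def hill_order(symbols):
    """Rule (v): C, then H, then the rest alphabetically; else alphabetical."""
    symbols = set(symbols)
    if "C" in symbols:
        head = ["C"] + (["H"] if "H" in symbols else [])
        return head + sorted(symbols - {"C", "H"})
    return sorted(symbols)


def format_hill(counts):
    """Write the counts back out in Hill order, omitting a count of 1."""
    parts = []
    for symbol in hill_order(counts):
        n = counts[symbol]
        parts.append(symbol if n == 1 else f"{symbol}{n:g}")
    return " ".join(parts)


print(format_hill(parse_formula_sum("O8 S C18 N7 H19")))
```

A token such as `CHCl3`, with no separators between clusters, is rejected by this sketch because rule (iii) requires a space or parenthesis between clusters.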

For _chemical_formula_moiety some additional rules apply:

(i) Moieties are separated by commas, `,'.

(ii) The order of elements within a moiety follows the general rules outlined above as the `Hill' system.

(iii) Parentheses are not used within moieties but may surround a moiety. Parentheses may not be nested.

(iv) Charges should be placed at the end of the moiety. The charge `+' or `-' may be preceded by a numerical multiplier and should be separated from the last (element symbol + count) by a space. Pre- or post-multipliers may be used for individual moieties.
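The additional moiety rules can be illustrated with a small Python sketch that splits a moiety formula on commas, reads an optional pre- or post-multiplier around a parenthesized moiety, and picks up a trailing charge. The function names, the returned dictionary layout and the example strings are hypothetical, and the sketch assumes integer element counts.

```python
import re

_WRAPPED = re.compile(r"(\d+)?\(([^()]+)\)(\d+)?")  # e.g. 5(H2 O) or (Cd 2+)3
_CHARGE = re.compile(r"(\d+)?([+-])")               # e.g. +, 2+, 3-
_ATOM = re.compile(r"([A-Z][a-z]?)(\d+)?")          # e.g. C12, O


def parse_moiety(text):
    """Parse one moiety into its multiplier, element counts and charge."""
    text = text.strip()
    multiplier = 1
    wrapped = _WRAPPED.fullmatch(text)
    if wrapped:
        # Rule (iii): parentheses may surround a moiety, never nest.
        pre, text, post = wrapped.groups()
        multiplier = int(pre or 1) * int(post or 1)
    elif "(" in text or ")" in text:
        raise ValueError(f"parentheses inside a moiety: {text!r}")

    tokens = text.split()
    charge = 0
    # Rule (iv): the charge is the last token, separated by a space.
    if tokens and _CHARGE.fullmatch(tokens[-1]):
        number, sign = _CHARGE.fullmatch(tokens[-1]).groups()
        charge = int(number or 1) * (1 if sign == "+" else -1)
        tokens = tokens[:-1]

    atoms = {}
    for token in tokens:
        match = _ATOM.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized token: {token!r}")
        atoms[match.group(1)] = atoms.get(match.group(1), 0) + int(match.group(2) or 1)
    if not atoms:
        raise ValueError("empty moiety")
    return {"multiplier": multiplier, "atoms": atoms, "charge": charge}


def parse_formula_moiety(text):
    """Rule (i): moieties are separated by commas."""
    return [parse_moiety(part) for part in text.split(",")]


for moiety in parse_formula_moiety("C12 H16 N2 O6, 5(H2 O)"):
    print(moiety)
```

Summing `multiplier * count` over all moieties gives the element totals that a matching _chemical_formula_sum would be expected to report.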

Example 3.2.4.6 illustrates the differences between some of these data items.

Example 3.2.4.6. Different representations of a chemical formula.

[Scheme scheme49]

3.2.4.2.2. Chemical connectivity

The data items in these categories are as follows:

(a) CHEMICAL_CONN_ATOM [Scheme scheme47]

(b) CHEMICAL_CONN_BOND [Scheme scheme48]

The bullet (•) indicates a category key. Where multiple items within a category are marked with a bullet, they must be taken together to form a compound key. The arrow (→) is a reference to a parent data item.

The CHEMICAL_CONN_ATOM category labels the chemical atoms in a connected representation of the molecular species and can also give the coordinates for the atoms in a two-dimensional chemical diagram (Example 3.2.4.7). Each atom may also carry an indication of the number of connected non-hydrogen atoms (*_NCA) and the number of hydrogen atoms (*_NH) to which it is connected. Together with the CHEMICAL_CONN_BOND category, the data items in the CHEMICAL_CONN_ATOM category provide a basic description of the chemical structure. Although the description of the chemical structure provided in these two categories is not as extensive as the information that may be conveyed in a molecular information file (Chapter 2.4), it should allow a substructure to be searched for in a suitable database.

Example 3.2.4.7. Representation of a two-dimensional chemical diagram.

[Scheme scheme50]

The CHEMICAL_CONN_BOND category lists pairs of atoms that contribute to chemical bonds and describes the nature of the bond between them (Example 3.2.4.8). Taken with data items in the CHEMICAL_CONN_ATOM category, data items in this category complete the basic description of a molecular entity.

Example 3.2.4.8. Bond types in a chemical connectivity table.

[Scheme scheme51]

Bond types are assigned from a list that specifies single, double, triple, quadruple, aromatic, polymeric, delocalized double and π bonds. These are not intended to cover all possible cases, but to characterize a molecular model suitable for database substructure searching.
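To show how the two connectivity categories fit together, the following Python sketch derives the *_NCA and *_NH counts for each atom from a list of bonded pairs. The short bond-type codes (sing, doub, trip, quad, arom, poly, delo, pi) are assumed to match the enumeration of the bond-type item in the core dictionary and should be checked against it; the function name, the tuple layout and the methanol example are hypothetical.

```python
from collections import namedtuple

# Assumed codes for: single, double, triple, quadruple, aromatic,
# polymeric, delocalized double and pi bonds.
BOND_TYPES = {"sing", "doub", "trip", "quad", "arom", "poly", "delo", "pi"}

# One row of a bond table: two atom numbers and a bond-type code.
Bond = namedtuple("Bond", "atom_1 atom_2 type")


def neighbour_counts(atom_types, bonds):
    """Count bonded non-hydrogen (NCA) and hydrogen (NH) neighbours per atom.

    atom_types maps an atom number to its element symbol.
    """
    counts = {number: {"NCA": 0, "NH": 0} for number in atom_types}
    for bond in bonds:
        if bond.type not in BOND_TYPES:
            raise ValueError(f"unknown bond type: {bond.type!r}")
        # Each bond contributes one neighbour to both of its atoms.
        for here, there in ((bond.atom_1, bond.atom_2),
                            (bond.atom_2, bond.atom_1)):
            key = "NH" if atom_types[there] == "H" else "NCA"
            counts[here][key] += 1
    return counts


# Hypothetical connectivity table for methanol, CH3OH.
atoms = {1: "C", 2: "O", 3: "H", 4: "H", 5: "H", 6: "H"}
bonds = [Bond(1, 2, "sing"), Bond(1, 3, "sing"), Bond(1, 4, "sing"),
         Bond(1, 5, "sing"), Bond(2, 6, "sing")]

for number, count in neighbour_counts(atoms, bonds).items():
    print(number, atoms[number], count)
```

Keeping the atom table and the bond table separate mirrors the two categories: the bond rows refer to atoms only by number, so the atom rows can carry labels or diagram coordinates without affecting the bond list.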







































