Atomicity, chemistry and structure

Hall, S. R.; Fitzgerald, P. M. D.; McMahon, B.

doi:10.1107/97809553602060000734

International Tables for Crystallography, Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G. ch. 3.2, pp. 102-112

Section 3.2.4. Atomicity, chemistry and structure

S. R. Hall,^a ^* P. M. D. Fitzgerald^b and B. McMahon^c

^a School of Biomedical and Chemical Sciences, University of Western Australia, Crawley, 6009, Australia, ^b Merck Research Laboratories, Rahway, New Jersey, USA, and ^c International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
Correspondence e-mail: syd@crystal.uwa.edu.au

3.2.4. Atomicity, chemistry and structure

The core CIF dictionary provides many data names for describing the structural model.

The categories describing the atom sites handle these in a general way as sites of significant electron density which might be contributed to by more than one element species. The chemical identification of the compound under study, and where appropriate a model of the molecular connectivity and bonding, are handled separately by the chemistry-related categories. The geometry-related categories are purely derivative, given knowledge of the positions of the atom sites and the crystallographic symmetry; but as with other examples of derived data, they are given their own data names to provide convenient listings and to check the consistency of information provided by other categories. The symmetry-related data names in the core dictionary are restricted to those essential for the construction of a geometric model; Chapter 3.8 describes a symmetry extension dictionary suitable for a more complete description of crystal symmetry.

3.2.4.1. Atom sites

The categories describing atom sites are as follows:

ATOM group

Individual atom sites (§3.2.4.1.1): ATOM_SITE

Collections of atom sites (§3.2.4.1.2): ATOM_SITES

Atom types (§3.2.4.1.3): ATOM_TYPE

These categories permit the traditional interpretation of regular concentrations of electron density in a crystalline lattice as atom sites containing one or more chemical elements, with complete or partial occupancy, and with a spatial distribution affected by thermal displacement or disorder.

Lists of atom-site coordinates and anisotropic displacement factors are covered by data items in the ATOM_SITE category. Identification of the chemical species occupying each site is handled by data items in the ATOM_TYPE category, while data items in the ATOM_SITES category record collective information common to all sites.

While the ATOM_SITE category formally contains the data items describing both positions and atomic displacements, the anisotropic displacement parameters are often given in a separate looped list. In the version of the core dictionary embedded in the macromolecular CIF dictionary, which uses the DDL2 formalism, this is recognized by the creation of a separate, but overlapping, ATOM_SITE_ANISOTROP category.

3.2.4.1.1. Individual atom sites

The data items in this category are as follows:

ATOM_SITE [Scheme scheme36]

The bullet ($\bullet$) indicates a category key. For this category an alternative category key can be formed by taking all the _atom_site_label_component_* items together. Anisotropic displacement parameters may also be listed in a separate loop, for which _atom_site_aniso_label forms the key. The arrow ($\rightarrow$) is a reference to a parent data item. The dagger ($\dagger$) indicates a deprecated item, which should not be used in the creation of new CIFs.

Data items in the ATOM_SITE category represent the positions of atom sites identified in the structural model, their spatial distribution defined by isotropic or anisotropic displacement parameters, details of restraints or constraints applied during the refinement, and the interpretation of their occupancy due to structural or compositional disorder.

Example 3.2.4.1 is a typical extract from a list of atom-site coordinates, with equivalent isotropic displacement values and refinement conditions. Each site is identified by _atom_site_label.

Example 3.2.4.1. List of atom-site coordinates, equivalent isotropic U values and refinement conditions.

[Scheme scheme37]

The coordinates are specified as fractional x, y, z values along the unit-cell axes. Coordinates may also be specified in ångström units along orthogonal Cartesian axes using the data names _atom_site_Cartn_x, _atom_site_Cartn_y and _atom_site_Cartn_z. The transformation matrix between Cartesian and fractional coordinates can be given in the ATOM_SITES category.

(Note that occupancy values are unaffected by symmetry. This is discussed later in connection with site multiplicity.)

_atom_site_U_iso_or_equiv records the isotropic atomic displacement value U_iso in the case of isotropic refinement. In the case of anisotropic refinement, _atom_site_U_iso_or_equiv records the equivalent isotropic value U_eq, defined as $U_{\rm eq} = (1/3)\sum_i\big[\sum_j (U^{ij}a_i^*a_j^*a_ia_j)\big]$, where $a_i$ are the real-space cell lengths, $a^*_j$ are the reciprocal-space cell lengths and $U^{ij}$ are the anisotropic displacement parameters.
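
As a hedged illustration of the U_eq definition above, the following Python sketch evaluates the double sum for a general unit cell. It assumes that the product a_i a_j is to be read as the scalar product of the real-space cell vectors, which equals the product of the cell lengths only for i = j. The function name `u_equiv` and the argument layout are hypothetical and are not defined by the dictionary.

```python
import math

def u_equiv(u, cell):
    """Equivalent isotropic U from a 3x3 list of U^ij values.

    cell is (a, b, c, alpha, beta, gamma), lengths in angstroms, angles in degrees.
    Assumption: a_i a_j in the formula is taken as the scalar product a_i . a_j.
    """
    a, b, c = cell[:3]
    al, be, ga = (math.radians(x) for x in cell[3:])
    ca, cb, cg = math.cos(al), math.cos(be), math.cos(ga)
    # Real-space metric tensor: element [i][j] is a_i . a_j
    metric = [[a * a, a * b * cg, a * c * cb],
              [a * b * cg, b * b, b * c * ca],
              [a * c * cb, b * c * ca, c * c]]
    volume = a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    # Reciprocal-space cell lengths a*, b*, c*
    recip = [b * c * math.sin(al) / volume,
             a * c * math.sin(be) / volume,
             a * b * math.sin(ga) / volume]
    total = sum(u[i][j] * recip[i] * recip[j] * metric[i][j]
                for i in range(3) for j in range(3))
    return total / 3.0
```

For an orthogonal cell the off-diagonal terms vanish and the result is simply the mean of U^11, U^22 and U^33.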

The data item _atom_site_adp_type identifies which value is given. An alternative equivalent isotropic displacement parameter _atom_site_U_equiv_geom_mean may be calculated as the geometric mean of the anisotropic parameters, $U_{\rm eq}=(U_iU_jU_k)^{1/3}$, where the $U_i$ are the principal components of the orthogonalized $U^{ij}$.

Data names also exist for the corresponding quantities calculated from B values, although the use of B values is discouraged by the IUCr Commission on Crystallographic Nomenclature.

For each site, _atom_site_calc_flag takes one of the following values: d, to indicate that the atom-site coordinates were determined from the diffraction intensities; c or calc to indicate that they were calculated from molecular geometry considerations; or dum, for a dummy site.

Specific restraints or constraints applied to a site may be indicated by one or more of the _atom_site_refinement_flags_* items.

The data item _atom_site_occupancy defines the fraction of the atom type present at the site. Note that the same site may occur more than once in the list, identified by separate values of _atom_site_label. Such an arrangement would represent contributions from separate atom types (perhaps in modelling compositional disorder). The sum of occupancies of all atom types present at a single site may not significantly exceed 1.0 (unless it is a dummy site with no physical significance). Note that an atom of a given chemical species positioned on a special position (e.g. on a twofold axis) will in general be assigned a full occupancy value of 1.0. However, it will occur less often in the unit cell than an atom on a general position (in this example by a factor of 2). To account for this in structure-factor calculations it may be given a population value of 0.5 within the refinement program. A population adjustment of this kind is not implied in the assignment of a value to _atom_site_occupancy. The multiplicity of the site owing to the space-group symmetry can be recorded in _atom_site_symmetry_multiplicity.

The disorder-related data names in this example will be discussed below.

_atom_site_type_symbol is a code which must match an entry in the ATOM_TYPE category that supplies information about the elemental composition and scattering factors of the atom or atoms occupying the site. Note that it is quite legitimate to have an atom-type symbol such as `Fe3+Ni2+', referring to a mixed-composition atom site. The effective physical properties of such a pseudo-atom should be given in full in the ATOM_TYPE category.

Example 3.2.4.2 demonstrates how the anisotropic displacement parameters are conventionally broken out into a separate list. When this is done, each atom site is identified by _atom_site_aniso_label, and this must of course match the value of _atom_site_label specifying the position of the site.

Example 3.2.4.2. Separate list of anisotropic U values with `_atom_site_aniso_label` acting as the key that uniquely identifies table rows in this listing.

[Scheme scheme38]

The data item _atom_site_label is normally used as the identifier of each individual atom site in a list of coordinates and atomic displacement factors. Historically, the labels given to atom sites have been chosen to summarize useful information about the atom located at the site. Almost invariably the label contains the symbol of the chemical element or elements occupying the site; there may also be indicators of charge, valence, chemical connectivity, disorder, occupation of a site of crystallographic symmetry or grouping within a component of secondary structure within large molecules. In a CIF, it is formally sufficient that atom-site labels are unique, as all the information about composition, valence, connectivity and so on can be extracted from the data items designed specifically to record this information. However, it is preferable that an atom-site label should summarize the relevant features of the site. Many styles and conventions for labelling atoms are in use in crystallography, so to enable interchange with other crystallographic data file formats, the core dictionary contains a detailed but highly flexible set of rules for constructing and parsing atom-site labels.

Labelling atom sites in crystallography usually serves two distinct purposes: (a) to identify the site in the molecule and crystal, and (b) to identify the chemical element that occupies that site. The core dictionary makes this distinction clear by defining ATOM_SITE and ATOM_TYPE as separate data categories. The connection between the two categories is made through the equivalence of the data items _atom_site_type_symbol (in the ATOM_SITE list) and _atom_type_symbol (in the ATOM_TYPE list). Often, however, crystallographers use a single label _atom_site_label to define both the site and the chemical species occupying it.

The _atom_site_label may be composed of as many as eight separate components; the recommended convention for construction of the string is as follows.

Component 0 [optionally identical to a value of _atom_type_symbol] (mandatory): A character string containing any character except a blank or an underline, with the proviso that each digit `0'–`9' is used only to designate an oxidation state and, as such, must be followed by a plus `+' or a minus `-' character. It is recommended that the element symbols be used when applicable. Examples of permissible codes are: Cu, Cu2+, dummy, Fe3+Ni2+, S-, H*, H(SDS).

Component 1 [atom number code] (optional): This string may contain any alphanumeric character except a blank or an underline, but the first character must be a digit `0'–`9' and the second character may not be a plus `+' or a minus `-'. Component 1 is intended primarily to differentiate sites containing the same atom type, but it can be used for any purpose. Examples of combined component 0 and 1 codes are: C1, C103g28, Fe3+17b, H*251, boron2a, Ni2+2, Fe2+Ni2+2, where component 0 is in bold to indicate how these labels are parsed.

Component 2 [residue code] (optional): This string may contain any character except a blank or underline. It is intended primarily to give specific structural information such as the molecular fragment or amino-acid type, e.g. C1_gly, O1_SO4. If component 2 is present, it is separated from the concatenated components 0 and 1 with an underline character.

Components 3–7 [sequence, remoteness, chain order, alternate, footnote codes] (optional): These strings may contain any character except a blank or an underline. The underline character is used to separate the individual components. The names associated with the separate components suggest their roles in constructing composite labels that match the conventions of site labelling in the PDB format for macromolecular structure files. However, they are not restricted to these functions and may be used in other ways.

Component 0 is normally identical to an _atom_type_symbol code in the ATOM_TYPE list. However, if it is not, an _atom_site_type_symbol code must appear in the ATOM_SITE list in order to identify the atom type. In these cases, component 0 may contain any code consistent with the rules given in the dictionary. Thus, component 0 could be Ca to identify an alpha carbon, provided that the _atom_site_type_symbol is encoded as C to indicate that the atom type is carbon.

Multiple occupation of a single atom site by different atom species (compositional disorder) may be handled simply by having multiple values of _atom_site_label referring to the same site in the crystal structure. Alternatively, multiple occupancy of an atom site may be denoted by a unique character or characters in component 0 of the atom label, with the ATOM_TYPE list containing the equivalent pseudo element label entry with values that are weighted averages of those for the constituent elements. The proportions of the atom types should then be defined using _atom_type_description.

This _atom_site_label construction is flexible, visually decipherable and well suited to computer applications. The components can be easily identified and stripped with a single pass, from left to right, along the label string. Note that the underline separators are only used if higher-order components exist. If intermediate components are not used they may be omitted provided the underline separators are retained. For example, the label C233__ggg is acceptable and contains the components 0: C, 1: 233, 2: null and 3: ggg. There is no requirement that the same number of components should be used in each label.
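
The single left-to-right pass described above can be sketched in Python as follows. This is only an illustration of the labelling rules as stated in this section, not a reference implementation: the function names are hypothetical, unused components are returned as `None`, and no check is made that component 0 matches an entry in the ATOM_TYPE list.

```python
DIGITS = "0123456789"

def split_type_and_number(head):
    """Split the part of a label before the first underline into components 0 and 1."""
    i = 0
    while i < len(head):
        if head[i] in DIGITS:
            following = head[i + 1] if i + 1 < len(head) else ""
            if following in ("+", "-"):
                i += 2  # a digit followed by a sign is an oxidation state in component 0
                continue
            return head[:i], head[i:]  # any other digit starts component 1
        i += 1
    return head, None  # no atom number code present

def parse_atom_site_label(label):
    """Return the list of label components; empty or absent components are None."""
    fields = label.split("_")
    comp0, comp1 = split_type_and_number(fields[0])
    if not comp0:
        raise ValueError("component 0 is mandatory: %r" % label)
    components = [comp0, comp1] + [f if f else None for f in fields[1:]]
    if len(components) > 8:
        raise ValueError("a label has at most eight components: %r" % label)
    return components
```

With these rules, `parse_atom_site_label("Fe3+17b")` separates the atom type `Fe3+` from the number code `17b`, and a label with an empty intermediate field yields `None` for that component.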

The _atom_site_label may be replaced by separate data items specifying the individual components of an atom label; this may be useful for large lists of site coordinates, for example in a macromolecular structure, where site-labelling components follow a systematic convention and where subsets of the atom sites need to be searched for or extracted using individual label components. Such uses are not common in files built with core CIF data names; the mmCIF dictionary identifies substructural components in biological macromolecules by alternative techniques (Section 3.6.7).

There is no comparable fragmentation of the components of _atom_site_aniso_label. Where separate lists of anisotropic displacement parameters use complex atom-site labels, either the coordinate list should use _atom_site_label alone or the processing software needs to be able to construct a value for _atom_site_label from the separate components _atom_site_label_component_* in order to test the equivalence between the labels in the coordinates and anisotropic displacement parameters lists.

While either atom-labelling technique is permitted, it is recommended that the individual label components are not used unless there is an overwhelming argument to do so.

Information about the molecular model is sometimes embedded in a labelling convention. In CIF, this information is usually expressed through other data items.

The connectivity of a molecule is described by the CHEMICAL group of categories, and more specifically through the CHEMICAL_CONN_ATOM and CHEMICAL_CONN_BOND categories.

The link between atom sites in the coordinate list and the corresponding atoms in the molecular model is established using the data item _chemical_conn_atom_number in the CHEMICAL_CONN_ATOM category, and the data items _chemical_conn_bond_atom_1 and _chemical_conn_bond_atom_2 in the CHEMICAL_CONN_BOND category. The values of these data items must match values for the data item _atom_site_chemical_conn_number in the ATOM_SITE list. Example 3.2.4.3 shows an extract from a connectivity table; a more complete version of this table is given in the relevant category descriptions in the dictionary.

Example 3.2.4.3. Chemical connectivity table; atoms are linked back to atom-site positions through matching values of `_atom_site_chemical_conn_number` and `_chemical_conn_atom_number`.

[Scheme scheme39]

Note that there is no guarantee that the refined atom-site coordinates that characterize the asymmetric unit will correspond to locations within a single connected molecular species. Crystal symmetry transformations may need to be applied to individual sites in order to map the contents of a connected molecular residue to real space in the unit cell. There is no provision in the CHEMICAL_CONN categories for the specification of these symmetry transformations; thus these higher-order molecular geometries are best described using data items in the GEOM categories, which do allow for the specification of symmetry transformations.

It may also be the case that not all atom positions have been located; this is particularly true for hydrogen atoms, and the data item _atom_site_attached_hydrogens is provided for book-keeping purposes to indicate hydrogen atoms known to be bonded to an atom but whose positions have not been refined (or calculated).

Example 3.2.4.4 shows how the disorder of a group of bonded atoms over a set of atom sites (occupational disorder) is described. In this example of a disordered tetrafluoroborate anion, the data item _atom_site_disorder_assembly takes the value A, and the data item _atom_site_disorder_group takes the values 1 and 2, indicating the two alternative positions of the disordered group.

Example 3.2.4.4. Handling of occupational disorder of atom sites.

[Scheme scheme40]

The remaining items in this category are clearly described in their individual dictionary entries. However, the now-deprecated data item _atom_site_refinement_flags should be mentioned. This was allowed to take values obtained by concatenating one or more of the single-letter flags:

. no refinement constraints;

S special-position constraint on site;

G rigid-group refinement of site;

R riding-atom site attached to non-riding atom;

D distance or angle restraint on site;

T thermal displacement constraints;

U U_iso or U^ij restraint (rigid bond);

P partial occupancy constraint.

These individual flags are listed in the dictionary using the DDL field _enumeration, which denotes a list of mutually exclusive permitted values. As concatenation of values is allowed here, dictionary-based software must be modified to handle this data item as a special case. To avoid the need for this in future, the data item was marked as deprecated from version 2.3 of the dictionary, and is replaced by the three separate items _atom_site_refinement_flags_posn, *_adp and *_occupancy. For each of these, the relevant combinations of refinement flags are fully enumerated (for example _atom_site_refinement_flags_adp may take any one of the values T, U or TU). This logically separates the different types of refinement constraints or restraints that an author might want to record and allows software to parse the data item.
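
A minimal Python sketch of how a concatenated value of the deprecated item might be split into the three replacement items is given below. The assignment of S, G, R and D to the positional item and of P to the occupancy item is an assumption inferred from the flag descriptions above, as is the letter order used in combined values; both should be checked against the dictionary enumerations. The function and table names are hypothetical.

```python
# Assumed grouping of the single-letter flags (only the adp group is confirmed in the text)
FLAG_GROUPS = {"posn": "DGRS", "adp": "TU", "occupancy": "P"}

def split_refinement_flags(flags):
    """Map a concatenated flag string to values for the *_posn, *_adp and *_occupancy items."""
    known = set("".join(FLAG_GROUPS.values()))
    unknown = [ch for ch in flags if ch != "." and ch not in known]
    if unknown:
        raise ValueError("unrecognized refinement flags: %s" % "".join(unknown))
    result = {}
    for name, letters in FLAG_GROUPS.items():
        present = "".join(ch for ch in letters if ch in flags)
        result[name] = present or "."  # '.' means no constraint of this type
    return result
```

For example, the value `SRTUP` would be separated into `RS` for the positional flags, `TU` for the displacement flags and `P` for the occupancy flag.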

3.2.4.1.2. Collections of atom sites

The data items in this category are as follows:

ATOM_SITES [Scheme scheme41]

This category records information that applies collectively to the atom sites of the structural model. At present, the topics covered are the transformation matrix between Cartesian and cell fractional coordinates, and the methods used to locate the initial atom sites. _atom_sites_solution_primary describes how the first atom sites were determined, _atom_sites_solution_secondary describes how the remaining non-hydrogen sites were located and _atom_sites_solution_hydrogens describes how hydrogen atoms were located. The codes that are allowed for each of these refer to distinct solution methods, and at present only the seven formal values listed below are provided (although other values might be added in the future):

difmap difference-electron-density map;

vecmap real-space vector search;

heavy heavy-atom method;

direct structure-invariant direct methods;

geom inferred from neighbouring sites;

disper anomalous-dispersion techniques;

isomor isomorphous structure methods.

3.2.4.1.3. Atom types

The data items in this category are as follows:

ATOM_TYPE [Scheme scheme42]

The bullet ($\bullet$) indicates a category key.

The data items in this category record details about the atomic species associated with each occupied atom site in the structural model. While these will typically be standard properties of the naturally occurring chemical elements, they may also be synthetic atom types, for example in cases where a single atom site may be occupied with partial occupancies by atoms of different elements.

As mentioned in Section 3.2.4.1.1, there are two ways of dealing with such a case: the same location in the coordinate list may be populated by multiple entries, each for an atom of a particular element with an associated occupancy fraction; or a single entry may be made for the synthetic atom type, the properties of which are described fully in the ATOM_TYPE list.

Each different atom type has a unique _atom_type_symbol identifier. In principle, this could be any string of characters, but the dictionary recommends certain conventions to encourage compatibility with the atom-site labelling rules. It is recommended that the identifier be the normal chemical element symbol when the atom type is a pure element. If some other labelling is used, the identifier may be composed of any character except an underline, with the additional proviso that digits designate an oxidation state and must be followed by a `+' or `-' character.

The data item _atom_type_scat_versus_stol_list can be used to give a table of scattering factors as a function of $(\sin\theta)/\lambda$. This is a text field with no specified internal structure, except the suggestion that it is well commented and the lists should be regularly formatted. However, it is generally enough to list the atomic scattering factors of each element and to provide a reference to the source of the values, as in Example 3.2.4.5.

Example 3.2.4.5. Reference to atomic scattering factors.

[Scheme scheme43]

3.2.4.2. Chemical identification and connectivity information

The categories describing chemical identity and connectivity are as follows:

CHEMICAL group

Chemical identification (§3.2.4.2.1): CHEMICAL, CHEMICAL_FORMULA

Chemical connectivity (§3.2.4.2.2): CHEMICAL_CONN_ATOM, CHEMICAL_CONN_BOND

As indicated in Section 3.2.4.1.1, the chemical interpretation of the coordinate list of regions of significant electron density is not always easy. Occupational and compositional disorder, symmetry-equivalent locations, and unrefined atom sites all contribute to the difficulties, but it is usually possible in modern studies to construct a sensible chemical model. The CHEMICAL category group provides the data names needed to describe the chemical identity and properties of the material characterized in the structural study.

3.2.4.2.1. Chemical identification

The data items in these categories are as follows:

(a) CHEMICAL [Scheme scheme44]

(b) CHEMICAL_FORMULA [Scheme scheme45]

The CHEMICAL category itself deals with the large-scale chemical properties of the compound from which the crystal under study was formed: its various formal and common names, its source, melting point, decomposition and sublimation temperatures (as experimentally determined values, or as upper or lower possible values if not measured directly), its biological or physical properties, and where applicable the absolute configuration and optical rotation.

The optical rotation in solution may be reported using the data name _chemical_optical_rotation by an expression of the form $[\alpha]^T_W = \pm {100\alpha\over lc} \qquad (c = {\rm CONC}, {\rm SOLV})$, where $[\alpha]^T_W$ is the signed optical rotation in degrees at temperature T and wavelength labelled by code W, l is the length of the optical cell, CONC is the concentration of the solution (given as the mass of the substance in g in a standard 100 ml of solution), and SOLV is the chemical formula of the solvent. This can be marked up within the constraints of the ASCII character set to which CIF is restricted as [\a]^25^~D~ = +108 (c = 3.42, CHCl~3~), where the measurement is taken using the D line of the atomic spectrum of sodium.

Data items in the CHEMICAL_FORMULA category describe the chemical formula and formula mass of the compound under study. The quoted formula must reflect the overall stoichiometry of the crystal under study, and must, when multiplied by the Z value _cell_formula_units_Z, account for the total contents of the unit cell.

A number of data names are provided to account for different conventions in the presentation of chemical formulae. _chemical_formula_analytical is appropriate for a gross formula determined by standard chemical analysis, including all trace elements identified in the sample. Standard uncertainties on the proportions of elements present are acceptable, e.g.

[Scheme scheme46]

_chemical_formula_sum is another aggregate formula, in which all discrete bonded residues and ions are summed over the constituent elements. Where appropriate, the formulae of separate residues of a complex may be described by _chemical_formula_moiety, in which the formula for each moiety is supplied as a sum of the individual elements within the moiety, or by _chemical_formula_structural, in which sub-components within individual moieties are further identified, so that the overall expression permits the identification of particular bonded groups. Within these formula expressions, certain rules must be observed to allow parsing by software. The final data item relating to the chemical formula, _chemical_formula_iupac, is for formulae that are constructed according to the rules of the International Union of Pure and Applied Chemistry.

The ordering and notation rules are explained in detail in the dictionary, but are repeated here for convenience. Within each group of atoms for which a formula is present:

(i) only recognized element symbols may be used;

(ii) each element symbol is followed by a `count' number (`1' is implicit and may be omitted);

(iii) a space or parenthesis must separate each cluster of (element symbol + count);

(iv) where a group of elements is enclosed in parentheses, the multiplier for the group must follow the closing parenthesis. That is, all element and group multipliers are assumed to be printed as subscripted numbers. (An exception to this rule exists for _chemical_formula_moiety, where pre- and post-multipliers are permitted for molecular units.)

(v) Unless the elements are ordered in a manner that corresponds to their chemical structure, as in _chemical_formula_structural, the order of the elements within any group or moiety depends on whether or not carbon is present. If carbon is present, the order should be: C, then H, then the other elements in alphabetical order of their symbol. If carbon is not present, the elements are listed purely in alphabetic order of their symbol. This is the `Hill' system used by Chemical Abstracts. This ordering is used in _chemical_formula_moiety and _chemical_formula_sum.
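
The ordering in rule (v), combined with rules (ii) and (iii), can be sketched in Python as follows. This is an illustrative helper only: the function name `hill_formula` and the use of a dictionary of element counts as input are assumptions, and no check is made that the symbols are recognized elements as required by rule (i).

```python
def hill_formula(counts):
    """Build a space-separated formula string in Hill order from {symbol: count}."""
    others = sorted(s for s in counts if s not in ("C", "H"))
    if "C" in counts:
        # Carbon first, then hydrogen, then the rest alphabetically
        ordered = ["C"] + (["H"] if "H" in counts else []) + others
    else:
        # No carbon: purely alphabetical, hydrogen included
        ordered = sorted(counts)
    # A count of 1 is implicit and omitted
    return " ".join(s if counts[s] == 1 else f"{s}{counts[s]:g}" for s in ordered)
```

For instance, the counts for ethanol, `{"O": 1, "H": 6, "C": 2}`, give the string `C2 H6 O`, while a carbon-free compound such as sulfuric acid is listed alphabetically as `H2 O4 S`.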

For _chemical_formula_moiety some additional rules apply:

(i) Moieties are separated by commas, `,'.

(ii) The order of elements within a moiety follows the general rules outlined above as the `Hill' system.

(iii) Parentheses are not used within moieties but may surround a moiety. Parentheses may not be nested.

(iv) Charges should be placed at the end of the moiety. The charge `+' or `-' may be preceded by a numerical multiplier and should be separated from the last (element symbol + count) by a space. Pre- or post-multipliers may be used for individual moieties.

Example 3.2.4.6 illustrates the differences between some of these data items.

Example 3.2.4.6. Different representations of a chemical formula.

[Scheme scheme49]

3.2.4.2.2. Chemical connectivity

The data items in these categories are as follows:

(a) CHEMICAL_CONN_ATOM [Scheme scheme47]

(b) CHEMICAL_CONN_BOND [Scheme scheme48]

The bullet ($\bullet$) indicates a category key. Where multiple items within a category are marked with a bullet, they must be taken together to form a compound key. The arrow ($\rightarrow$) is a reference to a parent data item.

The CHEMICAL_CONN_ATOM category labels the chemical atoms in a connected representation of the molecular species and can also give the coordinates for the atoms in a two-dimensional chemical diagram (Example 3.2.4.7). Each atom may also carry an indication of the number of connected non-hydrogen atoms (*_NCA) and the number of hydrogen atoms (*_NH) to which it is connected. Together with the CHEMICAL_CONN_BOND category, the data items in the CHEMICAL_CONN_ATOM category provide a basic description of the chemical structure. Although the description of the chemical structure provided in these two categories is not as extensive as the information that may be conveyed in a molecular information file (Chapter 2.4), it should allow a substructure to be searched for in a suitable database.

Example 3.2.4.7. Representation of a two-dimensional chemical diagram.

[Scheme scheme50]

The CHEMICAL_CONN_BOND category lists pairs of atoms that contribute to chemical bonds and describes the nature of the bond between them (Example 3.2.4.8). Taken with data items in the CHEMICAL_CONN_ATOM category, data items in this category complete the basic description of a molecular entity.

Example 3.2.4.8. Bond types in a chemical connectivity table.

[Scheme scheme51]

Bond types are assigned from a list that specifies single, double, triple, quadruple, aromatic, polymeric, delocalized double and π bonds. These are not intended to cover all possible cases, but to characterize a molecular model suitable for database substructure searching.

3.2.4.3. Molecular or packing geometry

The categories describing geometry are as follows:

GEOM group

GEOM

GEOM_ANGLE

GEOM_BOND

GEOM_CONTACT

GEOM_HBOND

GEOM_TORSION

The molecular and packing geometry can be calculated fully given the unit-cell parameters, the space group and a list of atom sites. Therefore, all the information about geometry in the GEOM category group is derivative. However, it is useful to record it within the file both as a check on the primary information stored in other categories and as a method for flagging values to be published.

3.2.4.3.1. Contents of the geometry-related categories

The data items in these categories are as follows:

(a) GEOM [Scheme scheme52]

(b) GEOM_ANGLE [Scheme scheme53]

(c) GEOM_BOND [Scheme scheme54]

(d) GEOM_CONTACT [Scheme scheme55]

(e) GEOM_HBOND [Scheme scheme56]

(f) GEOM_TORSION [Scheme scheme57]

The bullet ($\bullet$) indicates a category key. Where multiple items within a category are marked with a bullet, they must be taken together to form a compound key. *_symmetry_* items have a default value and may be omitted from the list. The arrow ($\rightarrow$) is a reference to a parent data item.

Most categories within this group record distances or angles specified by atom-site labels and are well characterized. The GEOM category currently provides the single data name _geom_special_details in which any other details of the geometry that an author considers noteworthy may be stored. Examples of information that might be stored in this data item are least-squares equations of planes, out-of-plane distances, dihedral angles between planes and general comments about the calculation of standard uncertainties.

A subtlety in the geometry-related categories arises from the need to record geometric relationships that involve atoms that are not listed in the ATOM_SITE coordinate list, but that can be derived from the coordinates in this list by the application of a crystallographic symmetry transformation. Thus atom sites in the geometry lists are identified both by their atom-site labels (which must identically match one of the entries in the ATOM_SITE list) and by the code for the symmetry transformation that has been applied to the initial location. Since the atom-site labels may refer to atoms in their original location as well as to atoms in symmetry-related locations, the formal key for these categories involves the site labels as well as the symmetry codes. However, in many cases (as discussed further below) the symmetry codes may be absent from a list, and a parser must supply suitable default or null values for the missing components when constructing or checking a complete key.

In many cases, interest is focused on intramolecular distances and angles, and on intramolecular contacts within a single asymmetric unit. In such cases, the geometry lists would contain only atoms listed explicitly in the ATOM_SITE list and the symmetry codes all refer trivially to the identity transformation.

The examples in this section demonstrate various ways of handling geometry lists with trivial or non-trivial symmetry transformations. In Example 3.2.4.9, showing treatment of bond angles, the relevant data items (_geom_angle_site_symmetry_*) are absent, which is one method for indicating the identity transformation. Dictionary validation software must therefore be able to handle both the presence and absence of these components of the formal category key.

Example 3.2.4.9. List of bond angles.

[Scheme scheme58]

The symmetry transformations in this and related categories take the form of codes 'n klm' or n_klm, where n refers to the symmetry operation that is applied to the coordinates stored in _atom_site_fract_x, _atom_site_fract_y and _atom_site_fract_z. The value of n must match a number given in _symmetry_equiv_pos_site_id. k, l and m refer to the translations that are subsequently applied to the symmetry-transformed coordinates to generate the atom used in calculating the distance or angle. These translations (x, y, z) are related to (k, l, m) by $k = 5 + x, \qquad l = 5 + y, \qquad m = 5 + z.$

By adding 5 to the translations, the use of negative numbers is avoided. As an example, the symmetry code 7_645 means that the symmetry operation with label `7' in the _symmetry_equiv_pos_site_id list is applied and the resulting position is translated +1.0 × a along the x axis, −1.0 × b along the y axis and 0.0 × c along the z axis, where a, b and c are the unit-cell edges.
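The decoding just described can be sketched in a few lines of Python. This is a minimal illustration, not part of the dictionary: the function name `parse_symmetry_code` is hypothetical, and the sketch assumes that each translation is stored as a single digit, as in the examples of this section.

```python
import re


def parse_symmetry_code(code):
    """Split a code such as '7_645' or '7 645' into (n, (x, y, z)).

    n is the symmetry-operation number; (x, y, z) are the lattice
    translations recovered by subtracting the offset of 5.
    """
    match = re.fullmatch(r"(\d+)[_ ](\d)(\d)(\d)", code.strip())
    if match is None:
        raise ValueError(f"not an n_klm symmetry code: {code!r}")
    n = int(match.group(1))
    k, l, m = (int(match.group(i)) for i in (2, 3, 4))
    # Undo the offset of 5 that keeps the stored digits non-negative.
    return n, (k - 5, l - 5, m - 5)


print(parse_symmetry_code("7_645"))  # operation 7, translations (1, -1, 0)
```

For the code 7_645 discussed above, the function returns operation 7 with translations (1, −1, 0), in agreement with the worked example.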

List entries with a _geom_angle_publ_flag value of yes are those that should be published.

The GEOM_BOND category records intramolecular bond distances. In Example 3.2.4.10, all the atoms are untransformed and are at the positions given in the ATOM_SITE list. The symmetry code is 1_555, where the trivial symmetry operation x, y, z is numbered `1' by _symmetry_equiv_pos_site_id.

Example 3.2.4.10. List of bonds.

[Scheme scheme59]

The GEOM_CONTACT category records nonbonded interatomic contacts. In Example 3.2.4.11, all the atoms are untransformed and are at the positions given in the ATOM_SITE list, and therefore the symmetry codes all have the value `.' (meaning `inapplicable'). This is another method for indicating the identity transformation.

Example 3.2.4.11. List of nonbonded interatomic contacts.

[Scheme scheme60]
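Software that builds or checks the compound category key must treat an absent symmetry code, the value `.' and an explicit identity code as equivalent. The following Python sketch shows one way this might be done for GEOM_CONTACT. The helper names and the representation of a loop row as a dictionary are assumptions made for illustration, and the identity is assumed to be operation 1, as in Example 3.2.4.10.

```python
# Assumption: operation 1 in the symmetry list is x, y, z.
IDENTITY = "1_555"


def normalise_symmetry(value):
    """Map an absent or inapplicable symmetry code to the identity code."""
    if value is None or value.strip() in {"", "."}:
        return IDENTITY
    # Accept both the 'n klm' and the n_klm spellings.
    return value.strip().replace(" ", "_")


def contact_key(row):
    """Build the full compound key for one GEOM_CONTACT row (a dict)."""
    return (
        row["_geom_contact_atom_site_label_1"],
        row["_geom_contact_atom_site_label_2"],
        normalise_symmetry(row.get("_geom_contact_site_symmetry_1")),
        normalise_symmetry(row.get("_geom_contact_site_symmetry_2")),
    )


row = {
    "_geom_contact_atom_site_label_1": "O1",
    "_geom_contact_atom_site_label_2": "C2",
    "_geom_contact_site_symmetry_2": ".",
}
print(contact_key(row))
```

With this normalization, rows that differ only in how the identity transformation is expressed produce the same key, so duplicate entries can be detected regardless of which convention a CIF follows.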

The GEOM_HBOND category records details about hydrogen bonds. Unlike other categories in the GEOM group, the GEOM_HBOND category records information about both distances and angles, including donor–acceptor, donor–hydrogen and acceptor–hydrogen distances and the included angle at the hydrogen-atom site (see Example 3.2.4.12). The comments above about the interpretation of symmetry codes and their relevance in the formal assignment of the category key also apply to this category.

Example 3.2.4.12. List of hydrogen-bond distances and angles.

[Scheme scheme61]

Note that, strictly speaking, this category should only be populated if coordinates for the hydrogen atom are available (because the mandatory component of the category key _geom_hbond_atom_site_label_H needs a parent label in the atom-site list). In practice, hydrogen bonds can be assumed between donor atoms and acceptors even if the hydrogen atom is not specifically located.
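When all three distances and the angle are present, they can be checked against one another, since the three distances define a triangle and so fix the angle at the hydrogen site. The sketch below applies the law of cosines; the function name is hypothetical, and the arguments are assumed to hold the values of _geom_hbond_distance_DH, _geom_hbond_distance_HA and _geom_hbond_distance_DA.

```python
import math


def hbond_angle_from_distances(d_dh, d_ha, d_da):
    """Angle at H (degrees) implied by the D-H, H...A and D...A distances."""
    cos_angle = (d_dh**2 + d_ha**2 - d_da**2) / (2.0 * d_dh * d_ha)
    # Guard against rounding that pushes the cosine just outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))
```

A validation tool might compare the result with the listed _geom_hbond_angle_DHA value and flag differences larger than the rounding of the reported numbers would explain.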

The items in the GEOM_TORSION category describe the torsion angle in degrees generated for the bonded sequence of four atom sites identified by the _geom_torsion_atom_site_label_* codes. As with other geometry-specific site labels, these must match labels specified as _atom_site_label in the atom list. The torsion angle definition is that of Klyne & Prelog (1960).

Example 3.2.4.13 includes two sites that have been generated by crystallographic symmetry operations and lattice translations from the parent sites in the atom list.

Example 3.2.4.13. List of torsion angles.

[Scheme scheme62]
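A signed torsion angle can be computed from the four site positions as sketched below. This is an illustrative calculation rather than a prescribed algorithm: the function name is hypothetical, and the positions are assumed to have been converted to Cartesian coordinates already, after any symmetry operations and lattice translations have been applied. The sign follows the convention that the angle is positive when, looking from the second atom to the third, a clockwise rotation brings the first bond onto the last.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_angle(p1, p2, p3, p4):
    """Signed torsion angle (degrees) for Cartesian points p1-p2-p3-p4."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n2 = _cross(b2, b3)
    # atan2 form: numerically stable and returns the sign directly.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))
```

The function returns values in the range −180° to +180°; a planar cis arrangement gives 0°.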

3.2.4.4. Symmetry and space-group information

The categories describing symmetry are as follows:

SYMMETRY group

Original symmetry categories (§3.2.4.4.1)

SYMMETRY

SYMMETRY_EQUIV

Replacement symmetry categories (§3.2.4.4.2)

SPACE_GROUP

SPACE_GROUP_SYMOP

The SPACE_GROUP and older SYMMETRY categories contain information about the symmetry of the crystal; specifically the space group and the symmetry-equivalent positions for that space group. More information about the symmetry is available in the symCIF dictionary described in Chapter 3.8 and presented in Chapter 4.7. The categories SPACE_GROUP and SPACE_GROUP_SYMOP were imported from symCIF in version 2.3 of the core dictionary, and are intended to replace the SYMMETRY and SYMMETRY_EQUIV categories. In most cases, there are strict equivalences between data items in the two sets. The new categories have been adopted for greater compatibility with future expansions to the symmetry CIF dictionary, and to correct some potentially misleading practices in the original categories. Although all the data items in SYMMETRY and SYMMETRY_EQUIV are now formally marked as deprecated, it is likely that the older data items will remain in circulation for some time.

3.2.4.4.1. Data items in SYMMETRY and related categories

The data items in these categories are as follows:

(a) SYMMETRY [Scheme scheme63]

(b) SYMMETRY_EQUIV [Scheme scheme64]

The bullet ( $[\bullet]$ ) indicates a category key. In practice _symmetry_equiv_pos_site_id is often absent from older CIFs. The dagger ( $[\dagger]$ ) indicates a deprecated item, which should not be used in the creation of new CIFs.

The data items in the SYMMETRY category (now superseded by SPACE_GROUP) were used to record the space group. The Hermann–Mauguin (H-M) symbol was given by _symmetry_space_group_name_H-M. The dictionary definition recommended the use of the `full' H-M symbol as listed in International Tables for Crystallography Volume A, but was not explicit about the meaning of `full'. The dictionary examples showed short-form symbols expanded to a complete representation of individual symmetry elements; thus Pnnn would be given as 'P 2/n 2/n 2/n', and the monoclinic space group $P2_1/m$ would be given as 'P 1 21/m 1' for the b-axis unique setting or 'P 1 1 21/m' for the c-axis unique setting.

In practice, abbreviated symbols were often used, following conventions established over many years; thus 'P 21/m' was often given as the Hermann–Mauguin symbol when the `usual' b setting of a monoclinic cell had been chosen. It is recommended that these conventions should continue to be followed when the new data item _space_group_name_H-M_alt is used instead.

The dictionary examples also suggested concise ways of indicating the origin choice within the _symmetry_space_group_name_H-M field; since there was no formal description of how to do this, different authors used different wording. Hence, _symmetry_space_group_name_H-M was always best considered as a container for the representation of the space group that would appear in a published article, and not as a machine-readable source of information about the crystallographic symmetry.

The two mechanisms for conveying the symmetry transformations in a fully machine-readable form were the Hall symbol _symmetry_space_group_name_Hall (Hall, 1981a,b; Hall & Grosse-Kunstleve, 2001) and a complete listing of the symmetry operations using data items in the SYMMETRY_EQUIV category.

The data item _symmetry_cell_setting indicates the crystal system, not (as suggested by its name) the setting used.

The SYMMETRY_EQUIV category, now superseded by SPACE_GROUP_SYMOP, provided a list of symmetry-equivalent positions in algebraic notation. Formally, _symmetry_equiv_pos_site_id acted as a category key, with any arbitrary numeric value that uniquely identifies each operator. Historically, the earliest versions of the core dictionary did not have such an identifier at all and the separate equivalent positions were indexed by their position in the _symmetry_equiv_pos_as_xyz list. This interpretation was vulnerable to inadvertent re-ordering of the list of equivalent positions, and for this reason, as well as to satisfy the formal need for a category key, _symmetry_equiv_pos_site_id was added (Example 3.2.4.14). For compatibility with software that was written to handle the earlier arrangement, it is recommended that _symmetry_equiv_pos_site_id gives sequential integer labels, starting with 1, to the equivalent positions in the sequence in which they appear in the CIF.

Example 3.2.4.14. A list of symmetry-equivalent positions.

[Scheme scheme65]

Note that the _symmetry_equiv_pos_as_xyz list must contain all symmetry-equivalent positions of the space group, including those generated by lattice centring and a centre of symmetry, if present.
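To show how an entry in such a list is used, the following Python sketch parses an operator given in algebraic notation and applies it, together with an optional lattice translation, to a set of fractional coordinates. The function names are hypothetical, and the parser is deliberately minimal: it assumes coefficients of ±1 on x, y and z and translations written as fractions or decimals, which covers operators such as '-x, 1/2+y, -z' and 'x-y, x, z'.

```python
import re
from fractions import Fraction


def parse_symop(xyz):
    """Turn e.g. '-x, 1/2+y, -z' into (rotation rows, translation parts)."""
    rows, shifts = [], []
    for component in xyz.lower().replace(" ", "").split(","):
        row = [0, 0, 0]
        shift = Fraction(0)
        # Split the component into signed terms such as '-x' or '+1/2'.
        for term in re.findall(r"[+-]?[^+-]+", component):
            if term[-1] in "xyz":
                sign = -1 if term.startswith("-") else 1
                row["xyz".index(term[-1])] += sign
            else:
                shift += Fraction(term)
        rows.append(row)
        shifts.append(shift)
    return rows, shifts


def apply_symop(xyz, site, translation=(0, 0, 0)):
    """Apply the operator, then the lattice translation, to fractional coordinates."""
    rows, shifts = parse_symop(xyz)
    return tuple(
        sum(r * c for r, c in zip(row, site)) + shift + t
        for row, shift, t in zip(rows, shifts, translation)
    )


site = (Fraction(1, 4), Fraction(1, 4), Fraction(1, 4))
print(apply_symop("-x, 1/2+y, -z", site, translation=(1, -1, 0)))
```

Using `Fraction` for the translational parts keeps values such as 1/3 exact; coordinates supplied as floating-point numbers are also accepted, in which case the result is floating point.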

3.2.4.4.2. Data items in SPACE_GROUP and related categories

Data items in these categories are as follows:

(a) SPACE_GROUP [Scheme scheme66]

(b) SPACE_GROUP_SYMOP [Scheme scheme67]

The bullet ( $[\bullet]$ ) indicates a category key.

The data items in the SPACE_GROUP category record the space group and crystal system. They recognize the common practice of supplying the space group in Hermann–Mauguin notation, though the H-M symbol does not contain complete information about the symmetry and the space-group origin. _space_group_name_H-M_alt allows the use of any legitimate H-M symbol as listed in International Tables for Crystallography Volume A or derived by similar principles. It does not give rigorous direction on how the symbols should be presented. It is recommended that the use of this symbol in CIFs containing articles for publication should follow the guidelines for _symmetry_space_group_name_H-M (Section 3.2.4.4.1).

Because a given space-group type may be described by more than one Hermann–Mauguin symbol, the space-group type should be specified by the use of _space_group_IT_number.

Two mechanisms exist for conveying fully machine-readable descriptions of the symmetry transformations relevant to the space group and setting. The first is the Hall symbol (Hall, 1981a,b; Hall & Grosse-Kunstleve, 2001), which uniquely defines the space group and its reference to a particular coordinate system; it is specified in the data item _space_group_name_Hall. Alternatively, the symmetry operations may be listed in full using data items in the SPACE_GROUP_SYMOP category.

The SPACE_GROUP_SYMOP category provides a list of the symmetry operators for a space group in algebraic notation. It replaces the category SYMMETRY_EQUIV. Unlike the older category, where in practice the category key could be omitted from listings (and must therefore be generated implicitly by parsing software), the category key _space_group_symop_id must be given. See Example 3.2.4.15, which may be compared with Example 3.2.4.14.

Example 3.2.4.15. A list of symmetry operators using data items from the SPACE_GROUP_SYMOP category.

[Scheme scheme68]

3.2.4.5. Bond-valence information

Categories describing bond valences are as follows:

VALENCE group

VALENCE_PARAM

VALENCE_REF

Data items in these categories are as follows:

(a) VALENCE_PARAM [Scheme scheme69]

(b) VALENCE_REF [Scheme scheme70]

The arrow ( $[\rightarrow]$ ) is a reference to a parent data item.

The data items in this category group relate to bond valences, which are widely used in inorganic crystallography to confirm and analyse the results of crystal structure determinations. Bond valences are determined from the bond lengths and have the useful property that their sum around any atom is equal to the atom valence (formal charge). They are increasingly being published with bond lengths. The data item _geom_bond_valence in the GEOM_BOND category allows the bond valence to be associated with the bond length.

The two categories discussed here list the parameters used to calculate the bond valences and their literature sources. These items might also be published, particularly where there is some uncertainty about the appropriate parameters to use.

The data items in the VALENCE_PARAM category define the parameters used for calculating bond valences from bond lengths. In addition to the parameters, a pointer to the reference for the source of the parameters (in VALENCE_REF) is given (Example 3.2.4.16).

Example 3.2.4.16. A list of bond-valence parameters.

[Scheme scheme71]
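As an illustration of how such parameters are used, the sketch below evaluates bond valences assuming the widely used exponential expression s = exp[(Ro − R)/B], where R is the bond length and Ro and B are taken to correspond to the items _valence_param_Ro and _valence_param_B. The function names and the numerical values are purely illustrative and are not taken from any published parameter set.

```python
import math


def bond_valence(length, r0, b):
    """Bond valence s = exp((Ro - R) / B) for a bond of length R."""
    return math.exp((r0 - length) / b)


def valence_sum(lengths, r0, b):
    """Sum of bond valences around one atom for bonds of a single type."""
    return sum(bond_valence(r, r0, b) for r in lengths)


# Illustrative numbers only: six equal bonds slightly longer than Ro.
print(round(valence_sum([2.05] * 6, r0=1.90, b=0.37), 2))
```

Comparing such a sum with the expected atom valence is the consistency check referred to above; a bond whose length equals Ro contributes a valence of exactly 1.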

References

Hall, S. R. (1981a). Space-group notation with an explicit origin. Acta Cryst. A37, 517–525.

Hall, S. R. (1981b). Space-group notation with an explicit origin. Erratum. Acta Cryst. A37, 921.

Hall, S. R. & Grosse-Kunstleve, R. W. (2001). International Tables for Crystallography, Vol. B, Reciprocal space, edited by U. Shmueli, 2nd ed., Appendix A1.4.2.3. Dordrecht: Kluwer Academic Publishers.

Klyne, W. & Prelog, V. (1960). Description of steric relationships across single bonds. Experientia, 16, 521–523.
