Individual atom sites

Hall, S. R.; Fitzgerald, P. M. D.; McMahon, B.

doi:10.1107/97809553602060000734

International Tables for Crystallography, Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon


International Tables for Crystallography (2006). Vol. G. ch. 3.2, pp. 102-105

Section 3.2.4.1.1. Individual atom sites

S. R. Hall,^a* P. M. D. Fitzgerald^b and B. McMahon^c

^a School of Biomedical and Chemical Sciences, University of Western Australia, Crawley, 6009, Australia, ^b Merck Research Laboratories, Rahway, New Jersey, USA, and ^c International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
Correspondence e-mail: syd@crystal.uwa.edu.au


The data items in this category are as follows:

ATOM_SITE [Scheme scheme36]

The bullet ($\bullet$) indicates a category key. For this category an alternative category key can be formed by taking all the _atom_site_label_component_* items together. Anisotropic displacement parameters may also be listed in a separate loop, for which _atom_site_aniso_label forms the key. The arrow ($\rightarrow$) is a reference to a parent data item. The dagger ($\dagger$) indicates a deprecated item, which should not be used in the creation of new CIFs.

Data items in the ATOM_SITE category represent the positions of atom sites identified in the structural model, their spatial distribution defined by isotropic or anisotropic displacement parameters, details of restraints or constraints applied during the refinement, and the interpretation of their occupancy due to structural or compositional disorder.

Example 3.2.4.1 is a typical extract from a list of atom-site coordinates, with equivalent isotropic displacement values and refinement conditions. Each site is identified by _atom_site_label.

Example 3.2.4.1. List of atom-site coordinates, equivalent isotropic U values and refinement conditions.

[Scheme scheme37]

The coordinates are specified as fractional x, y, z values along the unit-cell axes. Coordinates may also be specified in ångström units along orthogonal Cartesian axes using the data names _atom_site_Cartn_x, _atom_site_Cartn_y and _atom_site_Cartn_z. The transformation matrix between Cartesian and fractional coordinates can be given in the ATOM_SITES category.
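The relationship between the two coordinate systems can be illustrated with a minimal Python sketch. It assumes one common orthogonalization convention (x along a, y in the a-b plane); a file that records its own transformation matrix in the ATOM_SITES category should be converted with that matrix instead. The function names `orthogonalization_matrix` and `fract_to_cartn` are hypothetical and are not defined by the dictionary.

```python
import math


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Return a 3x3 matrix mapping fractional to Cartesian coordinates.

    Assumed convention (not taken from the dictionary): x is parallel
    to a and y lies in the a-b plane. Cell lengths are in angstroms,
    cell angles in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Cell volume divided by a*b*c.
    v = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * v / sg],
    ]


def fract_to_cartn(matrix, fract):
    """Apply the matrix to a fractional (x, y, z) triple."""
    return tuple(
        sum(matrix[i][j] * fract[j] for j in range(3)) for i in range(3)
    )


# Hypothetical orthorhombic cell, for illustration only.
cell_matrix = orthogonalization_matrix(10.0, 20.0, 30.0, 90.0, 90.0, 90.0)
cartn = fract_to_cartn(cell_matrix, (0.5, 0.25, 0.1))
```

For an orthogonal cell the matrix is diagonal to within rounding, so each Cartesian coordinate is the fractional coordinate multiplied by the corresponding cell length.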

(Note that occupancy values are unaffected by symmetry. This is discussed later in connection with site multiplicity.)

_atom_site_U_iso_or_equiv records the isotropic atomic displacement value U_iso in the case of isotropic refinement. In the case of anisotropic refinement, _atom_site_U_iso_or_equiv records the equivalent isotropic value U_eq, defined as $U_{\rm eq} = (1/3)\sum_i\big[\sum_j (U^{ij}a_i^*a_j^*a_ia_j)\big]$, where $a_i$ are the real-space cell lengths, $a^*_j$ are the reciprocal-space cell lengths and $U^{ij}$ are the anisotropic displacement parameters.

The data item _atom_site_adp_type identifies which value is given. An alternative equivalent isotropic displacement parameter _atom_site_U_equiv_geom_mean may be calculated as the geometric mean of the anisotropic parameters, $U_{\rm eq}=(U_iU_jU_k)^{1/3}$, where the $U_i$ are the principal components of the orthogonalized $U^{ij}$.
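As a small numerical illustration of the geometric-mean definition, the sketch below assumes that the three principal components of the orthogonalized tensor have already been obtained (for example by diagonalization, which is not shown). The function name `u_equiv_geom_mean` and the input values are hypothetical.

```python
def u_equiv_geom_mean(u1, u2, u3):
    """Geometric mean of three principal displacement components.

    The inputs are assumed to be the principal components of the
    orthogonalized U tensor, in square angstroms.
    """
    if min(u1, u2, u3) <= 0.0:
        # A cube root of a non-positive product has no physical meaning here.
        raise ValueError("principal components must all be positive")
    return (u1 * u2 * u3) ** (1.0 / 3.0)


# Hypothetical principal components, for illustration only.
u_geom = u_equiv_geom_mean(0.01, 0.02, 0.04)
u_arith = (0.01 + 0.02 + 0.04) / 3.0
```

For unequal components the geometric mean is always smaller than the arithmetic mean of the same three values, so the two quantities should not be expected to agree for a strongly anisotropic site.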

Data names also exist for the corresponding quantities calculated from B values, although the use of B values is discouraged by the IUCr Commission on Crystallographic Nomenclature.

For each site, _atom_site_calc_flag takes one of the following values: d, to indicate that the atom-site coordinates were determined from the diffraction intensities; c or calc to indicate that they were calculated from molecular geometry considerations; or dum, for a dummy site.

Specific restraints or constraints applied to a site may be indicated by one or more of the _atom_site_refinement_flags_* items.

The data item _atom_site_occupancy defines the fraction of the atom type present at the site. Note that the same site may occur more than once in the list, identified by separate values of _atom_site_label. Such an arrangement would represent contributions from separate atom types (perhaps in modelling compositional disorder). The sum of occupancies of all atom types present at a single site may not significantly exceed 1.0 (unless it is a dummy site with no physical significance). Note that an atom of a given chemical species positioned on a special position (e.g. on a twofold axis) will in general be assigned a full occupancy value of 1.0. However, it will occur less often in the unit cell than an atom on a general position (in this example by a factor of 2). To account for this in structure-factor calculations it may be given a population value of 0.5 within the refinement program. A population adjustment of this kind is not implied in the assignment of a value to _atom_site_occupancy. The multiplicity of the site owing to the space-group symmetry can be recorded in _atom_site_symmetry_multiplicity.
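The distinction between occupancy and the population used inside a refinement program can be sketched as follows. This is an illustration under stated assumptions rather than a prescribed calculation: the parameter `general_multiplicity` (the multiplicity of a general position in the space group), the tolerance of 0.01 and both function names are hypothetical.

```python
def refinement_population(occupancy, site_multiplicity, general_multiplicity):
    """Scale an occupancy by the ratio of site to general multiplicity.

    site_multiplicity corresponds to _atom_site_symmetry_multiplicity;
    general_multiplicity is an assumed input, not a dictionary item
    named in the text.
    """
    return occupancy * site_multiplicity / general_multiplicity


def occupancy_sum_ok(occupancies, tolerance=0.01):
    """Check that occupancies sharing one site do not significantly exceed 1.0.

    The tolerance is an arbitrary choice made for this sketch.
    """
    return sum(occupancies) <= 1.0 + tolerance


# Fully occupied site on a twofold axis where the general position
# has multiplicity 4: the refinement population is one half.
population = refinement_population(1.0, 2, 4)
```

The value stored in _atom_site_occupancy stays at 1.0 in this case; only the internal population carries the factor of one half.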

The disorder-related data names in this example will be discussed below.

_atom_site_type_symbol is a code which must match an entry in the ATOM_TYPE category that supplies information about the elemental composition and scattering factors of the atom or atoms occupying the site. Note that it is quite legitimate to have an atom-type symbol such as `Fe3+Ni2+', referring to a mixed-composition atom site. The effective physical properties of such a pseudo-atom should be given in full in the ATOM_TYPE category.

Example 3.2.4.2 demonstrates how the anisotropic displacement parameters are conventionally broken out into a separate list. When this is done, each atom site is identified by _atom_site_aniso_label, and this must of course match the value of _atom_site_label specifying the position of the site.

Example 3.2.4.2. Separate list of anisotropic U values, in which _atom_site_aniso_label acts as the key that uniquely identifies each row.

[Scheme scheme38]

The data item _atom_site_label is normally used as the identifier of each individual atom site in a list of coordinates and atomic displacement factors. Historically, the labels given to atom sites have been chosen to summarize useful information about the atom located at the site. Almost invariably the label contains the symbol of the chemical element or elements occupying the site; there may also be indicators of charge, valence, chemical connectivity, disorder, occupation of a site of crystallographic symmetry or grouping within a component of secondary structure within large molecules. In a CIF, it is formally sufficient that atom-site labels are unique, as all the information about composition, valence, connectivity and so on can be extracted from the data items designed specifically to record this information. However, it is preferable that an atom-site label should summarize the relevant features of the site. Many styles and conventions for labelling atoms are in use in crystallography, so to enable interchange with other crystallographic data file formats, the core dictionary contains a detailed but highly flexible set of rules for constructing and parsing atom-site labels.

Labelling atom sites in crystallography usually serves two distinct purposes: (a) to identify the site in the molecule and crystal, and (b) to identify the chemical element that occupies that site. The core dictionary makes this distinction clear by defining ATOM_SITE and ATOM_TYPE as separate data categories. The connection between the two categories is made through the equivalence of the data items _atom_site_type_symbol (in the ATOM_SITE list) and _atom_type_symbol (in the ATOM_TYPE list). Often, however, crystallographers use a single label _atom_site_label to define both the site and the chemical species occupying it.

The _atom_site_label may be composed of as many as eight separate components; the recommended convention for construction of the string is as follows.

Component 0 [optionally identical to a value of _atom_type_symbol] (mandatory): A character string containing any character except a blank or an underline, with the proviso that each digit `0'–`9' is used only to designate an oxidation state and, as such, must be followed by a plus `+' or a minus `-' character. It is recommended that the element symbols be used when applicable. Examples of permissible codes are: Cu, Cu2+, dummy, Fe3+Ni2+, S-, H*, H(SDS).

Component 1 [atom number code] (optional): This string may contain any alphanumeric character except a blank or an underline, but the first character must be a digit `0'–`9' and the second character may not be a plus `+' or a minus `-'. Component 1 is intended primarily to differentiate sites containing the same atom type, but it can be used for any purpose. Examples of combined component 0 and 1 codes are: C1, C103g28, Fe3+17b, H*251, boron2a, Ni2+2, Fe2+Ni2+2, in which component 0 is, respectively, C, C, Fe3+, H*, boron, Ni2+ and Fe2+Ni2+.

Component 2 [residue code] (optional): This string may contain any character except a blank or underline. It is intended primarily to give specific structural information such as the molecular fragment or amino-acid type, e.g. C1_gly, O1_SO4. If component 2 is present, it is separated from the concatenated components 0 and 1 with an underline character.

Components 3–7 [sequence, remoteness, chain order, alternate, footnote codes] (optional): These strings may contain any character except a blank or an underline. The underline character is used to separate the individual components. The names associated with the separate components suggest their roles in constructing composite labels that match the conventions of site labelling in the PDB format for macromolecular structure files. However, they are not restricted to these functions and may be used in other ways.

Component 0 is normally identical to an _atom_type_symbol code in the ATOM_TYPE list. However, if it is not, an _atom_site_type_symbol code must appear in the ATOM_SITE list in order to identify the atom type. In these cases, component 0 may contain any code consistent with the rules given in the dictionary. Thus, component 0 could be Ca to identify an alpha carbon, provided that the _atom_site_type_symbol is encoded as C to indicate that the atom type is carbon.

Multiple occupation of a single atom site by different atom species (compositional disorder) may be handled simply by having multiple values of _atom_site_label referring to the same site in the crystal structure. Alternatively, multiple occupancy of an atom site may be denoted by a unique character or characters in component 0 of the atom label, with the ATOM_TYPE list containing the equivalent pseudo element label entry with values that are weighted averages of those for the constituent elements. The proportions of the atom types should then be defined using _atom_type_description.

This _atom_site_label construction is flexible, visually decipherable and well suited to computer applications. The components can be easily identified and stripped with a single pass, from left to right, along the label string. Note that the underline separators are only used if higher-order components exist. If intermediate components are not used they may be omitted provided the underline separators are retained. For example, the label C233__ggg is acceptable and contains the components 0: C, 1: 233, 2: null and 3: ggg. There is no requirement that the same number of components should be used in each label.
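The parsing rules given above for components 0 to 7 can be sketched in Python as follows. This is a minimal illustration that assumes labels follow the recommended convention exactly; the function names `split_type_and_number` and `parse_atom_site_label` are hypothetical, and absent components are represented by `None`.

```python
DIGITS = "0123456789"


def split_type_and_number(head):
    """Split the concatenated components 0 and 1.

    Component 1 starts at the first digit that is not immediately
    followed by '+' or '-' (such digits are oxidation states and
    belong to component 0).
    """
    for i, ch in enumerate(head):
        if ch in DIGITS and head[i + 1:i + 2] not in ("+", "-"):
            if i == 0:
                raise ValueError("component 0 is mandatory")
            return head[:i], head[i:]
    if not head:
        raise ValueError("component 0 is mandatory")
    return head, None


def parse_atom_site_label(label):
    """Return a list of eight entries for components 0 to 7."""
    parts = label.split("_")
    comp0, comp1 = split_type_and_number(parts[0])
    # An empty string between underlines marks an omitted component.
    higher = [p if p else None for p in parts[1:]]
    if len(higher) > 6:
        raise ValueError("at most eight components are allowed")
    components = [comp0, comp1] + higher
    return components + [None] * (8 - len(components))


components = parse_atom_site_label("C233__ggg")
```

Applied to C233__ggg, the sketch gives component 0 as C, component 1 as 233, an empty component 2 and component 3 as ggg, in agreement with the example in the text.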

The _atom_site_label may be replaced by separate data items specifying the individual components of an atom label; this may be useful for large lists of site coordinates, for example in a macromolecular structure, where site-labelling components follow a systematic convention and where subsets of the atom sites need to be searched for or extracted using individual label components. Such uses are not common in files built with core CIF data names; the mmCIF dictionary identifies substructural components in biological macromolecules by alternative techniques (Section 3.6.7).

There is no comparable fragmentation of the components of _atom_site_aniso_label. Where separate lists of anisotropic displacement parameters use complex atom-site labels, either the coordinate list should use _atom_site_label alone or the processing software needs to be able to construct a value for _atom_site_label from the separate components _atom_site_label_component_* in order to test the equivalence between the labels in the coordinates and anisotropic displacement parameters lists.

While either atom-labelling technique is permitted, it is recommended that the individual label components are not used unless there is an overwhelming argument to do so.

Information about the molecular model is sometimes embedded in a labelling convention. In CIF, this information is usually expressed through other data items.

The connectivity of a molecule is described by the CHEMICAL group of categories, and more specifically through the CHEMICAL_CONN_ATOM and CHEMICAL_CONN_BOND categories.

The link between atom sites in the coordinate list and the corresponding atoms in the molecular model is established using the data item _chemical_conn_atom_number in the CHEMICAL_CONN_ATOM category, and the data items _chemical_conn_bond_atom_1 and _chemical_conn_bond_atom_2 in the CHEMICAL_CONN_BOND category. The values of these data items must match values for the data item _atom_site_chemical_conn_number in the ATOM_SITE list. Example 3.2.4.3 shows an extract from a connectivity table; a more complete version of this table is given in the relevant category descriptions in the dictionary.

Example 3.2.4.3. Chemical connectivity table; atoms are linked back to atom-site positions through matching values of _atom_site_chemical_conn_number and _chemical_conn_atom_number.

[Scheme scheme39]
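The matching of connectivity numbers to atom sites can be sketched as a simple lookup. The rows below are hypothetical and are not taken from Example 3.2.4.3; the dictionary keys are shortened forms of the data names, and the function name `bonds_by_site_label` is an assumption made for this sketch.

```python
# Hypothetical ATOM_SITE rows: label plus _atom_site_chemical_conn_number.
atom_sites = [
    {"label": "C1", "chemical_conn_number": 1},
    {"label": "O1", "chemical_conn_number": 2},
    {"label": "H1", "chemical_conn_number": 3},
]

# Hypothetical CHEMICAL_CONN_BOND rows: _chemical_conn_bond_atom_1 and _2.
conn_bonds = [
    {"atom_1": 1, "atom_2": 2},
    {"atom_1": 1, "atom_2": 3},
]


def bonds_by_site_label(sites, bonds):
    """Translate bonded atom numbers into pairs of atom-site labels.

    A KeyError is raised if a bond refers to a number that no atom
    site carries, which signals an inconsistent connectivity table.
    """
    label_of = {s["chemical_conn_number"]: s["label"] for s in sites}
    return [(label_of[b["atom_1"]], label_of[b["atom_2"]]) for b in bonds]


labelled_bonds = bonds_by_site_label(atom_sites, conn_bonds)
```

The same lookup built in the opposite direction would recover, for each atom site, the connectivity-table entry that describes it.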

Note that there is no guarantee that the refined atom-site coordinates that characterize the asymmetric unit will correspond to locations within a single connected molecular species. Crystal symmetry transformations may need to be applied to individual sites in order to map the contents of a connected molecular residue to real space in the unit cell. There is no provision in the CHEMICAL_CONN categories for the specification of these symmetry transformations; thus these higher-order molecular geometries are best described using data items in the GEOM categories, which do allow for the specification of symmetry transformations.

It may also be the case that not all atom positions have been located; this is particularly true for hydrogen atoms, and the data item _atom_site_attached_hydrogens is provided for book-keeping purposes to indicate hydrogen atoms known to be bonded to an atom but whose positions have not been refined (or calculated).

Example 3.2.4.4 shows how the disorder of a group of bonded atoms over a set of atom sites (occupational disorder) is described. In this example of a disordered tetrafluoroborate anion, the data item _atom_site_disorder_assembly takes the value A, and the data item _atom_site_disorder_group takes the values 1 and 2, indicating the two alternative positions of the disordered group.

Example 3.2.4.4. Handling of occupational disorder of atom sites.

[Scheme scheme40]
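Software reading such a list would typically collect the sites belonging to each alternative. The sketch below groups hypothetical site rows by their assembly and group codes; the labels, the shortened key names and the function name `group_disordered_sites` are assumptions, and the rows are not those of Example 3.2.4.4.

```python
from collections import defaultdict

# Hypothetical rows for a disordered tetrafluoroborate anion.
sites = [
    {"label": "B1", "disorder_assembly": ".", "disorder_group": "."},
    {"label": "F1A", "disorder_assembly": "A", "disorder_group": "1"},
    {"label": "F2A", "disorder_assembly": "A", "disorder_group": "1"},
    {"label": "F1B", "disorder_assembly": "A", "disorder_group": "2"},
    {"label": "F2B", "disorder_assembly": "A", "disorder_group": "2"},
]


def group_disordered_sites(rows):
    """Map (assembly, group) to the labels of the sites it contains.

    Rows whose assembly is '.' or '?' are treated as ordered and skipped.
    """
    groups = defaultdict(list)
    for row in rows:
        assembly = row.get("disorder_assembly", ".")
        if assembly in (".", "?"):
            continue
        groups[(assembly, row["disorder_group"])].append(row["label"])
    return dict(groups)


alternatives = group_disordered_sites(sites)
```

Each key of the result identifies one alternative arrangement of the disordered assembly, so a program can select a single group when drawing or analysing the structure.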

The remaining items in this category are clearly described in their individual dictionary entries. However, the now-deprecated data item _atom_site_refinement_flags should be mentioned. This was allowed to take values obtained by concatenating one or more of the single-letter flags:

. no refinement constraints;

S special-position constraint on site;

G rigid-group refinement of site;

R riding-atom site attached to non-riding atom;

D distance or angle restraint on site;

T thermal displacement constraints;

U U_iso or U^ij restraint (rigid bond);

P partial occupancy constraint.

These individual flags are listed in the dictionary using the DDL field _enumeration, which denotes a list of mutually exclusive permitted values. As concatenation of values is allowed here, dictionary-based software must be modified to handle this data item as a special case. To avoid the need for this in future, the data item was marked as deprecated from version 2.3 of the dictionary, and is replaced by the three separate items _atom_site_refinement_flags_posn, *_adp and *_occupancy. For each of these, the relevant combinations of refinement flags are fully enumerated (for example _atom_site_refinement_flags_adp may take any one of the values T, U or TU). This logically separates the different types of refinement constraints or restraints that an author might want to record and allows software to parse the data item.
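A converter from the deprecated concatenated form to the three replacement items might look like the sketch below. The assignment of S, G, R and D to the positional item and of P to the occupancy item is an assumption inferred from the flag descriptions; the text confirms only the T, U and TU values for the displacement item. The letter ordering within each output value and the function name `split_refinement_flags` are likewise assumptions.

```python
# Assumed grouping of the single-letter flags by the item they belong to.
POSN_FLAGS = "DGRS"
ADP_FLAGS = "TU"
OCCUPANCY_FLAGS = "P"


def split_refinement_flags(flags):
    """Split a concatenated flag string into posn, adp and occupancy values.

    A '.' is returned for any item to which no flag applies.
    """
    known = set(POSN_FLAGS + ADP_FLAGS + OCCUPANCY_FLAGS + ".")
    unknown = set(flags) - known
    if unknown:
        raise ValueError("unrecognized flags: " + "".join(sorted(unknown)))

    def pick(allowed):
        # Keep the letters in a fixed order so that 'UT' becomes 'TU'.
        chosen = "".join(f for f in allowed if f in flags)
        return chosen or "."

    return {
        "posn": pick(POSN_FLAGS),
        "adp": pick(ADP_FLAGS),
        "occupancy": pick(OCCUPANCY_FLAGS),
    }


converted = split_refinement_flags("SUP")
```

Because each output value is drawn from a fixed set of combinations, it can be checked against an enumeration list without special-case handling.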
