International
Tables for
Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G. ch. 3.6, pp. 164-169

Section 3.6.7.1. Atom sites

P. M. D. Fitzgerald,a* J. D. Westbrook,b P. E. Bourne,c B. McMahon,d K. D. Watenpaughe and H. M. Bermanf

a Merck Research Laboratories, Rahway, New Jersey, USA
b Protein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, 610 Taylor Road, Piscataway, New Jersey, USA
c Research Collaboratory for Structural Bioinformatics, San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0537, USA
d International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
e Retired; formerly Structural, Analytical and Medicinal Chemistry, Pharmacia Corporation, Kalamazoo, Michigan, USA
f Protein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, 610 Taylor Road, Piscataway, New Jersey, USA
Correspondence e-mail: paula_fitzgerald@merck.com

3.6.7.1. Atom sites


The categories describing atom sites are as follows:

ATOM group
  Individual atom sites (§3.6.7.1.1)
    ATOM_SITE
    ATOM_SITE_ANISOTROP
  Collections of atom sites (§3.6.7.1.2)
    ATOM_SITES
    ATOM_SITES_FOOTNOTE
  Atom types (§3.6.7.1.3)
    ATOM_TYPE
  Alternative conformations (§3.6.7.1.4)
    ATOM_SITES_ALT
    ATOM_SITES_ALT_ENS
    ATOM_SITES_ALT_GEN

The ATOM category group represents a compromise between the representation of a small-molecule structure as an annotated list of atomic coordinates and the need in macromolecular crystallography to present a more structured view organized around residues, chains, sheets, turns, helices etc. The locations of individual atoms and other information about the atom sites are given using data items in this category group. The categories within the group may be classified as shown in the summary above.

The ATOM_SITE, ATOM_SITES and ATOM_TYPE categories have many data items that are aliases of equivalent data items in the same categories in the core CIF dictionary, but the conventions for the labelling of the atom sites are different.

The ATOM_SITE_ANISOTROP and ATOM_SITES_FOOTNOTE categories are new to the mmCIF dictionary, as are the categories related to alternative conformations: ATOM_SITES_ALT, ATOM_SITES_ALT_ENS and ATOM_SITES_ALT_GEN.

3.6.7.1.1. Individual atom sites


The data items in these categories are as follows:

(a) ATOM_SITE [Scheme scheme85]

(b) ATOM_SITE_ANISOTROP [Scheme scheme86]

The bullet (•) indicates a category key. The arrow (→) is a reference to a parent data item. Items in italics have aliases in the core CIF dictionary formed by changing the full stop (.) to an underscore (_) except where indicated by the ~ symbol. Data items marked with a plus (+) have companion data names for the standard uncertainty in the reported value, formed by appending the string _esd to the data name listed. The double arrow (⇌) indicates alternative names in a distinct category.
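The two naming rules above can be sketched in a few lines of Python. This is only an illustration: the function names and the exception table are hypothetical, and the single exception listed is the one stated later in this section (_atom_site.id is aliased to _atom_site_label).

```python
def core_alias(mmcif_name: str) -> str:
    """Default core CIF alias: change the category/item full stop to an underscore."""
    return mmcif_name.replace(".", "_", 1)


def esd_companion(mmcif_name: str) -> str:
    """Companion data name for the standard uncertainty of the reported value."""
    return mmcif_name + "_esd"


# Aliases that do not follow the default rule must be listed explicitly.
# Only the exception described in this section is included here.
ALIAS_EXCEPTIONS = {"_atom_site.id": "_atom_site_label"}


def resolve_alias(mmcif_name: str) -> str:
    """Look up an explicit exception first, then fall back to the default rule."""
    return ALIAS_EXCEPTIONS.get(mmcif_name, core_alias(mmcif_name))
```

Whether a given item actually has an alias or an _esd companion is decided by the dictionary, not by these string rules, so a real application should consult the dictionary itself.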

The refined coordinates of the atoms in the crystallographic asymmetric unit are stored in the ATOM_SITE category. Atom positions and their associated uncertainties may be given using either Cartesian or fractional coordinates, and anisotropic displacement factors and occupancies may be given for each position.

The relationships between categories describing atom sites are shown in Fig. 3.6.7.1.

Figure 3.6.7.1. The family of categories used to describe atom sites. Boxes surround categories of related data items. Data items that serve as category keys are preceded by a bullet (•). Lines show relationships between linked data items in different categories with arrows pointing at the parent data items.

Several of the mmCIF data names arise from the need to associate atom sites with residues and chains. In the core CIF dictionary, the identifier for the atom site is the data item _atom_site_label. To accommodate standard practice in macromolecular crystallography, the mmCIF atom identifier is the aggregate of _atom_site.label_alt_id, *.label_asym_id, *.label_atom_id, *.label_comp_id and *.label_seq_id. For the two types of files to be compatible, the data item _atom_site.id, which is independent of the different modes of identifying atoms (discussed below), was introduced. The mmCIF identifier _atom_site.id is aliased to the core CIF identifier _atom_site_label.

Since the identifier does not need to be a number, it is quite possible (although it is not recommended) to use a complex label with an internal structure corresponding to the label components that the mmCIF dictionary provides as separate data items. This scheme is described in Section 3.2.4.1.1. However, normal practice in mmCIFs should be to label sites with the functional components available and to assign a simple numeric sequence to the values of _atom_site.id (see Example 3.6.7.1).
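The recommended practice can be sketched as follows: the five label components are kept as separate fields, and _atom_site.id is a simple running number. The class and function names are hypothetical, and the atom values are illustrative only; they are not taken from PDB 5HVP.

```python
from typing import NamedTuple


class AtomLabel(NamedTuple):
    """The five components that together identify an atom site in mmCIF."""
    label_alt_id: str
    label_asym_id: str
    label_atom_id: str
    label_comp_id: str
    label_seq_id: str


def number_sites(labels):
    """Assign sequential _atom_site.id values ('1', '2', ...) to the labels."""
    if len(set(labels)) != len(labels):
        raise ValueError("aggregate atom labels must be unique")
    return {str(i): label for i, label in enumerate(labels, start=1)}


# Illustrative values only ('.' marks a site with no alternative conformation).
labels = [
    AtomLabel(".", "A", "N", "VAL", "11"),
    AtomLabel(".", "A", "CA", "VAL", "11"),
    AtomLabel("1", "A", "CG2", "THR", "12"),
    AtomLabel("2", "A", "CG2", "THR", "12"),
]
sites = number_sites(labels)
```

Keeping the components separate means that a query such as "all atoms of residue 12 in chain A" needs no parsing of a composite label string.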

Example 3.6.7.1. Part of the coordinate list for an HIV-1 protease structure (PDB 5HVP) described with data items in the ATOM_SITE category. Atoms are given for both polymer and non-polymer regions of the structure, and atoms in the side chain of residue 12 adopt alternative conformations.

[Scheme scheme87]

In addition to labelling information, each entry in the ATOM_SITE list must contain a value for the data item _atom_site.type_symbol, which is a pointer to the table of element symbols in the ATOM_TYPE category. All other data items in the ATOM_SITE category are optional, but it is normal practice to give either the Cartesian or fractional coordinates. Most macromolecular structures use Cartesian coordinates. Isotropic displacement factors are normally placed directly in the ATOM_SITE category, using _atom_site.B_iso_or_equiv. Anisotropic displacement factors may be placed directly in the ATOM_SITE category or in the ATOM_SITE_ANISOTROP category. U's may be used instead of B's. It is not acceptable to use both U's and B's, nor is it acceptable to have anisotropic displacement factors in both the ATOM_SITE category and the ATOM_SITE_ANISOTROP category.
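The two restrictions at the end of the preceding paragraph lend themselves to a simple check on the data names present in a data block. The sketch below is an assumption-laden illustration: the function names are hypothetical, and it interprets "both U's and B's" as covering isotropic as well as anisotropic items, which is one possible reading of the text.

```python
B_PREFIXES = ("_atom_site.B_iso_or_equiv", "_atom_site.aniso_B[", "_atom_site_anisotrop.B[")
U_PREFIXES = ("_atom_site.U_iso_or_equiv", "_atom_site.aniso_U[", "_atom_site_anisotrop.U[")


def _uses(names, prefixes):
    """True if any data name starts with any of the given prefixes."""
    return any(name.startswith(p) for name in names for p in prefixes)


def displacement_problems(names):
    """Return a list of rule violations found among the data names in a block."""
    names = list(names)
    problems = []
    # Rule 1 (interpreted broadly): U and B forms must not be mixed.
    if _uses(names, B_PREFIXES) and _uses(names, U_PREFIXES):
        problems.append("both U and B displacement factors are present")
    # Rule 2: anisotropic factors belong in one category only.
    in_atom_site = _uses(names, ("_atom_site.aniso_B[", "_atom_site.aniso_U["))
    in_separate = _uses(names, ("_atom_site_anisotrop.B[", "_atom_site_anisotrop.U["))
    if in_atom_site and in_separate:
        problems.append("anisotropic factors appear in both ATOM_SITE and ATOM_SITE_ANISOTROP")
    return problems
```

For example, a block that gives _atom_site.B_iso_or_equiv together with _atom_site_anisotrop.B[1][1] passes both checks, whereas combining _atom_site.B_iso_or_equiv with _atom_site_anisotrop.U[1][1] is reported under the first rule.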

Each atom within each chemical component is uniquely identified using the data item _atom_site.label_atom_id, which is a reference to the data item _chem_comp_atom.atom_id in the CHEM_COMP_ATOM category.

The specific object in the asymmetric unit to which the atom belongs is indicated using the data item _atom_site.label_asym_id, which is a reference to the data item _struct_asym.id in the STRUCT_ASYM category. For macromolecules, it is useful to think of this identifier as a chain ID.

The chemical component to which the atom belongs is indicated using the data item _atom_site.label_comp_id, which is a reference to the data item _chem_comp.id in the CHEM_COMP category. The chemical component that is referenced in this way may be either a non-polymer or a monomer in a polymer; if it is a monomer in a polymer, it is useful to think of this identifier as the residue name.

The correspondence between the sequence of an entity in a polymer and the sequence information in the coordinate list (and in the STRUCT categories) is established using the data item _atom_site.label_seq_id, which is a reference to the data item _entity_poly_seq.num in the ENTITY_POLY_SEQ category. This identifier has no meaning for entities that are not part of a polymer; in a polymer it is useful to think of this identifier as the residue number. Note that this is strictly a number. If the combination of a number with an insertion code is needed, _atom_site.auth_seq_id should be used (see below).

An alternative set of identifiers can be used for the *_asym_id, *_atom_id, *_comp_id and *_seq_id identifiers, but not for *_alt_id. The _atom_site.label_* data names are standard; there are rules for these identifiers such as the requirement that residue numbers are sequential integers. Different databases may also have their own rules. However, the author of an mmCIF may wish to use a nonstandard labelling scheme, e.g. to reflect the residue numbering scheme of a structure to which the present structure is homologous, apart from insertions and gaps. Another situation in which a nonstandard labelling scheme might be used is to follow a local convention for atom names in a non-polymer, such as a haem, that conflicts with the scheme required by a database in which the structure is to be deposited. In these situations, alternative identifiers can be given using the data names (_atom_site.auth_*).
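A program that displays author numbering might prefer the _atom_site.auth_* values where they are present and fall back to the standard _atom_site.label_* values otherwise. The following is a minimal sketch of that fallback, assuming each row of the ATOM_SITE list is held as a dictionary keyed by item name without the category prefix; the function name and the sample values are hypothetical.

```python
# Components for which an author-supplied alternative may exist.
# There is no auth_* counterpart for alt_id.
AUTH_CAPABLE = ("asym_id", "atom_id", "comp_id", "seq_id")


def author_view(row):
    """Return the identifiers as labelled by the author, falling back to label_*."""
    view = {}
    for part in AUTH_CAPABLE:
        auth_value = row.get(f"auth_{part}")
        # '.' and '?' are the CIF markers for inapplicable and unknown values.
        if auth_value in (None, ".", "?"):
            view[part] = row[f"label_{part}"]
        else:
            view[part] = auth_value
    view["alt_id"] = row["label_alt_id"]
    return view


# Illustrative row: sequential label numbering, author numbering with an insertion code.
row = {
    "label_alt_id": ".", "label_asym_id": "A", "label_atom_id": "CA",
    "label_comp_id": "GLY", "label_seq_id": "28",
    "auth_seq_id": "27A", "auth_asym_id": "?",
}
view = author_view(row)
```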

In regions of the structure with alternative conformations, the specific conformation to which an atom belongs can be indicated using the data item _atom_site.label_alt_id, which is a reference to the data item _atom_sites_alt.id in the ATOM_SITES_ALT category.

The chemically distinct part of the structure (e.g. polymer chain, ligand, solvent) to which an atom belongs can be indicated using the data item _atom_site.label_entity_id, which is a reference to the data item _entity.id in the ENTITY category.

Most of the information that needs to be associated with an atom site is conveyed by the values of specific data names in mmCIF. However, for historical reasons, a pointer to additional free-text information about an atom site or about a group of atom sites can be given using the data item _atom_site.footnote_id, which is a reference to the data item _atom_sites_footnote.id in the ATOM_SITES_FOOTNOTE category.

The data item _atom_site.group_PDB is a place holder for the tags used by the PDB to identify types of coordinate records. It allows interconversion between mmCIFs and PDB format files. The only permitted values are ATOM and HETATM.
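A check for the permitted values might look like the sketch below. The function name and the row representation (dictionaries keyed by item name without the category prefix) are assumptions made for illustration.

```python
PERMITTED_GROUP_PDB = {"ATOM", "HETATM"}


def invalid_group_pdb(rows):
    """Return the _atom_site.id of every row whose group_PDB value is not permitted."""
    return [row["id"] for row in rows if row["group_PDB"] not in PERMITTED_GROUP_PDB]


# Illustrative rows only.
rows = [
    {"id": "1", "group_PDB": "ATOM"},
    {"id": "2", "group_PDB": "HETATM"},
    {"id": "3", "group_PDB": "atom"},  # values are compared exactly, so this is flagged
]
bad = invalid_group_pdb(rows)
```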

As in the core CIF dictionary, anisotropic displacement parameters in an mmCIF can be given in the same list as the atom positions and occupancies, or can be given in a separate list. However, DDL2 does not permit the same data names to be used for both constructs. Therefore, in mmCIF, anisotropic displacement parameters presented in a separate list are handled in a separate category with its own key, _atom_site_anisotrop.id, which must match a corresponding label in the atom-site list, _atom_site.id.
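The requirement that each _atom_site_anisotrop.id match an _atom_site.id is a parent-child relationship that can be verified directly. A minimal sketch, with a hypothetical function name and illustrative identifiers:

```python
def unmatched_anisotrop_ids(atom_site_ids, anisotrop_ids):
    """Return the _atom_site_anisotrop.id values with no matching _atom_site.id."""
    known = set(atom_site_ids)
    return [value for value in anisotrop_ids if value not in known]


# Illustrative identifiers: site '4' has anisotropic parameters but no coordinates.
missing = unmatched_anisotrop_ids(["1", "2", "3"], ["1", "3", "4"])
```

An empty result means that every row of the separate anisotropic list can be joined to a row of the atom-site list.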

The individual elements of the anisotropic displacement matrix are labelled slightly differently in the mmCIF dictionary than in the core CIF dictionary in order to emphasize their matrix character. However, the definitions of the corresponding data items are identical in the two dictionaries.
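The matrix-style labelling, in which the six unique elements are named with bracketed indices such as _atom_site_anisotrop.U[1][1] and _atom_site_anisotrop.U[2][3], makes it straightforward to rebuild the symmetric 3 × 3 matrix. The sketch below assumes a row is held as a dictionary keyed by full data name; the function name and the numerical values are hypothetical.

```python
def displacement_matrix(row, kind="U"):
    """Build the symmetric 3x3 matrix from the six upper-triangle elements."""
    matrix = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i, 3):
            name = f"_atom_site_anisotrop.{kind}[{i + 1}][{j + 1}]"
            value = float(row[name])
            # Symmetry: element (j, i) equals element (i, j).
            matrix[i][j] = value
            matrix[j][i] = value
    return matrix


# Illustrative values only.
row = {
    "_atom_site_anisotrop.U[1][1]": "0.05",
    "_atom_site_anisotrop.U[1][2]": "0.01",
    "_atom_site_anisotrop.U[1][3]": "0.00",
    "_atom_site_anisotrop.U[2][2]": "0.04",
    "_atom_site_anisotrop.U[2][3]": "-0.02",
    "_atom_site_anisotrop.U[3][3]": "0.06",
}
u = displacement_matrix(row)
```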

3.6.7.1.2. Collections of atom sites


The data items in these categories are as follows:

(a) ATOM_SITES [Scheme scheme88]

(b) ATOM_SITES_FOOTNOTE [Scheme scheme89]

The bullet (•) indicates a category key. The arrow (→) is a reference to a parent data item. Items in italics have aliases in the core CIF dictionary formed by changing the full stop (.) to an underscore (_) except where indicated by the ~ symbol.

The ATOM_SITES category of the core dictionary, which is used to record information that applies collectively to all the atom sites in the model of the structure, is incorporated without change into the mmCIF dictionary, and Section 3.2.4.1.2 can be consulted for details.

In practice, the data names in the PHASING categories are preferred to the aliases to the core CIF data items _atom_sites.solution_primary, *_secondary and *_hydrogens. The data items in the mmCIF PHASING categories are designed to allow a much more detailed description of how a macromolecular structure was solved.

The data item _atom_sites.entry_id has been added to the ATOM_SITES category to provide the formal category key required by the DDL2 data model.

The ATOM_SITES_FOOTNOTE category can be used to note something about a group of sites in the ATOM_SITE coordinate list, each of which is flagged with the same value of _atom_site.footnote_id. For example, an author may wish to note atoms for which the electron density is very weak, or atoms for which static disorder has been modelled. Example 3.6.7.2 shows how an author has used these data items to describe alternative orientations in part of a structure. However, the very large number of data names describing specific structural characteristics in the mmCIF dictionary means that these rather general data names are rarely needed.

Example 3.6.7.2. Footnotes for particular groups of atom sites in an HIV-1 protease structure (PDB 5HVP) using data items in the ATOM_SITES_FOOTNOTE category.

[Scheme scheme90]
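The grouping of sites by footnote can be sketched as below. The function name and the row representation (dictionaries keyed by item name without the category prefix) are assumptions, and the values are illustrative rather than taken from PDB 5HVP.

```python
from collections import defaultdict


def sites_by_footnote(atom_rows):
    """Map each _atom_site.footnote_id to the _atom_site.id values that carry it."""
    groups = defaultdict(list)
    for row in atom_rows:
        footnote_id = row.get("footnote_id", ".")
        # '.' and '?' mean that no footnote applies to this site.
        if footnote_id not in (".", "?"):
            groups[footnote_id].append(row["id"])
    return dict(groups)


# Illustrative rows only.
atom_rows = [
    {"id": "1", "footnote_id": "."},
    {"id": "2", "footnote_id": "1"},
    {"id": "3", "footnote_id": "1"},
    {"id": "4", "footnote_id": "2"},
    {"id": "5"},
]
groups = sites_by_footnote(atom_rows)
```

Each key of the result would then be looked up as _atom_sites_footnote.id to retrieve the free-text note for that group.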

3.6.7.1.3. Atom types


The data items in this category are as follows:

ATOM_TYPE [Scheme scheme91]

The bullet (•) indicates a category key. Items in italics have aliases in the core CIF dictionary formed by changing the full stop (.) to an underscore (_) except where indicated by the ~ symbol.

The ATOM_TYPE category, which provides information about the atomic species associated with each atom site in the model of the structure, is used in the same way in the mmCIF dictionary as in the core CIF dictionary. See Section 3.2.4.1.3 for details.

3.6.7.1.4. Alternative conformations


The data items in these categories are as follows:

(a) ATOM_SITES_ALT [Scheme scheme92]

(b) ATOM_SITES_ALT_ENS [Scheme scheme93]

(c) ATOM_SITES_ALT_GEN [Scheme scheme94]

The bullet (•) indicates a category key. Where multiple items within a category are marked with a bullet, they must be taken together to form a compound key. The arrow (→) is a reference to a parent data item.

Biological macromolecules are often very flexible, and as the resolution of a structure determination increases, it becomes increasingly possible to model reliably the alternative conformations that the structure adopts. Typically, partial occupancies are assigned to atom sites within the alternative conformations to indicate the relative frequency of occurrence of each conformation. It can, however, be difficult to deduce the possible different conformations of the whole structure from inspection of the atom-site occupancies alone. For instance, a segment of protein main chain might adopt one of three slightly different conformations, and within each conformation a particular side chain might adopt one of two possible conformations, one of which sterically distorts an adjacent residue sequence, while the other does not. The data model in the mmCIF dictionary allows these kinds of correlations in positions to be described.

The relationships between the categories used to describe alternative conformations are shown in Fig. 3.6.7.1.

In the core CIF dictionary, alternative conformations are indicated by using the _atom_site_disorder_assembly and *_disorder_group data items. Aliases to these data items are present in the mmCIF dictionary, but it is not intended that they should be used to describe disorder in a macromolecular structure.

The model for describing alternative conformations in mmCIF uses the ATOM_SITES_ALT family of categories. Ensembles of correlated alternative conformations can be identified using the category ATOM_SITES_ALT_ENS. Each ensemble is generated from one or more of the alternative conformations given in the list of alternative sites in the ATOM_SITES_ALT category. Data items in the ATOM_SITES_ALT_GEN category explicitly tie together the alternative conformations that contribute to each ensemble. Finally, the atoms in each alternative conformation are identified in the ATOM_SITE category by the data item _atom_site.label_alt_id.
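The way these categories fit together can be sketched as a two-step lookup: the ATOM_SITES_ALT_GEN rows give the alternative conformations belonging to an ensemble, and the ATOM_SITE rows are then filtered on _atom_site.label_alt_id. The function name, the row representation and all values below are hypothetical. The sketch also assumes that atoms outside any alternative conformation carry the alt id '.' and that this id is listed explicitly for each ensemble.

```python
def atoms_in_ensemble(ens_id, alt_gen_rows, atom_rows):
    """Return the _atom_site.id values of the atoms that make up one ensemble."""
    # Step 1: alternative conformations tied to this ensemble by ATOM_SITES_ALT_GEN.
    alt_ids = {g["alt_id"] for g in alt_gen_rows if g["ens_id"] == ens_id}
    # Step 2: atoms whose label_alt_id is one of those conformations.
    return [a["id"] for a in atom_rows if a["label_alt_id"] in alt_ids]


# Illustrative data only; not the ensembles of PDB 5HVP.
alt_gen_rows = [
    {"ens_id": "ensemble 1", "alt_id": "."},
    {"ens_id": "ensemble 1", "alt_id": "1"},
    {"ens_id": "ensemble 2", "alt_id": "."},
    {"ens_id": "ensemble 2", "alt_id": "2"},
]
atom_rows = [
    {"id": "1", "label_alt_id": "."},
    {"id": "2", "label_alt_id": "1"},
    {"id": "3", "label_alt_id": "2"},
]
first = atoms_in_ensemble("ensemble 1", alt_gen_rows, atom_rows)
second = atoms_in_ensemble("ensemble 2", alt_gen_rows, atom_rows)
```

Because the correlation is carried by the ATOM_SITES_ALT_GEN rows rather than by occupancies, the same conformation can contribute to more than one ensemble.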

The current version of the mmCIF dictionary cannot be used to describe an NMR structure determination completely. However, the data items in these categories can be used in an mmCIF to store the multiple models that are usually used to describe a structure determined by NMR.

Example 3.6.7.3 is a simplified version of the example given in the mmCIF dictionary (see Fig. 3.6.7.2).

Figure 3.6.7.2. Alternative conformations in an HIV-1 protease structure (PDB 5HVP) to be described with data items in the ATOM_SITES_ALT, ATOM_SITES_ALT_ENS and ATOM_SITES_ALT_GEN categories. (a) Complete structure, (b) ensemble 1, (c) ensemble 2.

Example 3.6.7.3. Alternative conformations in an HIV-1 protease structure (PDB 5HVP) described with data items in the ATOM_SITES_ALT, ATOM_SITES_ALT_ENS and ATOM_SITES_ALT_GEN categories.

[Scheme scheme95]