International
Tables for
Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G. ch. 3.6, pp. 164-166

Section 3.6.7.1.1. Individual atom sites

P. M. D. Fitzgerald,a* J. D. Westbrook,b P. E. Bourne,c B. McMahon,d K. D. Watenpaughe and H. M. Bermanf

a Merck Research Laboratories, Rahway, New Jersey, USA, b Protein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, 610 Taylor Road, Piscataway, New Jersey, USA, c Research Collaboratory for Structural Bioinformatics, San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0537, USA, d International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England, e retired; formerly Structural, Analytical and Medicinal Chemistry, Pharmacia Corporation, Kalamazoo, Michigan, USA, and f Protein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, 610 Taylor Road, Piscataway, New Jersey, USA
Correspondence e-mail: paula_fitzgerald@merck.com

The data items in these categories are as follows:

(a) ATOM_SITE [Scheme scheme85]

(b) ATOM_SITE_ANISOTROP [Scheme scheme86]

The bullet (•) indicates a category key. The arrow (→) is a reference to a parent data item. Items in italics have aliases in the core CIF dictionary formed by changing the full stop (.) to an underscore (_) except where indicated by the ~ symbol. Data items marked with a plus (+) have companion data names for the standard uncertainty in the reported value, formed by appending the string _esd to the data name listed. The double arrow (⇌) indicates alternative names in a distinct category.
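The two naming rules above are mechanical, so they can be sketched in a few lines of Python. The function names are hypothetical, the only exception listed is the _atom_site.id alias stated later in this section, and the functions apply the rules blindly: whether an item actually has an alias or an _esd companion is determined by the scheme, not by this code.

```python
# Exceptions to the "change the full stop to an underscore" rule (the ~ cases).
# Only the exception named in this section is listed; others would be added here.
ALIAS_EXCEPTIONS = {
    "_atom_site.id": "_atom_site_label",
}


def core_alias(mmcif_name: str) -> str:
    """Return the core CIF alias for an mmCIF data name that has one."""
    if mmcif_name in ALIAS_EXCEPTIONS:
        return ALIAS_EXCEPTIONS[mmcif_name]
    # Default rule: replace the category/item separator '.' with '_'.
    return mmcif_name.replace(".", "_", 1)


def esd_name(mmcif_name: str) -> str:
    """Return the companion data name for the standard uncertainty."""
    return mmcif_name + "_esd"


print(core_alias("_atom_site.fract_x"))
print(esd_name("_atom_site.Cartn_x"))
print(core_alias("_atom_site.id"))
```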

The refined coordinates of the atoms in the crystallographic asymmetric unit are stored in the ATOM_SITE category. Atom positions and their associated uncertainties may be given using either Cartesian or fractional coordinates, and anisotropic displacement factors and occupancies may be given for each position.

The relationships between categories describing atom sites are shown in Fig. 3.6.7.1.

Figure 3.6.7.1. The family of categories used to describe atom sites. Boxes surround categories of related data items. Data items that serve as category keys are preceded by a bullet (•). Lines show relationships between linked data items in different categories with arrows pointing at the parent data items.

Several of the mmCIF data names arise from the need to associate atom sites with residues and chains. As in the core CIF dictionary, the identifier for the atom site is the data item _atom_site_label. To accommodate standard practice in macromolecular crystallography, the mmCIF atom identifier is the aggregate of _atom_site.label_alt_id, *.label_asym_id, *.label_atom_id, *.label_comp_id and *.label_seq_id. For the two types of files to be compatible, the data item _atom_site.id, which is independent of the different modes of identifying atoms (discussed below), was introduced. The mmCIF identifier _atom_site.id is aliased to the core CIF identifier _atom_site_label.

Since the identifier does not need to be a number, it is quite possible (although it is not recommended) to use a complex label with an internal structure corresponding to the label components that the mmCIF dictionary provides as separate data items. This scheme is described in Section 3.2.4.1.1. However, normal practice in mmCIFs should be to label sites with the functional components available and to assign a simple numeric sequence to the values of _atom_site.id (see Example 3.6.7.1).
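The recommended practice can be illustrated with a minimal Python sketch: the five label components are kept as separate fields and _atom_site.id is a plain running number. The class and function names are hypothetical, and the atom labels are invented for illustration; they are not copied from the 5HVP coordinate list.

```python
from typing import NamedTuple, Optional


class SiteLabel(NamedTuple):
    """The five components that together identify an atom site in mmCIF."""
    label_alt_id: Optional[str]   # None when there is no alternative conformation
    label_asym_id: str
    label_atom_id: str
    label_comp_id: str
    label_seq_id: Optional[int]   # None for entities that are not polymers


def assign_ids(labels):
    """Pair each label with a simple numeric _atom_site.id, starting at 1."""
    seen = set()
    rows = []
    for number, label in enumerate(labels, start=1):
        if label in seen:
            raise ValueError(f"duplicate atom-site label: {label}")
        seen.add(label)
        rows.append((str(number), label))
    return rows


# Illustrative labels only (invented values).
labels = [
    SiteLabel(None, "A", "N", "VAL", 11),
    SiteLabel(None, "A", "CA", "VAL", 11),
    SiteLabel("1", "A", "CB", "THR", 12),  # same atom, first conformation
    SiteLabel("2", "A", "CB", "THR", 12),  # same atom, second conformation
]

for site_id, label in assign_ids(labels):
    print(site_id, label)
```

Keeping the components separate means that the two conformations of the same atom differ only in label_alt_id, while the numeric identifier stays free of any internal structure.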

Example 3.6.7.1. Part of the coordinate list for an HIV-1 protease structure (PDB 5HVP) described with data items in the ATOM_SITE category. Atoms are given for both polymer and non-polymer regions of the structure, and atoms in the side chain of residue 12 adopt alternative conformations.

[Scheme scheme87]

In addition to labelling information, each entry in the ATOM_SITE list must contain a value for the data item _atom_site.type_symbol, which is a pointer to the table of element symbols in the ATOM_TYPE category. All other data items in the ATOM_SITE category are optional, but it is normal practice to give either the Cartesian or fractional coordinates. Most macromolecular structures use Cartesian coordinates. Isotropic displacement factors are normally placed directly in the ATOM_SITE category, using _atom_site.B_iso_or_equiv. Anisotropic displacement factors may be placed directly in the ATOM_SITE category or in the ATOM_SITE_ANISOTROP category. U's may be used instead of B's. It is not acceptable to use both U's and B's, nor is it acceptable to have anisotropic displacement factors in both the ATOM_SITE category and the ATOM_SITE_ANISOTROP category.
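The constraints in this paragraph lend themselves to a simple consistency check on the data names present in a file. The following Python sketch is one possible way to express them; the function names are hypothetical, and the prefix tests assume the dictionary's naming pattern for displacement items (for example _atom_site.aniso_B[1][1] and _atom_site_anisotrop.U[1][1]), which is not spelled out in this section.

```python
def displacement_kind(name):
    """Classify a data name as 'B', 'U' or None (not a displacement item)."""
    item = name.partition(".")[2]
    for kind in ("B", "U"):
        # Assumed naming pattern: B_iso..., aniso_B[i][j], B[i][j] (and U likewise).
        if item.startswith((f"{kind}_iso", f"aniso_{kind}[", f"{kind}[")):
            return kind
    return None


def is_anisotropic(name):
    """True for anisotropic displacement elements in either category."""
    item = name.partition(".")[2]
    return item.startswith(("aniso_B[", "aniso_U[", "B[", "U["))


def check_atom_site_names(atom_site_names, anisotrop_names=()):
    """Return a list of problems found in the sets of data names used."""
    problems = []
    if "_atom_site.type_symbol" not in atom_site_names:
        problems.append("_atom_site.type_symbol is missing")
    kinds = {displacement_kind(n) for n in (*atom_site_names, *anisotrop_names)}
    if {"B", "U"} <= kinds:
        problems.append("both B and U displacement factors are present")
    in_site = any(is_anisotropic(n) for n in atom_site_names)
    in_separate = any(is_anisotropic(n) for n in anisotrop_names)
    if in_site and in_separate:
        problems.append("anisotropic factors appear in both categories")
    return problems


print(check_atom_site_names(
    {"_atom_site.type_symbol", "_atom_site.Cartn_x", "_atom_site.B_iso_or_equiv"}
))
```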

Each atom within each chemical component is uniquely identified using the data item _atom_site.label_atom_id, which is a reference to the data item _chem_comp_atom.atom_id in the CHEM_COMP_ATOM category.

The specific object in the asymmetric unit to which the atom belongs is indicated using the data item _atom_site.label_asym_id, which is a reference to the data item _struct_asym.id in the STRUCT_ASYM category. For macromolecules, it is useful to think of this identifier as a chain ID.

The chemical component to which the atom belongs is indicated using the data item _atom_site.label_comp_id, which is a reference to the data item _chem_comp.id in the CHEM_COMP category. The chemical component that is referenced in this way may be either a non-polymer or a monomer in a polymer; if it is a monomer in a polymer, it is useful to think of this identifier as the residue name.

The correspondence between the sequence of an entity in a polymer and the sequence information in the coordinate list (and in the STRUCT categories) is established using the data item _atom_site.label_seq_id, which is a reference to the data item _entity_poly_seq.num in the ENTITY_POLY_SEQ category. This identifier has no meaning for entities that are not part of a polymer; in a polymer it is useful to think of this identifier as the residue number. Note that this is strictly a number. If the combination of a number with an insertion code is needed, _atom_site.auth_seq_id should be used (see below).

An alternative set of identifiers can be used for the *_asym_id, *_atom_id, *_comp_id and *_seq_id identifiers, but not for *_alt_id. The _atom_site.label_* data names are standard; there are rules for these identifiers such as the requirement that residue numbers are sequential integers. Different databases may also have their own rules. However, the author of an mmCIF may wish to use a nonstandard labelling scheme, e.g. to reflect the residue numbering scheme of a structure to which the present structure is homologous, apart from insertions and gaps. Another situation in which a nonstandard labelling scheme might be used is to follow a local convention for atom names in a non-polymer, such as a haem, that conflicts with the scheme required by a database in which the structure is to be deposited. In these situations, alternative identifiers can be given using the _atom_site.auth_* data names.
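The relationship between the standard and author-supplied identifiers can be sketched as follows in Python. The function names and the sample row are hypothetical, and the choice to fall back to the label_* value when no auth_* value is given is an assumption of this sketch, not a rule stated by the dictionary.

```python
# Components for which an author-supplied alternative may exist (not alt_id).
AUTH_ALTERNATIVES = ("asym_id", "atom_id", "comp_id", "seq_id")


def identifier(row, part, prefer_auth=False):
    """Return the label_* value for `part`, or the auth_* value if requested and given."""
    if prefer_auth and part in AUTH_ALTERNATIVES:
        value = row.get(f"_atom_site.auth_{part}")
        if value not in (None, ".", "?"):  # '.' and '?' are the CIF null values
            return value
    return row[f"_atom_site.label_{part}"]


def label_seq_id_is_valid(value):
    """label_seq_id is strictly a number, or '.' where it does not apply."""
    return value == "." or (value.isascii() and value.isdigit())


# Invented row: the author numbers this residue 27A (number plus insertion code).
row = {
    "_atom_site.label_alt_id": ".",
    "_atom_site.label_seq_id": "28",
    "_atom_site.auth_seq_id": "27A",
}

print(identifier(row, "seq_id"))
print(identifier(row, "seq_id", prefer_auth=True))
print(label_seq_id_is_valid("27A"))
```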

In regions of the structure with alternative conformations, the specific conformation to which an atom belongs can be indicated using the data item _atom_site.label_alt_id, which is a reference to the data item _atom_sites_alt.id in the ATOM_SITES_ALT category.

The chemically distinct part of the structure (e.g. polymer chain, ligand, solvent) to which an atom belongs can be indicated using the data item _atom_site.label_entity_id, which is a reference to the data item _entity.id in the ENTITY category.

Most of the information that needs to be associated with an atom site is conveyed by the values of specific data names in mmCIF. However, for historical reasons, a pointer to additional free-text information about an atom site or about a group of atom sites can be given using the data item _atom_site.footnote_id, which is a reference to the data item _atom_sites_footnote.id in the ATOM_SITES_FOOTNOTE category.
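The parent references described in the preceding paragraphs can be collected into a single table and used to check a coordinate list for dangling pointers. The Python sketch below assumes that rows are held as dictionaries keyed by data name and that the parent values have already been gathered into sets; the function name and both data layouts are illustrative choices only.

```python
# Child item in ATOM_SITE -> parent item, as listed in the text above.
PARENTS = {
    "_atom_site.label_atom_id": "_chem_comp_atom.atom_id",
    "_atom_site.label_asym_id": "_struct_asym.id",
    "_atom_site.label_comp_id": "_chem_comp.id",
    "_atom_site.label_seq_id": "_entity_poly_seq.num",
    "_atom_site.label_alt_id": "_atom_sites_alt.id",
    "_atom_site.label_entity_id": "_entity.id",
    "_atom_site.footnote_id": "_atom_sites_footnote.id",
}


def dangling_references(atom_site_rows, parent_values):
    """Return (row_index, child_name, value) for values absent from the parent item."""
    missing = []
    for index, row in enumerate(atom_site_rows):
        for child, parent in PARENTS.items():
            value = row.get(child)
            if value in (None, ".", "?"):  # absent or null: nothing to check
                continue
            if value not in parent_values.get(parent, set()):
                missing.append((index, child, value))
    return missing


# Invented data for illustration.
rows = [
    {"_atom_site.label_asym_id": "A", "_atom_site.label_comp_id": "VAL",
     "_atom_site.label_alt_id": "."},
    {"_atom_site.label_asym_id": "C", "_atom_site.label_comp_id": "HOH"},
]
parents = {"_struct_asym.id": {"A", "B"}, "_chem_comp.id": {"VAL", "HOH"}}

print(dangling_references(rows, parents))
```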

The data item _atom_site.group_PDB is a placeholder for the tags used by the PDB to identify types of coordinate records. It allows interconversion between mmCIFs and PDB format files. The only permitted values are ATOM and HETATM.
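As a minimal sketch of the interconversion role of this item, the following Python functions map between the group_PDB value and the record name of a PDB format line. They assume the PDB convention that the record name occupies the first six columns of a line; the function names are hypothetical.

```python
PERMITTED_GROUPS = ("ATOM", "HETATM")


def pdb_record_name(group_pdb):
    """Return the six-column PDB record name for a group_PDB value."""
    if group_pdb not in PERMITTED_GROUPS:
        raise ValueError(f"not a permitted _atom_site.group_PDB value: {group_pdb!r}")
    return group_pdb.ljust(6)


def group_from_record(line):
    """Return the group_PDB value for a PDB coordinate line."""
    name = line[:6].strip()
    if name not in PERMITTED_GROUPS:
        raise ValueError(f"not a coordinate record: {name!r}")
    return name


print(repr(pdb_record_name("ATOM")))
print(group_from_record("HETATM 1234"))
```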

As in the core CIF dictionary, anisotropic displacement parameters in an mmCIF can be given in the same list as the atom positions and occupancies, or can be given in a separate list. However, DDL2 does not permit the same data names to be used for both constructs. Therefore, in mmCIF, anisotropic displacement parameters presented in a separate list are handled in a separate category with its own key, _atom_site_anisotrop.id, which must match a corresponding label in the atom-site list, _atom_site.id.
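The matching requirement between the two lists can be checked directly. The Python sketch below assumes that the identifier values of each category have already been read into lists of strings; the function name is hypothetical.

```python
def check_anisotrop_ids(atom_site_ids, anisotrop_ids):
    """Return problems with _atom_site_anisotrop.id values.

    Each value must be unique (it is the category key) and must match
    a value of _atom_site.id in the atom-site list.
    """
    known = set(atom_site_ids)
    problems = []
    seen = set()
    for value in anisotrop_ids:
        if value in seen:
            problems.append(f"duplicate _atom_site_anisotrop.id: {value}")
        seen.add(value)
        if value not in known:
            problems.append(f"no matching _atom_site.id for: {value}")
    return problems


# Invented identifiers: site 5 has no entry in the atom-site list.
print(check_anisotrop_ids(["1", "2", "3"], ["1", "2", "5"]))
```

Not every atom site needs an entry in the separate list, so the check runs in one direction only.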

The individual elements of the anisotropic displacement matrix are labelled slightly differently in the mmCIF dictionary than in the core CIF dictionary in order to emphasize their matrix character. However, the definitions of the corresponding data items are identical in the two dictionaries.
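The difference in labelling can be made concrete with a short Python sketch. It assumes the element names take the forms _atom_site_anisotrop.U[1][2] in mmCIF and _atom_site_aniso_U_12 in core CIF; these forms are recalled from the two dictionaries rather than quoted from this section, and the function names and numeric values are invented for illustration.

```python
# The six independent elements of the symmetric 3x3 displacement matrix.
UPPER_TRIANGLE = [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]


def mmcif_name(kind, i, j):
    """mmCIF form (assumed pattern), with explicit matrix indices."""
    return f"_atom_site_anisotrop.{kind}[{i}][{j}]"


def core_name(kind, i, j):
    """Core CIF form (assumed pattern), with the indices run together."""
    return f"_atom_site_aniso_{kind}_{i}{j}"


def symmetric_matrix(row, kind="U"):
    """Build the full 3x3 matrix from the six mmCIF elements in `row`."""
    matrix = [[0.0] * 3 for _ in range(3)]
    for i, j in UPPER_TRIANGLE:
        value = float(row[mmcif_name(kind, i, j)])
        matrix[i - 1][j - 1] = value
        matrix[j - 1][i - 1] = value  # the matrix is symmetric
    return matrix


# Invented values for one atom site.
row = {
    "_atom_site_anisotrop.U[1][1]": "0.04",
    "_atom_site_anisotrop.U[1][2]": "0.01",
    "_atom_site_anisotrop.U[1][3]": "0.00",
    "_atom_site_anisotrop.U[2][2]": "0.05",
    "_atom_site_anisotrop.U[2][3]": "-0.02",
    "_atom_site_anisotrop.U[3][3]": "0.03",
}

for i, j in UPPER_TRIANGLE:
    print(mmcif_name("U", i, j), "<->", core_name("U", i, j))
print(symmetric_matrix(row))
```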
