International
Tables for
Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G, ch. 3.6, pp. 174–176

Section 3.6.7.3. Distinct chemical species

P. M. D. Fitzgerald,a* J. D. Westbrook,b P. E. Bourne,c B. McMahon,d K. D. Watenpaughe and H. M. Bermanf

a Merck Research Laboratories, Rahway, New Jersey, USA, b Protein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, 610 Taylor Road, Piscataway, New Jersey, USA, c Research Collaboratory for Structural Bioinformatics, San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0537, USA, d International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England, e retired; formerly Structural, Analytical and Medicinal Chemistry, Pharmacia Corporation, Kalamazoo, Michigan, USA, and f Protein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, 610 Taylor Road, Piscataway, New Jersey, USA
Correspondence e-mail: paula_fitzgerald@merck.com

3.6.7.3. Distinct chemical species


The categories describing distinct chemical entities are as follows:

ENTITY group
  Entities (§3.6.7.3.1)
    ENTITY
    ENTITY_KEYWORDS
    ENTITY_NAME_COM
    ENTITY_NAME_SYS
    ENTITY_SRC_GEN
    ENTITY_SRC_NAT
  Polymer entities (§3.6.7.3.2)
    ENTITY_POLY
    ENTITY_POLY_SEQ

The ENTITY categories of the mmCIF dictionary should be used in preference to the CHEMICAL categories of the core CIF dictionary. In a typical small-molecule structure determination, for which the core CIF dictionary was designed, the substance being studied can be thought of as a single chemical species, even if it contains distinct ions or ligands. In a macromolecular structure, it is more often the case that separate descriptions are appropriate for each of the distinct chemical species that comprise the structural complex. The ENTITY categories allow the species present and their basic chemical properties to be specified. Their structures and connectivity are described in other categories.

It is important, therefore, to remember that the ENTITY data do not represent the result of the crystallographic experiment; those results are given using the ATOM_SITE data items and are discussed and described using data items in the STRUCT family of categories. The ENTITY categories describe the chemistry of the molecules under investigation and are most usefully considered as the ideal groups to which the structure is restrained or constrained during refinement.

It is also important to remember that entities do not correspond directly to the total contents of the asymmetric unit. Entities are described only once, even in structures in which the entity occurs several times. The STRUCT_ASYM data items, which reference the list of entities, describe and label the contents of the asymmetric unit.
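The relationship between the entity list and the contents of the asymmetric unit can be sketched in Python. This is a minimal illustration and not part of the dictionary: the rows are invented placeholder values, the helper `copies_per_entity` is hypothetical, and the sketch assumes that each STRUCT_ASYM row carries an `entity_id` value pointing at `_entity.id`.

```python
from collections import Counter

# Hypothetical entity list: each distinct species appears once, keyed by _entity.id.
entities = {
    "1": {"type": "polymer"},
    "2": {"type": "non-polymer"},
    "3": {"type": "water"},
}

# Hypothetical STRUCT_ASYM rows as (asym id, entity id) pairs.
# Entity "1" occurs twice in the asymmetric unit but is described only once above.
struct_asym = [("A", "1"), ("B", "1"), ("C", "2"), ("D", "3")]


def copies_per_entity(asym_rows, entity_table):
    """Count how many components of the asymmetric unit point at each entity."""
    for asym_id, entity_id in asym_rows:
        if entity_id not in entity_table:
            raise KeyError(
                f"struct_asym {asym_id!r} references unknown entity {entity_id!r}"
            )
    return Counter(entity_id for _, entity_id in asym_rows)


counts = copies_per_entity(struct_asym, entities)
```

Here `counts` records two copies of entity `1` and one each of entities `2` and `3`, although the entity table itself has only three rows.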

The following discussion treats separately the data items used for entities in general (Section 3.6.7.3.1) and those used specifically to describe polymeric entities (Section 3.6.7.3.2).

3.6.7.3.1. Description of entities


The data items in these categories are as follows:

(a) ENTITY [Scheme scheme123]

(b) ENTITY_KEYWORDS [Scheme scheme124]

(c) ENTITY_NAME_COM [Scheme scheme125]

(d) ENTITY_NAME_SYS [Scheme scheme126]

(e) ENTITY_SRC_GEN [Scheme scheme127]

(f) ENTITY_SRC_NAT [Scheme scheme128]

The bullet (•) indicates a category key. Where multiple items within a category are marked with a bullet, they must be taken together to form a compound key. The arrow (→) is a reference to a parent data item.

An entity in mmCIF is a chemically distinct molecular component of the structural complex described in the mmCIF. The three possible types of molecular entities are polymer, non-polymer and water. Note that the 'water' entity is water, and only water. Any other well-ordered solvent molecules or ions should be treated as non-polymer entities. The relationships between categories used to describe the features of entities are shown in Fig. 3.6.7.5, which also shows how the information describing the entity is linked to the coordinate list in the ATOM_SITE category.

Figure 3.6.7.5. The family of categories used to describe chemical entities. Boxes surround categories of related data items. Data items that serve as category keys are preceded by a bullet (•). Lines show relationships between linked data items in different categories with arrows pointing at the parent data item.

Data items in the ENTITY category are used to label each distinct chemical molecule with a reference code (_entity.id), to give the formula weight in daltons (if available) and to define the type of the entity as one of polymer, non-polymer or water. The method by which the entity was produced may be indicated using the item _entity.src_method, whose allowed values are nat (indicating that the sample was isolated from a natural source), man (indicating a genetically manipulated source) or syn (indicating a chemical synthesis). A value of nat indicates that additional details should be given in the ENTITY_SRC_NAT category and a value of man indicates that additional details should be given in the ENTITY_SRC_GEN category. As these flags are only relevant to the macromolecular entities of a structural complex, a value of '.', indicating 'inapplicable', should be given to _entity.src_method for solvent or water molecules. The _entity.details field can be used for a free-text description of any special features of the entity.
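The rules for _entity.src_method described above can be expressed as a small lookup. The following Python sketch is illustrative only: the function name `source_details_category` is hypothetical, and the allowed values are taken from the text of this section rather than read from the dictionary itself.

```python
# Entity types named in this section.
ENTITY_TYPES = {"polymer", "non-polymer", "water"}

# Values of _entity.src_method mapped to the category in which further
# source details are expected (None where the text names no such category).
SRC_METHOD_DETAILS = {
    "nat": "ENTITY_SRC_NAT",  # isolated from a natural source
    "man": "ENTITY_SRC_GEN",  # genetically manipulated source
    "syn": None,              # chemical synthesis
    ".": None,                # inapplicable, e.g. solvent or water
}


def source_details_category(entity_type, src_method):
    """Return the category that should describe the source, or None."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    if src_method not in SRC_METHOD_DETAILS:
        raise ValueError(f"unknown src_method: {src_method!r}")
    return SRC_METHOD_DETAILS[src_method]
```

For example, `source_details_category("polymer", "man")` points to ENTITY_SRC_GEN, while a water entity given `"."` has no associated source category.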

Keywords characterizing the individual molecular species may be given using data items in the ENTITY_KEYWORDS category. These keywords should only be used to record information that does not depend on knowledge of the molecular structure. Thus a polypeptide could be described as a polypeptide, or an enzyme, or a protease, but it should not be described as an αβ-barrel; a number of categories within the STRUCT family allow keywords specific to the structure of the macromolecule to be given.

Data items in the ENTITY_NAME_COM category may be used to give any common names for an entity. Several different names can be recorded for each entity if appropriate.

Similarly, data items in the ENTITY_NAME_SYS category may be used to give systematic names for each entity. Again, several different names can be recorded for each entity if appropriate. The data item _entity_name_sys.system can be used to record the system according to which the systematic name was generated.

The ENTITY_SRC_GEN category allows a description of the source of entities produced by genetic manipulation to be given. There are data items for describing the tissue from which the gene was obtained, the plasmid into which it was incorporated for expression, and the host organism in which the macromolecule was expressed (Example 3.6.7.6).

Example 3.6.7.6. An example of the description of the entities in an HIV-1 protease structure (PDB 5HVP), described using data items in the ENTITY, ENTITY_NAME_COM, ENTITY_NAME_SYS and ENTITY_SRC_GEN categories.

[Scheme scheme129]

The ENTITY_SRC_NAT category allows a description of the source of entities obtained from a natural tissue to be given. Data items are provided for the common and systematic name (by genus, species and, where relevant, strain) of the organism from which the material was obtained. Other data items can be used to describe the tissue (and if necessary the subcellular fraction of the tissue) from which the entity was isolated.

3.6.7.3.2. Polymer entities


The data items in these categories are as follows:

(a) ENTITY_POLY [Scheme scheme130]

(b) ENTITY_POLY_SEQ [Scheme scheme131]

The bullet (•) indicates a category key. Where multiple items within a category are marked with a bullet, they must be taken together to form a compound key. The arrow (→) is a reference to a parent data item.

The polymer type, sequence length and information about any nonstandard features of the polymer may be specified using data items in the ENTITY_POLY category. The sequence of monomers in each polymer entity is given using data items in the ENTITY_POLY_SEQ category. The relationships between categories describing polymer entities are shown in Fig. 3.6.7.6, which also shows how the information describing the polymer is linked to the coordinate list in the ATOM_SITE category and to the full chemical description of each monomer or nonstandard monomer in the CHEM_COMP category.

Figure 3.6.7.6. The family of categories used to describe polymer chemical entities. Boxes surround categories of related data items. Data items that serve as category keys are preceded by a bullet (•). Lines show relationships between linked data items in different categories with arrows pointing at the parent data items.

Non-polymer entities are treated as individual chemical components, in the same way that monomers within a polymer are treated as individual chemical components. They may be fully described in the CHEM_COMP group of categories (Example 3.6.7.7).

Example 3.6.7.7. An example of both polymer and non-polymer entities in a drug–DNA complex (NDB DDF040) described with data items in the ENTITY, ENTITY_KEYWORDS, ENTITY_NAME_COM, ENTITY_POLY and ENTITY_POLY_SEQ categories (Narayana et al., 1991).

[Scheme scheme132]

Data items in the ENTITY_POLY category can be used to give the number of monomers in the polymer and to assign the type of the polymer as one of the set of types polypeptide(D), polypeptide(L), polydeoxyribonucleotide, polyribonucleotide, polysaccharide(D), polysaccharide(L) or other. Details of deviations from a standard type may be given in _entity_poly.type_details.

In some cases, the polymer is best described as one of the standard types even if it contains some nonstandard features. Flags are provided to indicate the presence of three types of nonstandard features. The presence of chiral centres other than those implied by the assigned type is indicated by assigning a value of yes to the data item _entity_poly.nstd_chirality. A value of yes for _entity_poly.nstd_linkage indicates the presence of monomer-to-monomer links different from those implied by the assigned type and a value of yes for _entity_poly.nstd_monomer indicates the presence of one or more nonstandard monomer components.
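The polymer types and the three nonstandard-feature flags can be pictured as a small record. The Python sketch below is an illustration under stated assumptions: the class `EntityPoly` and its method are hypothetical, the list of types is copied from the text above, and the default of "no" for each flag is an assumption, since this section specifies only the meaning of "yes".

```python
from dataclasses import dataclass

# Polymer types listed in this section for the ENTITY_POLY category.
POLYMER_TYPES = (
    "polypeptide(D)",
    "polypeptide(L)",
    "polydeoxyribonucleotide",
    "polyribonucleotide",
    "polysaccharide(D)",
    "polysaccharide(L)",
    "other",
)


@dataclass
class EntityPoly:
    """Hypothetical in-memory view of one ENTITY_POLY row."""

    entity_id: str
    type: str
    nstd_chirality: str = "no"  # assumed default
    nstd_linkage: str = "no"    # assumed default
    nstd_monomer: str = "no"    # assumed default

    def __post_init__(self):
        if self.type not in POLYMER_TYPES:
            raise ValueError(f"unrecognized polymer type: {self.type!r}")

    def nonstandard_features(self):
        """List the nonstandard features flagged with a value of yes."""
        flags = {
            "chirality": self.nstd_chirality,
            "linkage": self.nstd_linkage,
            "monomer": self.nstd_monomer,
        }
        return [name for name, value in flags.items() if value == "yes"]


poly = EntityPoly("1", "polypeptide(L)", nstd_monomer="yes")
features = poly.nonstandard_features()
```

In this sketch the polymer keeps its standard type, polypeptide(L), while `features` reports that it contains at least one nonstandard monomer.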

Data items in the ENTITY_POLY_SEQ category describe the sequence of monomers in a polymer. Because _entity_poly_seq.mon_id is included in the category key, a given sequence number can be associated with more than one monomer ID, which allows sequence heterogeneity to be recorded. Sequence heterogeneity is shown in the example of crambin in Section 3.6.3.
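The effect of including the monomer ID in the category key can be sketched as follows. This Python fragment is a minimal illustration: the rows are invented placeholder values and are not taken from any deposited structure, the helper `heterogeneous_positions` is hypothetical, and the sketch assumes that the remaining key components are an entity identifier and a sequence number.

```python
from collections import defaultdict

# Hypothetical ENTITY_POLY_SEQ rows as (entity id, sequence number, monomer id).
# Sequence number 2 appears twice with different monomers (heterogeneity).
rows = [
    ("1", 1, "ALA"),
    ("1", 2, "PRO"),
    ("1", 2, "SER"),
    ("1", 3, "GLY"),
]


def heterogeneous_positions(seq_rows):
    """Return positions that are associated with more than one monomer ID."""
    seen = set()
    by_position = defaultdict(list)
    for entity_id, num, mon_id in seq_rows:
        key = (entity_id, num, mon_id)  # the full compound key must be unique
        if key in seen:
            raise ValueError(f"duplicate compound key: {key}")
        seen.add(key)
        by_position[(entity_id, num)].append(mon_id)
    return {pos: mons for pos, mons in by_position.items() if len(mons) > 1}


heterogeneity = heterogeneous_positions(rows)
```

If the key consisted of the entity identifier and sequence number alone, the two rows for position 2 would collide; with the monomer ID in the key both are legal, and `heterogeneity` reports that position as carrying PRO and SER.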

References

Narayana, N., Ginell, S. L., Russu, I. M. & Berman, H. M. (1991). Crystal and molecular structure of a DNA fragment: d(CGTGAATTCACG). Biochemistry, 30, 4449–4455.







































