International Tables for Crystallography Volume G: Definition and exchange of crystallographic data. Edited by S. R. Hall and B. McMahon. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. G. ch. 3.6, pp. 195-196
The majority of crystallographic and structural concepts embodied in the PDB are already well described in the mmCIF data dictionary. However, while there is a conceptual description of most crystallographic information in PDB-format files within the mmCIF dictionary, the precise representation of this information can differ subtly. To guarantee accurate data exchange and to facilitate reversible format translation between PDB and mmCIF formats, all such differences in representation must be resolved.
To accommodate content and semantic differences between formats, extensions to the dictionary have been created. These extensions take one of two forms: the addition of new definitions to existing categories or the creation of new categories. Where possible, extensions are added to existing categories. This is done when the new definition supplements the content of the category without changing the category definition or its fundamental organization. However, if a new definition cannot be added to an existing category, a new category is created to hold the extension. All new data items and categories include the prefix pdbx_ in their names.
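The two forms of extension can be told apart from the data name alone, because the pdbx_ prefix appears either on the item (a new definition in an existing category) or on the category (a new category). The following Python fragment is a minimal sketch of that check, not part of any dictionary software. The function name classify_data_name and the label strings are hypothetical, and the sketch assumes the usual mmCIF `_category.item` naming form with case-insensitive names.

```python
def classify_data_name(name: str) -> str:
    """Classify an mmCIF data name by where the pdbx_ prefix appears.

    Hypothetical helper for illustration only; assumes names of the
    form '_category.item'.
    """
    if not name.startswith("_") or "." not in name:
        raise ValueError(f"not an mmCIF data name: {name!r}")

    # Split '_category.item' into its two parts.
    category, item = name[1:].split(".", 1)

    # Compare in lower case, since data names are treated as case-insensitive.
    if category.lower().startswith("pdbx_"):
        return "new category"
    if item.lower().startswith("pdbx_"):
        return "new item in existing category"
    return "original mmCIF item"
```

Under these assumptions, a name such as `_atom_site.pdbx_PDB_ins_code` is reported as a new item in an existing category, while any item belonging to a category whose name begins with pdbx_ is reported as part of a new category.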
For example, the level of detail in the PDB description of the biological source exceeds the description provided by mmCIF. In this case, dictionary extensions have been added to the existing categories ENTITY_SRC_NAT and ENTITY_SRC_GEN (where 'nat' and 'gen' stand for naturally occurring and genetically engineered, respectively). The PDB description of atomic coordinates includes two items that are not described in mmCIF: the insertion code and the model number. These have been added to the mmCIF category ATOM_SITE (as _atom_site.pdbx_PDB_ins_code and _atom_site.pdbx_PDB_model_num) and to all related categories that include atom nomenclature.
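As an illustration of how these two items might be populated during format translation, the Python sketch below reads the insertion code and model number from PDB-format records and stores them under the two mmCIF data names given above. It is not the translation procedure used by the PDB. It assumes the fixed-column PDB layout in which the insertion code occupies column 27 of ATOM and HETATM records and the model serial number occupies columns 11 to 14 of MODEL records. The function name, the default of model 1 for files without MODEL records, and the use of `?` for an absent insertion code are assumptions made for this example.

```python
def pdb_atom_site_extensions(lines):
    """Collect insertion code and model number for each coordinate record.

    Hypothetical helper: returns one dict per ATOM/HETATM record, keyed by
    the mmCIF data names for the two PDB-specific items.
    """
    model_num = 1  # assumption: no MODEL record means a single model, numbered 1
    rows = []
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            # Model serial number: columns 11-14 (0-based slice 10:14).
            model_num = int(line[10:14])
        elif record in ("ATOM", "HETATM"):
            # Insertion code: column 27 (0-based index 26); blank if unused.
            ins_code = line[26:27].strip()
            rows.append({
                # assumption: '?' marks an absent insertion code
                "_atom_site.pdbx_PDB_ins_code": ins_code or "?",
                "_atom_site.pdbx_PDB_model_num": model_num,
            })
    return rows
```

Each returned dictionary corresponds to one row of an ATOM_SITE loop. A complete translator would also carry the coordinates and atom nomenclature, which are omitted here to keep the focus on the two extension items.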
The convention for defining the hydrogen bonding in β-sheets differs between the PDB and mmCIF representations. Because the PDB model is fundamentally different from that found in mmCIF, a new category was created to hold the PDB data: PDBX_STRUCT_SHEET_HBOND. The correspondence between the PDB and mmCIF formats is tabulated at .