International
Tables for
Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G. ch. 2.6, pp. 63-66

Section 2.6.5. DDL2 dictionary applications

J. D. Westbrook,^a* H. M. Berman^a and S. R. Hall^b

^a Protein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, 610 Taylor Road, Piscataway, NJ 08854-8087, USA, and ^b School of Biomedical and Chemical Sciences, University of Western Australia, Crawley, Perth, WA 6009, Australia
Correspondence e-mail: jwest@rcsb.rutgers.edu

This section presents several examples that illustrate how the elements of the DDL are used to build dictionary definitions. Example 2.6.5.1 shows the definition of the data item _citation.journal_abbrev from the mmCIF dictionary.

The category ITEM_DESCRIPTION holds a text description of each data item. The category ITEM holds the item name, the category name and a code indicating whether the item is mandatory in any row of its category. The value of the mandatory code is yes, no or implicit. The value implicit indicates that a value is required for the item but that it can be derived from the context of the definition and need not be specified. This feature is most often used in DDL2 dictionaries to avoid re-specifying data-item names in each category, since these values can be derived from the name of the save frame enclosing the definition. The value of _item.name in Example 2.6.5.1 is enclosed in quotation marks. This is a requirement of the STAR syntax, so that a value containing a data name is not mistaken for a dictionary attribute.
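The way an implicit value can be recovered from its context may be sketched in Python as follows. This is only an illustration, not code from any CIF library: the function name resolve_implicit_item and the dictionary layout are hypothetical, and the sketch assumes that the save frame is named after the item it defines and that the category name is the part of the item name before the period.

```python
def resolve_implicit_item(frame_name, attributes):
    """Fill in _item.name and _item.category_id from the save-frame name.

    Assumption: the save frame carries the name of the item it defines,
    for example '_citation.journal_abbrev'.
    """
    if not frame_name.startswith("_") or "." not in frame_name:
        raise ValueError(f"not an item save frame: {frame_name!r}")
    category, _attribute = frame_name[1:].split(".", 1)
    resolved = dict(attributes)
    # Explicitly given values are kept; missing ones come from the context.
    resolved.setdefault("_item.name", frame_name)
    resolved.setdefault("_item.category_id", category)
    return resolved


definition = resolve_implicit_item(
    "_citation.journal_abbrev", {"_item.mandatory_code": "no"}
)
print(definition["_item.name"])         # _citation.journal_abbrev
print(definition["_item.category_id"])  # citation
```

Values that are given explicitly are left untouched, so the same routine can be applied to definitions that do spell out the item name.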

The mmCIF dictionary contains a superset of the definitions in the core CIF dictionary. To maintain backward compatibility with the original definitions, the ITEM_ALIASES category was introduced to hold the item name, dictionary name and version in which the original definition of an item was published. In this example, the data name used in the core dictionary differs from the name in the example definition only in the period that separates the category and attribute portions of the item name.
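A reader that accepts both naming styles could use the alias information roughly as in the following Python sketch. The table ALIASES and the function to_mmcif_name are hypothetical, the dictionary name cif_core.dic is used only as an illustrative label, and the version attribute is left out for brevity.

```python
# Hypothetical alias table: (alias name, dictionary name) -> mmCIF item name.
ALIASES = {
    ("_citation_journal_abbrev", "cif_core.dic"): "_citation.journal_abbrev",
}


def to_mmcif_name(name, dictionary="cif_core.dic"):
    """Return the mmCIF item name for an aliased name, or the name unchanged."""
    return ALIASES.get((name, dictionary), name)


print(to_mmcif_name("_citation_journal_abbrev"))  # _citation.journal_abbrev
print(to_mmcif_name("_citation.journal_abbrev"))  # _citation.journal_abbrev
```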

The category ITEM_TYPE holds a reference to a data type specified in the ITEM_TYPE_LIST category. A reference to the data type is used here rather than a detailed data-type description in order to avoid repeating the description for other data items. A single list of data types and associated regular expressions is stored in the ITEM_TYPE_LIST category and this may be referenced by all of the definitions in the dictionary. In the mmCIF dictionary, the codes that are used to describe the data types are generally easy to interpret. In this example, the type code 'line' indicates that a single line of text will be accepted for this data item.
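The indirection between ITEM_TYPE and ITEM_TYPE_LIST can be pictured with a minimal Python sketch. The regular expressions below are simplified stand-ins chosen for illustration and are not the expressions stored in the mmCIF dictionary; the type code 'int' and the function conforms are likewise assumptions.

```python
import re

# Simplified stand-in for ITEM_TYPE_LIST: type code -> regular expression.
ITEM_TYPE_LIST = {
    "line": r"[^\r\n]*",    # any text that contains no line break
    "int": r"[+-]?[0-9]+",  # an optionally signed integer
}

# Stand-in for ITEM_TYPE: each item refers to a type code, not to a pattern.
ITEM_TYPE = {"_citation.journal_abbrev": "line"}


def conforms(item_name, value):
    """Check a value against the pattern referenced by the item's type code."""
    pattern = ITEM_TYPE_LIST[ITEM_TYPE[item_name]]
    return re.fullmatch(pattern, value) is not None


print(conforms("_citation.journal_abbrev", "J. Mol. Biol."))    # True
print(conforms("_citation.journal_abbrev", "J. Mol.\nBiol."))  # False
```

Because every item stores only a code, changing the pattern for a type in one place changes it for all items that refer to it.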

Descriptive examples of data items can be included in the ITEM_EXAMPLES category. In Example 2.6.5.1, one value, 'J. Mol. Biol.', is specified, but multiple examples can be provided using a loop_ directive.

Example 2.6.5.1. A rather simple data definition.

[Scheme scheme2]

Other DDL item attributes are illustrated in the mmCIF definitions for the items _cell.length_a and _cell.length_a_esd in Example 2.6.5.2. Some data items are only meaningful as part of a complete set. The ITEM_DEPENDENT category stores this type of information by listing the other data items that belong to the irreducible set. In Example 2.6.5.2, the cell lengths in the b and c directions are defined as dependent items of the cell length in the a direction.

Example 2.6.5.2. Definition of a data item that has dependencies and associated items.

[Scheme scheme3]
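The dependency between the cell lengths described before Example 2.6.5.2 suggests a simple completeness check, sketched here in Python. The mapping ITEM_DEPENDENT and the function missing_dependents are hypothetical, and the cell-length values are illustrative only.

```python
# Stand-in for ITEM_DEPENDENT: item -> items that must accompany it.
ITEM_DEPENDENT = {
    "_cell.length_a": ["_cell.length_b", "_cell.length_c"],
}


def missing_dependents(row):
    """List dependent items that are absent although their parent item is present."""
    missing = []
    for item, dependents in ITEM_DEPENDENT.items():
        if item in row:
            missing.extend(d for d in dependents if d not in row)
    return missing


incomplete = {"_cell.length_a": 10.0, "_cell.length_b": 12.0}
print(missing_dependents(incomplete))  # ['_cell.length_c']
```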

The permissible ranges of values for a numerical data item are stored in the ITEM_RANGE category. Each boundary condition is defined as the non-inclusive range between a pair of minimum and maximum values. A discrete boundary value may be set by assigning the desired value as both the maximum and the minimum. If multiple boundary conditions are specified using the loop_ directive, a value is permitted when it satisfies any one of them, as the example above requires: the permissible cell-length range is defined as greater than or equal to zero by combining the condition 'greater than zero' with a discrete boundary condition in which both extrema are set to zero.
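The boundary conditions for the cell length can be sketched in Python as follows. The sketch assumes that a value is accepted when it satisfies any one of the listed conditions, which is what the cell-length example needs in order to include zero, and it represents an unspecified extremum by None; the function in_range and the list CELL_LENGTH_RANGE are hypothetical names.

```python
import math


def in_range(value, conditions):
    """Test a value against (minimum, maximum) pairs; None means unbounded.

    A pair with equal extrema admits exactly that value; any other pair
    is treated as a non-inclusive (open) interval.
    """
    for minimum, maximum in conditions:
        low = -math.inf if minimum is None else minimum
        high = math.inf if maximum is None else maximum
        if low == high:
            if value == low:
                return True
        elif low < value < high:
            return True
    return False


# 'Greater than zero' plus the discrete boundary value zero.
CELL_LENGTH_RANGE = [(0.0, None), (0.0, 0.0)]

print(in_range(5.2, CELL_LENGTH_RANGE))   # True
print(in_range(0.0, CELL_LENGTH_RANGE))   # True
print(in_range(-1.0, CELL_LENGTH_RANGE))  # False
```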

A number of special relationships may be defined between data items. For some relationships which occur frequently, the source or function of the relationship has been standardized. In the example above, this feature is used to identify _cell.length_a_esd as the standard uncertainty (estimated standard deviation) of _cell.length_a. The recognized relationships are fully described in the DDL definition of the data item _item_related.function_code in category ITEM_RELATED. The current list includes the kinds of relationships in Table 2.6.5.1.

Table 2.6.5.1. Relationships defined by _item_related.function_code

Code                  Meaning
alternate             The item identified in _item_related.related_name is an alternative expression in terms of its application and attributes to the item in this definition
alternate_exclusive   The item identified in _item_related.related_name is an alternative expression in terms of its application and attributes to the item in this definition; only one of the alternative forms may be specified
convention            The item identified in _item_related.related_name differs from the defined item only in terms of a convention in its expression
conversion_constant   The item identified in _item_related.related_name differs from the defined item only by a known constant
conversion_arbitrary  The item identified in _item_related.related_name differs from the defined item only by an arbitrary constant
replaces              The defined item replaces the item identified in _item_related.related_name
replacedby            The defined item is replaced by the item identified in _item_related.related_name
associated_value      The item identified in _item_related.related_name is meaningful when associated with the defined item
associated_esd        The item identified in _item_related.related_name is the standard uncertainty (estimated standard deviation) of the defined item

Sets of data items within a category may be collected into named subcategories. ITEM_SUB_CATEGORY is used to store the subcategory membership of a data item. In the above example, item _cell.length_a is added to the subcategory CELL_LENGTH. The items _cell.length_b and _cell.length_c are similarly added to this subcategory in their definitions.

The ITEM_UNITS category holds the name of the system of units in which an item is expressed. The name assigned to _item_units.code refers to a single list of all of the unit types used in the dictionary. This list is stored in the category ITEM_UNITS_LIST. Conversion factors between different systems of units are provided in the data table stored in the ITEM_UNITS_CONVERSION category.
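How a conversion table of this kind might be applied is sketched below in Python. The row layout (source unit, target unit, arithmetic operator, factor), the unit names and the function convert are assumptions made for illustration; they are not taken from the dictionary itself.

```python
import operator

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

# Illustrative stand-in for ITEM_UNITS_CONVERSION:
# (from unit, to unit, operator, factor).
ITEM_UNITS_CONVERSION = [
    ("angstroms", "nanometres", "*", 0.1),
    ("angstroms", "picometres", "*", 100.0),
]


def convert(value, from_code, to_code):
    """Convert a value between unit systems using the conversion table."""
    if from_code == to_code:
        return value
    for source, target, op, factor in ITEM_UNITS_CONVERSION:
        if (source, target) == (from_code, to_code):
            return OPERATORS[op](value, factor)
    raise KeyError(f"no conversion from {from_code} to {to_code}")


print(convert(5.0, "angstroms", "picometres"))  # 500.0
```

Storing the operator alongside the factor lets the same table describe offsets as well as scale factors.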

Example 2.6.5.3 shows the definition of the CELL category from the mmCIF dictionary. The name and textual description of a category are stored in the category named CATEGORY. The item named _category.mandatory_code indicates whether the category must appear in any data block based on this dictionary.

Example 2.6.5.3. Definition of an mmCIF category.

[Scheme scheme4]

The list of data items that uniquely identify each row of a category is stored in the CATEGORY_KEY category. In the example above, the item _cell.entry_id is defined as the category key. This item is a reference to the top-level identifier in the mmCIF dictionary, _entry.id. Because only a single entry may exist within an mmCIF data block, this choice of category key specifies that only a single row may exist in the CELL category.
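The uniqueness requirement implied by a category key can be checked along the lines of the following Python sketch. The function duplicate_keys is hypothetical, and the entry identifier and cell lengths are invented values used only to exercise it.

```python
def duplicate_keys(rows, key_items):
    """Return the key tuples that occur in more than one row of a category."""
    seen = set()
    duplicates = []
    for row in rows:
        key = tuple(row[name] for name in key_items)
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates


# Two CELL rows sharing one entry identifier violate the category key.
cell_rows = [
    {"_cell.entry_id": "EXAMPLE", "_cell.length_a": 10.0},
    {"_cell.entry_id": "EXAMPLE", "_cell.length_a": 11.0},
]
print(duplicate_keys(cell_rows, ["_cell.entry_id"]))  # [('EXAMPLE',)]
```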

Membership in category groups is stored in the category named CATEGORY_GROUP. Each category group must have a corresponding definition in the category CATEGORY_GROUP_LIST. In the above example, the CELL category is assigned to category groups cell_group and inclusive_group. The former contains categories that describe properties of the crystallographic cell and the latter includes all the categories in the mmCIF dictionary. Organizing categories in category groups is a convenient means of providing a high-level organizational structure for a complex dictionary.

Complete and annotated examples of a category are stored in the CATEGORY_EXAMPLES category. The text of the category example is stored in the item _category_examples.case and any associated annotation is stored in the item _category_examples.detail.

Example 2.6.5.4 shows the definition of a pair of mmCIF categories, CITATION and CITATION_AUTHOR, which share a common data item, _citation.id, and so illustrates how an item that occurs in multiple categories may be defined. In the case of the citation identifier, _citation.id, the ITEM category is preceded by a loop_ directive and within this loop all of the definitions of the citation identifier are listed. For instance, the citation identifier is also an item in category CITATION_AUTHOR, where it has the item name _citation_author.citation_id. For conformity with the manner in which the core CIF dictionary has been organized, a skeleton definition of the child data item _citation_author.citation_id has been included in the dictionary, although this skeleton definition is formally unnecessary.

Example 2.6.5.4. Related categories linked by parent–child relationships.

[Scheme scheme5]

As a matter of style, the mmCIF dictionary generally defines all of the instances of a data item within the parent definition. The repetition of a data item in multiple categories gives rise to parent–child relationships between such definitions, and these relationships are stored in the ITEM_LINKED category. In Example 2.6.5.4, this category stores the list of data items that are children of the citation identifier _citation.id. These include _citation_author.citation_id, _citation_editor.citation_id and _software.citation_id.
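One common use of parent–child links is to check that every child value also appears among the values of the parent item. The Python sketch below illustrates such a check under that assumption; the mapping ITEM_LINKED mirrors the children named above, while the function orphaned_values and the identifier values are hypothetical.

```python
# Stand-in for ITEM_LINKED: child item name -> parent item name.
ITEM_LINKED = {
    "_citation_author.citation_id": "_citation.id",
    "_citation_editor.citation_id": "_citation.id",
    "_software.citation_id": "_citation.id",
}


def orphaned_values(data, child_name):
    """Return child values that have no matching value in the parent item."""
    parent_name = ITEM_LINKED[child_name]
    parent_values = set(data.get(parent_name, []))
    return [v for v in data.get(child_name, []) if v not in parent_values]


data = {
    "_citation.id": ["primary", "2"],
    "_citation_author.citation_id": ["primary", "primary", "3"],
}
print(orphaned_values(data, "_citation_author.citation_id"))  # ['3']
```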
