International
Tables for Crystallography Volume G Definition and exchange of crystallographic data Edited by S. R. Hall and B. McMahon © International Union of Crystallography 2006 |
International Tables for Crystallography (2006). Vol. G. ch. 2.5, pp. 58-59
Section 2.5.6.3. Typing attributes
a
School of Biomedical and Chemical Sciences, University of Western Australia, Crawley, Perth, WA 6009, Australia, and bBCI Ltd, 46 Uppergate Road, Stannington, Sheffield S6 6BX, England |
This class of attributes is used to specify the fundamental characteristics of data items and how they may be instantiated. These attributes are machine-parsable and of particular importance in the validation of CIF data. They are Enumeration attributes are used to specify restricted values, or `states', of a data item (see the example in Fig. 2.5.5.9). They are not applicable in a definition of an item with unrestricted values. The attributes _enumeration and _enumeration_detail are used in definitions to specify permitted states and their descriptions. For instance, in a definition of atomic element symbols these attributes would be used to list the IUPAC `element symbols' and `element names'. When a data item is restricted to an ordinal set of values, the attribute _enumeration_range is used to define the minimum and maximum values of the logical sequence in the format <min>:<max> The attribute _enumeration_default may be used to specify the default value if an item is not present in a data instantiation.
List attributes specify how, and when, data items are used in a (looped) list. The attribute _list has a value of yes if a defined item must be in a list, and no if it must not. A value of both allows for both uses. The attribute _list_level specifies the nesting level of a list defined item. Only level 1 data are allowed in a CIF (see Chapter 2.2 ). Higher-nested list levels are permitted for MIF data (Allen et al., 1995; see Chapter 2.4 for a description of the MIF format). Fig. 2.5.6.1 shows MIF data specifying a 2D chemical molecular diagram. The first four items in the category DISPLAY are in a level 1 list and the next two items in the category DISPLAY_CONN are in a level 2 list. Note that, according to the STAR File syntax, the nest level is automatically incremented after the first four values, and only decremented when a stop_ signal is encountered.
Additional _list attributes for specifying relational dependencies between data items are described in the next attribute group.
Type attributes are used to specify the form of data. The attribute _type is restricted to the states numb, char and null that code for a number, a text string and a dictionary descriptor, respectively. The implications of this typing, in terms of how numbers or character strings are interpreted, is not specified by the CIF syntax. That is, a `number' is simply a string of characters having one of several possible representations (e.g. as an integer, a floating point or in scientific notation) and it is up to the parsing software to interpret these strings appropriately. The same applies in the treatment of text strings defined as `character'. Such strings may be bounded by white space, quotation characters or, in the case of multi-line text, by semicolon characters in column 1 of a line.
The attribute _type_conditions is used to identify application-specific conditions that apply to the typing of data items. This attribute is used to alert parsing software of string constructions that may disrupt the normal identification and validation of data types. The permitted attribute states are none, esd (and its preferred synonym su) and seq. In the definition example in Fig. 2.5.5.3 the code esd signals that cell-length values are measurements and may have a standard uncertainty value appended in parentheses. Fig 2.5.5.8 indicates the use of the code su for the same purpose. A full description of _type_conditions states is given in the DDL1 dictionary in Chapter 4.9 .
If a data item consists of a concatenation of values, the attribute _type_construct may be used to specify the encoding algorithm for the component values. The language used for this algorithm is POSIX regex (IEEE, 1991). The specification of _type_construct can include the data names of other defined items. Validation software interprets this attribute as format specifications and constraints, and expands the embedded data items according to their separate definition. For example, the chronological date is a composite of day, month and year. These may be represented in many different ways and the _type_construct attribute enables a specific date representation to be defined. The two definitions shown in Fig. 2.5.6.2 illustrate how a date value of the construction `1995/03/25' may be embedded into the dictionary definitions (the month and day definitions have been omitted here for brevity).
Units attributes specify the measurement units permitted for a numerical data item. The attribute _units specifies a code that uniquely identifies the measurement units. A description of the units identified by the code is given by the attribute _units_detail. Typical uses of this attribute are shown in Fig. 2.5.5.8.
References
IEEE (1991). IEEE Standard for Information Technology – Portable Operating System Interface (POSIX) – Part 2: Shell and utilities, Vol. 1, IEEE Standard 1003.2–1992. New York: The Institute of Electrical and Electronic Engineers, Inc.Google ScholarAllen, F. H., Barnard, J. M., Cook, A. F. P. & Hall, S. R. (1995). The Molecular Information File (MIF): core specifications of a new standard format for chemical data. J. Chem. Inf. Comput. Sci. 35, 412–427.Google Scholar