International
Tables for Crystallography Volume G Definition and exchange of crystallographic data Edited by S. R. Hall and B. McMahon © International Union of Crystallography 2006 |
International Tables for Crystallography (2006). Vol. G. ch. 2.4, pp. 45-47
Section 2.4.4. MIF concepts and syntax
a
Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, CB2 1EZ, England,bBCI Ltd, 46 Uppergate Road, Stannington, Sheffield S6 6BX, England, and cSchool of Biomedical and Chemical Sciences, University of Western Australia, Crawley, Perth, WA 6009, Australia |
The syntax of the Molecular Information File is based on that of the STAR File (Hall, 1991; Hall & Spadaccini, 1994). A MIF is an ASCII text file that can be read or amended using a standard text editor, and that can be processed computationally without conversion to another format. The organization and expression of MIF data is summarized in Table 2.4.4.1. Each file consists of a series of data blocks and each block consists of a series of individual data items. There may be any number of items within a block and any number of blocks within a file. A data block represents a logical grouping of data items and, in most MIF applications, a data block will usually specify a complete chemical entity, i.e. a fully defined molecule or a query substructure.
|
The MIF syntax, unlike that of a CIF, places no restrictions on line lengths or nested loop levels. For a detailed understanding of the differences between a MIF and a CIF, the reader should compare this chapter with Chapter 2.2 or refer to the published details of the STAR syntax (Hall & Spadaccini, 1994), the specification of the CIF core data items (Hall et al., 1991) and the Dictionary Definition Language (Hall & Cook, 1995) used to define data items in the electronic version of a STAR dictionary.
CIF data, described by over a thousand items in the current dictionaries (see Part 4), encompass the fields of crystallographic structure and diffraction techniques, and these data items could readily be incorporated into a MIF. It should be noted, however, that currently the reverse is not possible because the current CIF syntax does not support nested loops or save frames.
The fundamental principle that underpins MIF is exactly as for CIF: every data item is represented by a unique data tag followed by its associated data value. These combinations are referred to as tag–value pairs or tuples. Data names must start with an underscore (i.e. underline) character and data values may be any type of string, ranging from a single character to many lines of text. Here are some simple examples of MIF data items:
The complete list of MIF core data items is given in Chapter 4.8 .
Repetitive data are stored in a MIF as lists of values, as they are in a CIF. Each list is prefaced by a loop_ statement and a sequence of data names that identify the data values that follow in `packets' of equal length. The values in each packet match the order and number of the data names. Any number of packets may appear in a looped list.
Atom and bond properties are typical of the information to appear in a looped list. The atoms and bonds of thiabutyrolactone in MIF format are shown in Fig. 2.4.4.1. The description of each data item in this example is given in Chapter 4.8 , although the meanings are clear from the self-descriptive data names. The number of data values in each list is an exact multiple of the number of data names at the start of each loop structure. Looped lists are terminated by the next list or by any other data name, data block or end of file. Comments may be included in a MIF and are preceded by a # character, as illustrated in Fig. 2.4.4.1.
Hierarchical data may require the use of nested loop structures (see the _display_* loop in Fig. 2.4.4.2). Note that the packet for _display_id of 7 has two sets of _display_conn_ values giving connections to atom sites 1 and 4 (the other connections to site 7 appear in the next two packets). Data items that appear in looped lists are identified in the MIF dictionary (see Chapter 4.8 ) as having the attribute _list set to either `yes' or `both'. Other relationships between looped data items are also specified in the dictionary.
Save frames are employed in a MIF to encapsulate grouped data for efficient cross-referencing. If a set of data needs to appear repeatedly in a data application, it is efficient to place this data into an addressable save frame. Molecular fragments, such as amino-acid units, are a case in point. A save frame is bounded by the statement save_framecode and terminated by a save_ statement. It can be referenced within the parent data block using the value $framecode where the framecode matches the string in the save_framecode. Note that all data names must be unique within a save frame, but the same data names may appear in other save frames or in the parent data block. Save frames may not contain other save frames but save-frame references ($framecode) may appear in other save frames.
Save frames can be used in a MIF for many purposes. A simple application, the storage of alternative 3D conformational representations describing cyclohexane, is illustrated in Fig. 2.4.4.3. Within the STAR syntax, save-frame references ($framecode) may occur before or after the save-frame definition within any data block. MIF preserves this basic STAR syntax. Save frames are particularly useful for defining commonly referenced structural templates and examples of this facility are discussed and illustrated (Figs. 2.4.7.1 and 2.4.7.2) in Section 2.4.7.
A data block is a sequence of unique data items or save frames. It is opened with a data_blockcode statement and closed by another data-block statement or a global_ statement (see below). The blockcode string identifies the block within the file. Examples of data blocks are shown in Figs. 2.4.4.2, 2.4.4.3 and 2.4.6.1. Each data block in a file must have a unique blockcode.
A global block is similar to a data block except that it is opened with a global_ statement and contains data that are common or `default' to all subsequent data blocks in a file. Global data items remain active until re-specified in a subsequent data block or global block.
In some applications it may be efficient to place data that are common to all data blocks within a global block. In particular, save frames may be defined within global blocks and then referenced in subsequent data blocks [this statement corrects an error in Hall & Spadaccini (1994)]. Examples of global data are shown in Figs. 2.4.7.1 and 2.4.7.2, in which a variety of frequently referenced structural units are encapsulated within save frames specified in global blocks.
References
Hall, S. R. (1991). The STAR File: a new format for electronic data transfer and archiving. J. Chem. Inf. Comput. Sci. 31, 326–333.Google ScholarHall, S. R., Allen, F. H. & Brown, I. D. (1991). The Crystallographic Information File (CIF): a new standard archive file for crystallography. Acta Cryst. A47, 655–685.Google Scholar
Hall, S. R. & Cook, A. P. F. (1995). STAR Dictionary Definition Language: initial specification. J. Chem. Inf. Comput. Sci. 35, 819–825.Google Scholar
Hall, S. R. & Spadaccini, N. (1994). The STAR File: detailed specifications. J. Chem. Inf. Comput. Sci. 34, 505–508.Google Scholar