International Tables for Crystallography, Volume G: Definition and exchange of crystallographic data. Edited by S. R. Hall and B. McMahon. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. G. ch. 2.1, pp. 14-16
Section 2.1.3. The syntax of the STAR File
ᵃSchool of Biomedical and Chemical Sciences, University of Western Australia, Crawley, Perth, WA 6009, Australia, and ᵇSchool of Computer Science and Software Engineering, University of Western Australia, 35 Stirling Highway, Crawley, Perth, WA 6009, Australia
The syntax of the STAR File (Hall, 1991; Hall & Spadaccini, 1994) has been used to develop a number of discipline-specific exchange and archival approaches, including the Crystallographic Information File (CIF) (Hall et al., 1991), the Molecular Information File (MIF) (Allen et al., 1995), the dictionary definition language (DDL1) (Hall & Cook, 1995), the macromolecular dictionary definition language (DDL2) (Westbrook & Hall, 1995) and the STAR dictionary definition language (StarDDL) (Spadaccini et al., 2000). The details of the CIF, MIF, DDL1 and DDL2 approaches are given in Chapters 2.2, 2.4, 2.5 and 2.6, respectively.
A STAR File is a sequential file containing lines of standard ASCII characters. A file may be divided into any number of discrete sets of unique data items. Sets may be in the form of data blocks, global blocks or save frames. The syntax rules for these sets are given below in descriptive form. A more rigorous description of the STAR File syntax is given in Appendix 2.1.1 in extended Backus–Naur form (McLennon, 1983).
The STAR File is a free-form language in which spaces (ASCII 32), vertical tabs (ASCII 11) and horizontal tabs (ASCII 9) are collectively referred to as 〈blank〉, and newlines (ASCII 10), form feeds (ASCII 12) and carriage returns (ASCII 13) are collectively referred to as 〈terminate〉. White spaces 〈wspace〉, used to separate lexical tokens within the file, are all characters in the joined set of 〈blank〉 and 〈terminate〉.
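The character classes above can be sketched in Python as follows. This is a minimal illustration rather than part of the STAR specification: the names BLANK, TERMINATE, WSPACE and split_tokens are hypothetical, and the splitter deliberately ignores quoted strings, comments and multi-line text.

```python
# Character classes as described in the text (names are hypothetical).
BLANK = {" ", "\t", "\v"}        # ASCII 32, 9 and 11
TERMINATE = {"\n", "\f", "\r"}   # ASCII 10, 12 and 13
WSPACE = BLANK | TERMINATE       # the joined set of <blank> and <terminate>


def split_tokens(text):
    """Split text into lexical tokens on runs of <wspace> characters.

    Simplification: quoted strings, comments and multi-line text
    fields are not handled by this sketch.
    """
    tokens = []
    current = []
    for ch in text:
        if ch in WSPACE:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens
```

An explicit character set is used instead of Python's `str.split()` because the latter also treats some characters outside the six listed above as white space.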
A text string is defined as any of the following.
A data name (or tag) is the identifier of a data value (see Section 2.1.3.3) and is a sequence of non-white-space characters starting with an underscore character 〈_〉 (ASCII 95).
A data value is a text string preceded by its identifying data name. Privileged keywords, such as those described in Sections 2.1.3.5 to 2.1.3.8, are excluded from this definition.
A data item is a data value and its associated data name. Each data item stored in a STAR File is specified with this combination.
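The pairing of data names with data values can be sketched as below. This is a hedged illustration that works on an already tokenized sequence: the function names and the example data names are hypothetical, and the sketch assumes that a simple (unquoted) value never starts with an underscore.

```python
def is_data_name(token):
    """A data name is a non-white-space token starting with an underscore."""
    return token.startswith("_")


def pair_items(tokens):
    """Pair each data name with the data value that follows it.

    Assumptions of this sketch: no loops or keywords are present, and
    each data name may appear only once in the set.
    """
    items = {}
    it = iter(tokens)
    for name in it:
        if not is_data_name(name):
            raise ValueError(f"expected a data name, got {name!r}")
        value = next(it, None)
        if value is None or is_data_name(value):
            raise ValueError(f"data name {name!r} has no data value")
        if name in items:
            raise ValueError(f"data name {name!r} is not unique")
        items[name] = value
    return items


# Illustrative data names, not taken from the text.
items = pair_items(["_cell_length_a", "5.959", "_cell_length_b", "14.956"])
```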
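A looped list consists of the keyword loop_ followed by (a) a sequence of data names (possibly containing nested loop constructs) and (b) a sequence of data packets, each containing data values in the same order as the data names.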
A looped list specifies a table of data in which the data names represent the 'header descriptors' for columns of data and the packets represent the rows in the table. Looped lists may be nested to any level. Each loop level is initialized with the loop_ keyword, which is followed by the names of the data items in that level. The number of data values that follow the data-name declarations of a level must be an exact multiple of the number of data names in that level. Each loop level must be terminated with a stop_, except the outermost (level 1), which is terminated by a new data item, by the privileged strings indicating a save frame (Section 2.1.3.6), a data block (Section 2.1.3.7) or a global block (Section 2.1.3.8), or by an end of file.
An example of a simple one-level loop structure is:
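As a hedged illustration, the Python sketch below embeds a small one-level loop and matches its values to the data names row by row. The data names and values are made up for this sketch, parse_simple_loop is a hypothetical helper, and quoted strings are not handled.

```python
def parse_simple_loop(tokens):
    """Parse a one-level looped list into (names, rows)."""
    if not tokens or tokens[0] != "loop_":
        raise ValueError("a looped list must start with loop_")
    pos = 1
    names = []
    # The header: every token starting with an underscore is a data name.
    while pos < len(tokens) and tokens[pos].startswith("_"):
        names.append(tokens[pos])
        pos += 1
    if not names:
        raise ValueError("loop_ must be followed by at least one data name")
    values = tokens[pos:]
    width = len(names)
    if len(values) % width != 0:
        raise ValueError("values are not an exact multiple of the data names")
    # Each packet of `width` values forms one row of the table.
    rows = [dict(zip(names, values[i:i + width]))
            for i in range(0, len(values), width)]
    return names, rows


# Illustrative input; the data names are assumptions of this sketch.
example = """
loop_
  _atom_label  _atom_x  _atom_y
  C1  0.10  0.20
  O1  0.30  0.40
"""
names, rows = parse_simple_loop(example.split())
```

Here the three data names act as column headers and the six values form two packets, that is, two rows of the table.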
Nested (multi-level) looped lists contain matching data packets [as per (b) above] and an additional stop_ to terminate each level of data. Here is a simple example of a two-level nested list.
The matching of data names to value packets is applied at each loop level. Initially the data values are matched to the data names listed in the outermost level loop. This process is iterated to successively inner levels. At the innermost loop level, data matching is maintained until a stop_ is encountered. This returns the matching process to the next outer level. The matching process is recursive until the loop structure is depleted. Here is an example of a three-level loop structure.
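The recursive matching described above can be sketched as follows. This is a simplified illustration under stated assumptions: each level nests at most one inner loop, declared after that level's data names; tokens are already split on white space; and the data names, the "children" key and parse_nested_loop are hypothetical.

```python
def parse_nested_loop(tokens):
    """Parse a nested looped list into (levels, packets)."""
    pos = 0
    levels = []
    # Header: one list of data names per loop_ keyword.
    while pos < len(tokens) and tokens[pos] == "loop_":
        pos += 1
        names = []
        while pos < len(tokens) and tokens[pos].startswith("_"):
            names.append(tokens[pos])
            pos += 1
        if not names:
            raise ValueError("loop_ must be followed by data names")
        levels.append(names)
    if not levels:
        raise ValueError("a looped list must start with loop_")

    def read_level(depth, pos):
        names = levels[depth]
        packets = []
        while pos < len(tokens) and tokens[pos] != "stop_":
            chunk = tokens[pos:pos + len(names)]
            if len(chunk) < len(names) or "stop_" in chunk:
                raise ValueError("incomplete packet")
            packet = dict(zip(names, chunk))
            pos += len(names)
            if depth + 1 < len(levels):
                # Descend: match the following values at the next inner level.
                packet["children"], pos = read_level(depth + 1, pos)
            packets.append(packet)
        if depth > 0:
            if pos >= len(tokens):
                raise ValueError("inner loop level not terminated by stop_")
            pos += 1  # consume stop_ and return to the next outer level
        return packets, pos

    packets, pos = read_level(0, pos)
    if pos != len(tokens):
        raise ValueError("unexpected stop_ at the outermost level")
    return levels, packets


# A three-level structure with made-up names and values.
three = "loop_ _a loop_ _b loop_ _c x y 1 2 stop_ stop_ z w 3 stop_ stop_"
levels, packets = parse_nested_loop(three.split())
```

In this input the first outer packet `x` owns one level-2 packet `y`, which in turn owns the level-3 packets `1` and `2`; the two consecutive stop_ tokens close level 3 and then level 2 before the next outer packet `z` begins.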
A save frame is a set of unique data items wholly contained within a data block. The frame starts with a save_framecode statement, where the framecode is a unique identifying code within the data block. Each frame is closed with a save_ statement.
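The opening and closing of save frames can be sketched in Python as below. The sketch is illustrative only: split_save_frames and the example frame code are hypothetical, tokens are assumed to be already split on white space, and the sketch assumes that one save frame may not be opened inside another.

```python
def split_save_frames(block_tokens):
    """Separate the tokens of one data block into save frames and the rest.

    Returns (frames, outside): a dict mapping each frame code to its
    tokens, and the list of tokens that lie outside every save frame.
    """
    frames = {}
    outside = []
    current = None
    for tok in block_tokens:
        if tok == "save_":
            # A bare save_ closes the currently open frame.
            if current is None:
                raise ValueError("save_ without an open save frame")
            current = None
        elif tok.startswith("save_"):
            # save_framecode opens a new frame.
            if current is not None:
                raise ValueError("save frame opened inside another save frame")
            code = tok[len("save_"):]
            if code in frames:
                raise ValueError(f"frame code {code!r} is not unique")
            current = frames[code] = []
        elif current is not None:
            current.append(tok)
        else:
            outside.append(tok)
    if current is not None:
        raise ValueError("save frame not closed with save_")
    return frames, outside


# Illustrative tokens; the frame code and data names are made up.
frames, outside = split_save_frames(
    "_title demo save_phenyl _atoms 6 save_ _after 1".split()
)
```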
A save frame has the following attributes:
A data block is a set of data containing any number of unique items and save frames. A data block begins with a data_blockcode statement, where blockcode is a unique identifying name within a file. A data block is closed by another data_blockcode statement, a global_ statement or an end of file.
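For example, with the illustrative block codes rhinovirus and influenza:
    data_rhinovirus
        all information relevant to rhinovirus included here
    data_influenza
        all information relevant to influenza virus included here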
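The division of a file into data blocks can be sketched in Python as follows. This is a minimal illustration, not a full parser: split_data_blocks is a hypothetical helper, the block codes and data names in the usage are illustrative, and tokens that fall outside any data block (for example inside a global block) are simply skipped.

```python
def split_data_blocks(tokens):
    """Group tokens by data block, keyed on the block code."""
    blocks = {}
    current = None
    for tok in tokens:
        if tok.startswith("data_"):
            # data_blockcode closes any open block and opens a new one.
            code = tok[len("data_"):]
            if code in blocks:
                raise ValueError(f"block code {code!r} is not unique in the file")
            current = blocks[code] = []
        elif tok == "global_":
            # A global_ statement also closes the open data block.
            current = None
        elif current is not None:
            current.append(tok)
        # Tokens outside any data block are ignored by this sketch.
    return blocks


# Illustrative tokens; the data names and values are made up.
blocks = split_data_blocks(
    "data_rhinovirus _a 1 data_influenza _a 2 _b 3".split()
)
```

The end of the token sequence plays the role of the end of file, which closes the last data block.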
A data block has the following attributes:
A global block is a set of data items which are implied to be present in all data blocks which follow in a file, unless specified explicitly within a data block. A global block starts with a global_ keyword and is closed by a data_blockcode statement or an end of file.
For example:
    global_
        information that is default within subsequent data blocks
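The way global data act as defaults for the data blocks that follow can be sketched as below. This is an illustration under assumptions: the file is represented as an ordered list of already parsed sections, resolve_blocks and all names in the usage are hypothetical, and the sketch assumes that a later global block supersedes values of the same name from an earlier one.

```python
def resolve_blocks(sections):
    """Apply global defaults to the data blocks that follow them.

    `sections` is an ordered list of (kind, code, items) tuples, where
    kind is "global" or "data", code is the block code (None for a
    global block) and items maps data names to data values.
    """
    defaults = {}
    resolved = {}
    for kind, code, items in sections:
        if kind == "global":
            # Assumption: later global values replace earlier ones.
            defaults.update(items)
        elif kind == "data":
            # Items given explicitly in the data block take precedence.
            resolved[code] = {**defaults, **items}
        else:
            raise ValueError(f"unknown section kind: {kind!r}")
    return resolved


# Illustrative sections; names and values are made up.
resolved = resolve_blocks([
    ("data", "zero", {"_temp": "250"}),
    ("global", None, {"_lab": "UWA", "_temp": "295"}),
    ("data", "one", {"_temp": "100"}),
    ("data", "two", {}),
])
```

Block `zero` precedes the global block and so receives no defaults, block `one` keeps its own `_temp`, and block `two` takes both values from the global block.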
A global block has the following attributes:
A data set is the generic term for a unique set of data. A STAR File may contain three types of data sets: global blocks, data blocks and save frames. The attributes of data sets are as follows.
The following constructs are privileged.
In Section 2.1.3.5 we discuss how stop_ is used to terminate a loop of data values and to return the looped list to the next outer nesting level. This same construction applies in the looped list of data names. The following, although not particularly intuitive, is a valid construction.
This is equivalent to the loop definition given in Section 2.1.3.5. One can use the stop_ in name definitions to inhibit the nesting of loops in the definitions.
References
Allen, F. H., Barnard, J. M., Cook, A. P. F. & Hall, S. R. (1995). The Molecular Information File (MIF): core specifications of a new standard format for chemical data. J. Chem. Inf. Comput. Sci. 35, 412–427.
Hall, S. R. (1991). The STAR File: a new format for electronic data transfer and archiving. J. Chem. Inf. Comput. Sci. 31, 326–333.
Hall, S. R., Allen, F. H. & Brown, I. D. (1991). The Crystallographic Information File (CIF): a new standard archive file for crystallography. Acta Cryst. A47, 655–685.
Hall, S. R. & Cook, A. P. F. (1995). STAR dictionary definition language: initial specification. J. Chem. Inf. Comput. Sci. 35, 819–825.
Hall, S. R. & Spadaccini, N. (1994). The STAR File: detailed specifications. J. Chem. Inf. Comput. Sci. 34, 505–508.
McLennon, B. J. (1983). Principles of programming languages, design, evaluation and implementation. New York: Holt, Rinehart and Winston.
Spadaccini, N., Hall, S. R. & Castleden, I. R. (2000). Relational expressions in STAR File dictionaries. J. Chem. Inf. Comput. Sci. 40, 1289–1301.
Westbrook, J. D. & Hall, S. R. (1995). A Dictionary Description Language for Macromolecular Structure. http://ndbserver.rutgers.edu/mmcif/ddl