International
Tables for Crystallography Volume G Definition and exchange of crystallographic data Edited by S. R. Hall and B. McMahon © International Union of Crystallography 2006 |
International Tables for Crystallography (2006). Vol. G. ch. 2.1, pp. 17-19
Section A2.1.1.2. STAR grammar
a
School of Biomedical and Chemical Sciences, University of Western Australia, Crawley, Perth, WA 6009, Australia, and bSchool of Computer Science and Software Engineering, University of Western Australia, 35 Stirling Highway, Crawley, Perth, WA 6009, Australia |
A STAR File may be an empty file, or it may contain one or more data blocks or global blocks.
There can be any amount of white spaces (remember 〈wspace〉 includes comments) before and at least one white space or an end of file (EOF) after a data or global block. This forces white space between data (and global) blocks in a single file. There must be at least one data item in any data or global block. This means a file consisting of just a data or global block heading is invalid.
There can be any amount of white spaces (remember 〈wspace〉 includes comments) before a save-frame block. This forces white space between save-frame blocks also. There is no need to include the { 〈wspace〉+ | 〈EOF〉 } found in data and global blocks, since those productions cover the situation of a save-frame block terminating the file.
A data-block or save-frame heading consists of the relevant five-character keyword (case-insensitive) immediately followed by at least one non-blank character. This does not preclude the associated block name or frame name consisting of just one or more punctuation characters.
Data come in the following three forms.
We must allow for white space preceding the loop_ (case-insensitive) keyword, since this is not covered by any of the other productions.
The name list for a loop must include at least one data name or a nested loop.
A data name is initiated by an underscore character and followed by one or more non-blank and non-terminating characters from the STAR character set. This does not preclude data names consisting of just one or more punctuation characters.
Loop values are represented in the same way as the 〈data〉 production, except that the possibility of nested data loops introduces the need for the stop_ keyword.
Data values of type I data are immediately preceded by a 〈blank〉. Data values of type II data are immediately preceded by a 〈terminate〉.
A type-I unquoted string is immediately preceded by a 〈blank〉. It cannot begin with a number of characters (the complement of the 〈ordinary_char〉 set) i.e. ", #, $, ', [, ] and _. However, it can begin with a semicolon. Then it is followed by any number of non-blank characters.
A type-II unquoted string is immediately preceded by a line break. As with type I, it too cannot begin with a ", #, $, ', [, ] or _. It also cannot begin with a semicolon, since this would match the semicolon-delimited data production.
Specific exceptions to lexemes which match both types of unquoted strings are:
If one wishes to define data values which match lexemes excluded in cases (1) and (2) above, they should be quoted data values.
The string between a set of double quotes can consist of any character that is not a double quote, or it can be a double quote as long as it is immediately followed by a non-blank character or any number of double quotes at the end of the string. This final rule picks up cases of double-quote delimited strings that end in one or more double quotes, like "ABC"".
The string between a set of single quotes can consist of any character that is not a single quote, or it can be a single quote as long as it is immediately followed by a non-blank character or any number of single quotes at the end of the string. This final rule picks up cases of single-quote delimited strings that end in one or more single quotes, like 'ABC''.
The string bounded by semicolons can begin with any number of characters (including those in the 〈blank〉 production) but is necessarily terminated by a line break. This forces a line break on the line that contains the `opening' semicolon. After the first line, one can have any number of 〈line_of_text〉. Note we treat the first line as special, since it can contain a leading semicolon, which is not true of 〈line_of_text〉. A 〈line_of_text〉 is always terminated with a line break, thus ensuring the closing semicolon is in column 1.
The string bounded by square brackets can consist of any character including 〈terminate〉 and 〈blank〉, and excluding the characters [ and ] unless they are escaped or are balanced.