STAR grammar

Hall, S. R.; Spadaccini, N.

doi:10.1107/97809553602060000727

International Tables for Crystallography
Volume G: Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G. ch. 2.1, pp. 17-19

Section A2.1.1.2. STAR grammar

S. R. Hall^a ^* and N. Spadaccini^b

^a School of Biomedical and Chemical Sciences, University of Western Australia, Crawley, Perth, WA 6009, Australia, and ^b School of Computer Science and Software Engineering, University of Western Australia, 35 Stirling Highway, Crawley, Perth, WA 6009, Australia
Correspondence e-mail: syd@crystal.uwa.edu.au

A STAR File may be an empty file, or it may contain one or more data blocks or global blocks. [Scheme scheme31]

There can be any amount of white space (recall that 〈wspace〉 includes comments) before a data or global block, and there must be at least one white-space character or an end of file (EOF) after it. This forces white space between data (and global) blocks in a single file. There must be at least one data item in any data or global block, so a file consisting of just a data or global block heading is invalid. [Scheme scheme32]

There can be any amount of white space (recall that 〈wspace〉 includes comments) before a save-frame block. This forces white space between save-frame blocks also. There is no need to include the { 〈wspace〉+ | 〈EOF〉 } found in data and global blocks, since those productions cover the situation of a save-frame block terminating the file. [Scheme scheme33]

A data-block or save-frame heading consists of the relevant five-character keyword (case-insensitive) immediately followed by at least one non-blank character. This does not preclude the associated block name or frame name consisting of just one or more punctuation characters. [Scheme scheme34]
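As a rough illustration of the heading rule, the following Python sketch tests whether a token is a data-block or save-frame heading. The pattern and function names are illustrative only and are not part of the STAR specification. The sketch also assumes that a non-blank character is any printable ASCII character other than a space.

```python
import re

# Assumption: "non-blank character" is taken to mean printable ASCII
# from '!' to '~'. The keyword itself is matched case-insensitively.
DATA_HEADING = re.compile(r"data_[!-~]+", re.IGNORECASE)
SAVE_HEADING = re.compile(r"save_[!-~]+", re.IGNORECASE)


def is_data_heading(token: str) -> bool:
    """True if the whole token is 'data_' followed by 1+ non-blank characters."""
    return DATA_HEADING.fullmatch(token) is not None


def is_save_heading(token: str) -> bool:
    """True if the whole token is 'save_' followed by 1+ non-blank characters."""
    return SAVE_HEADING.fullmatch(token) is not None
```

Under these assumptions, a name made only of punctuation such as data_!! is accepted, while the bare keyword data_ is not.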

Data come in the following three forms.

(1) A data-name tag separated from its associated value by a trailing 〈blank〉. Note that it is explicitly a 〈blank〉 and not a 〈wspace〉. These are type I data.
(2) A data-name tag separated from its associated value by a 〈terminate〉. These are type II data.
(3) Looped data.

[Scheme scheme35]

We must allow for white space preceding the loop_ (case-insensitive) keyword, since this is not covered by any of the other productions. [Scheme scheme36]

The name list for a loop must include at least one data name or a nested loop. [Scheme scheme37]

A data name is initiated by an underscore character and followed by one or more non-blank and non-terminating characters from the STAR character set. This does not preclude data names consisting of just one or more punctuation characters. [Scheme scheme38]
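The data-name rule can be sketched in Python as a single pattern. The names DATA_NAME and is_data_name are hypothetical. The character class below assumes that the non-blank, non-terminating characters of the STAR character set are the printable ASCII characters other than a space.

```python
import re

# Assumption: non-blank, non-terminating characters are printable ASCII
# from '!' to '~' (no spaces, tabs or line breaks).
DATA_NAME = re.compile(r"_[!-~]+")


def is_data_name(token: str) -> bool:
    """True if the token is an underscore followed by 1+ allowed characters."""
    return DATA_NAME.fullmatch(token) is not None
```

With this sketch, _cell_length_a and _?? are both accepted as data names, while a lone underscore is rejected.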

Loop values are represented in the same way as the 〈data〉 production, except that the possibility of nested data loops introduces the need for the stop_ keyword. [Scheme scheme39]

Data values of type I data are immediately preceded by a 〈blank〉. Data values of type II data are immediately preceded by a 〈terminate〉. [Scheme scheme40]

A type-I unquoted string is immediately preceded by a 〈blank〉. It cannot begin with any of the characters in the complement of the 〈ordinary_char〉 set, i.e. ", #, $, ', [, ] and _. However, it can begin with a semicolon. The first character is followed by any number of non-blank characters. [Scheme scheme41]

A type-II unquoted string is immediately preceded by a line break. As with type I, it too cannot begin with a ", #, $, ', [, ] or _. It also cannot begin with a semicolon, since this would match the semicolon-delimited data production. [Scheme scheme42]

Specific exceptions to lexemes which match both types of unquoted strings are:

(1) No string beginning with an underscore is an unquoted string.
(2) No string that matches a production for 〈data_heading〉, 〈save_heading〉, 〈LOOP_〉, 〈STOP_〉, 〈SAVE_〉 or 〈GLOBAL_〉 is an unquoted string.

Data values that match the lexemes excluded in cases (1) and (2) above should be given as quoted data values.
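The rules for unquoted strings, together with the two exceptions, can be combined into one check. The Python sketch below is illustrative only. The function name, the after_line_break flag (used here to distinguish type II from type I) and the printable-ASCII character class are assumptions, not part of the grammar itself.

```python
import re

# Characters that may not start an unquoted string (complement of ordinary_char).
FORBIDDEN_FIRST = set("\"#$'[]_")

# Lexemes reserved for headings and keywords, matched case-insensitively.
# 'save_[!-~]*' covers both the bare save_ keyword and a save-frame heading.
RESERVED = re.compile(
    r"data_[!-~]+|save_[!-~]*|loop_|stop_|global_", re.IGNORECASE
)


def is_unquoted_string(token: str, after_line_break: bool = False) -> bool:
    """Check a whitespace-free token against the unquoted-string rules.

    after_line_break=False models type I (preceded by a blank);
    after_line_break=True models type II (preceded by a line break).
    """
    if not token or any(ch.isspace() for ch in token):
        return False
    first = token[0]
    if first in FORBIDDEN_FIRST:
        return False
    # Only type II forbids a leading semicolon.
    if after_line_break and first == ";":
        return False
    return RESERVED.fullmatch(token) is None
```

For example, ;abc passes as a type-I value but fails as a type-II value, and loop_ fails in both positions and would have to be quoted to be used as a value.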

The string between a set of double quotes can consist of any character that is not a double quote. It can also contain a double quote, provided that the double quote is immediately followed by a non-blank character, and it can end in any number of double quotes. This final rule picks up cases of double-quote-delimited strings that end in one or more double quotes, like "ABC"". [Scheme scheme43]

The string between a set of single quotes can consist of any character that is not a single quote. It can also contain a single quote, provided that the single quote is immediately followed by a non-blank character, and it can end in any number of single quotes. This final rule picks up cases of single-quote-delimited strings that end in one or more single quotes, like 'ABC''. [Scheme scheme44]
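Both quoting rules amount to the same scan: a quote character closes the string only when it is followed by a blank or by the end of the input. The Python sketch below illustrates this for either delimiter. The function name and return convention are hypothetical, and the sketch assumes that the input is a single line, that the end of the input counts as a terminator, and that any whitespace character counts as a blank.

```python
from typing import Tuple


def read_quoted(text: str, start: int = 0) -> Tuple[str, int]:
    """Read a quoted string beginning at text[start].

    Returns (value, index just past the closing quote).
    """
    quote = text[start]
    if quote not in "\"'":
        raise ValueError("expected a single or double quote")
    i = start + 1
    while i < len(text):
        # A quote closes the string only if nothing non-blank follows it.
        if text[i] == quote and (i + 1 == len(text) or text[i + 1].isspace()):
            return text[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated quoted string")
```

Applied to the example "ABC"", this scan skips the first inner double quote because it is followed by a non-blank character, and yields the value ABC" with one trailing double quote.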

The string bounded by semicolons can begin with any number of characters (including those in the 〈blank〉 production) but is necessarily terminated by a line break. This forces a line break on the line that contains the 'opening' semicolon. After the first line, one can have any number of 〈line_of_text〉. Note that we treat the first line as special, since it can contain a leading semicolon, which is not true of 〈line_of_text〉. A 〈line_of_text〉 is always terminated with a line break, thus ensuring that the closing semicolon is in column 1. [Scheme scheme45]
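A line-oriented reader makes the column-1 rule concrete. The Python sketch below is a minimal illustration that assumes the input has already been split into lines without their line breaks. The function name and the choice to join the collected lines with a newline are assumptions, since the text above does not say how the value is assembled.

```python
from typing import List, Tuple


def read_semicolon_text(lines: List[str], start: int = 0) -> Tuple[str, int]:
    """Read a semicolon-delimited value whose opening line is lines[start].

    Returns (value, index of the line holding the closing semicolon).
    """
    if not lines[start].startswith(";"):
        raise ValueError("opening semicolon must be in column 1")
    # The first line is special: whatever follows the opening semicolon
    # is kept, even if it begins with another semicolon.
    body = [lines[start][1:]]
    for i in range(start + 1, len(lines)):
        # A later line starting with ';' cannot be a line of text,
        # so it must carry the closing semicolon.
        if lines[i].startswith(";"):
            return "\n".join(body), i
        body.append(lines[i])
    raise ValueError("unterminated semicolon-delimited text")
```

A semicolon that is not in column 1, for example one preceded by a space, is treated as ordinary text and does not close the value.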

The string bounded by square brackets can consist of any character including 〈terminate〉 and 〈blank〉, and excluding the characters [ and ] unless they are escaped or are balanced. [Scheme scheme46]
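The balancing rule can be illustrated with a depth counter. The Python sketch below is a hedged illustration only. The function name is hypothetical, and the use of a backslash as the escape character is an assumption, since the text above does not name the escape mechanism.

```python
from typing import Tuple


def read_bracketed(text: str, start: int = 0, escape: str = "\\") -> Tuple[str, int]:
    """Read a square-bracket-delimited string beginning at text[start].

    Returns (raw value with escapes left in place, index past the closing ']').
    """
    if text[start] != "[":
        raise ValueError("expected '['")
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            i += 2  # skip the escaped character, whatever it is
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:  # the bracket opened at 'start' is now closed
                return text[start + 1:i], i + 1
        i += 1
    raise ValueError("unbalanced square brackets")
```

Line breaks and blanks inside the brackets need no special handling here, since only brackets and the escape character affect the scan.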
