International Tables for Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G. ch. 2.1, pp. 17-19

Section A2.1.1.2. STAR grammar

S. R. Hall (a)* and N. Spadaccini (b)

(a) School of Biomedical and Chemical Sciences, University of Western Australia, Crawley, Perth, WA 6009, Australia, and (b) School of Computer Science and Software Engineering, University of Western Australia, 35 Stirling Highway, Crawley, Perth, WA 6009, Australia
* Correspondence e-mail: syd@crystal.uwa.edu.au

A STAR File may be an empty file, or it may contain one or more data blocks or global blocks.[Scheme scheme31]

Any amount of white space (recall that 〈wspace〉 includes comments) may precede a data or global block, and at least one white-space character or an end of file (EOF) must follow it. This forces white space between consecutive data (and global) blocks in a single file. Every data or global block must contain at least one data item, so a file consisting of just a data or global block heading is invalid.[Scheme scheme32]

Any amount of white space (recall that 〈wspace〉 includes comments) may precede a save-frame block. This forces white space between save-frame blocks as well. There is no need to include the { 〈wspace〉+ | 〈EOF〉 } alternative found in data and global blocks, since those productions already cover the case of a save-frame block terminating the file.[Scheme scheme33]

A data-block or save-frame heading consists of the relevant five-character keyword (case-insensitive) immediately followed by at least one non-blank character. This does not preclude the associated block name or frame name consisting of just one or more punctuation characters.[Scheme scheme34]
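The heading rule above can be illustrated with a small sketch. It is not taken from the STAR definition itself: the function name `parse_heading` is hypothetical, and a 'non-blank character' is assumed here to mean any character that Python's `\S` accepts (anything other than white space).

```python
import re

# Assumption: "non-blank" is approximated by \S (any non-white-space character).
# The five-character keywords data_ and save_ are matched case-insensitively.
HEADING_RE = re.compile(r"(data_|save_)(\S+)", re.IGNORECASE)


def parse_heading(token):
    """Return (keyword, name) for a data-block or save-frame heading, else None."""
    m = HEADING_RE.fullmatch(token)
    if m is None:
        return None
    return m.group(1).lower(), m.group(2)
```

Under these assumptions `parse_heading("DATA_!!")` is accepted, since a name made only of punctuation is allowed, whereas a bare `data_` with nothing after the keyword is not a heading.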

Data come in the following three forms.

  • (1) A data-name tag separated from its associated value by a trailing 〈blank〉. Note that the separator is explicitly a 〈blank〉 and not a 〈wspace〉. These are type I data.

  • (2) A data-name tag separated from its associated value by a 〈terminate〉. These are type II data.

  • (3) Looped data.

[Scheme scheme35]

We must allow for white space preceding the loop_ (case-insensitive) keyword, since this is not covered by any of the other productions.[Scheme scheme36]

The name list for a loop must include at least one data name or a nested loop.[Scheme scheme37]

A data name is initiated by an underscore character and followed by one or more non-blank and non-terminating characters from the STAR character set. This does not preclude data names consisting of just one or more punctuation characters.[Scheme scheme38]

Loop values are represented in the same way as the 〈data〉 production, except that the possibility of nested data loops introduces the need for the stop_ keyword.[Scheme scheme39]

Data values of type I data are immediately preceded by a 〈blank〉. Data values of type II data are immediately preceded by a 〈terminate〉.[Scheme scheme40]
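As a rough illustration of the distinction between type I and type II data, the sketch below looks at the single separator that follows a data name. It assumes that 〈blank〉 means a space or a tab and that 〈terminate〉 means a line break; the name `classify_item` is hypothetical and not part of the STAR definition.

```python
import re

# Assumptions: <blank> is a space or tab, <terminate> is a line break
# (\n, \r\n or \r). A data name is an underscore followed by one or more
# non-white-space characters.
ITEM_RE = re.compile(r"(_\S+)( |\t|\r\n|\n|\r)")


def classify_item(text):
    """Return (data_name, "I" or "II") based on the separator after the name."""
    m = ITEM_RE.match(text)
    if m is None:
        return None
    separator = m.group(2)
    kind = "I" if separator in (" ", "\t") else "II"
    return m.group(1), kind
```

For example, `classify_item("_cell_length_a 5.4")` reports type I, while a name followed directly by a line break is reported as type II.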

A type-I unquoted string is immediately preceded by a 〈blank〉. It cannot begin with any character in the complement of the 〈ordinary_char〉 set, i.e. ", #, $, ', [, ] or _. It can, however, begin with a semicolon. The first character may be followed by any number of non-blank characters.[Scheme scheme41]

A type-II unquoted string is immediately preceded by a line break. As with type I, it too cannot begin with a ", #, $, ', [, ] or _. It also cannot begin with a semicolon, since this would match the semicolon-delimited data production.[Scheme scheme42]

Specific exceptions to lexemes which match both types of unquoted strings are:

  • (1) No string beginning with an underscore is an unquoted string.

  • (2) No string that matches a production for 〈data_heading〉, 〈save_heading〉, 〈LOOP_〉, 〈STOP_〉, 〈SAVE_〉 or 〈GLOBAL_〉 is an unquoted string.

If one wishes to define data values which match lexemes excluded in cases (1) and (2) above, they should be quoted data values.
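The rules for unquoted strings, including the two exceptions, can be gathered into one check. This is only a sketch under stated assumptions: the function name `is_unquoted_string` and its `type_ii` flag are hypothetical, white space is tested with `str.isspace`, and the reserved lexemes are matched case-insensitively.

```python
import re

# Characters that may not start an unquoted string (from the prose above).
FORBIDDEN_FIRST = set("\"#$'[]_")

# Lexemes reserved for headings and keywords. save_\S* covers both the
# save-frame heading and the bare SAVE_ keyword.
RESERVED_RE = re.compile(r"data_\S+|save_\S*|loop_|stop_|global_", re.IGNORECASE)


def is_unquoted_string(token, type_ii=False):
    """Check a candidate lexeme against the unquoted-string rules."""
    if not token or any(ch.isspace() for ch in token):
        return False
    if token[0] in FORBIDDEN_FIRST:
        return False
    # A type-II string follows a line break, so a leading semicolon would
    # instead open a semicolon-delimited value.
    if type_ii and token[0] == ";":
        return False
    if RESERVED_RE.fullmatch(token):
        return False
    return True
```

A value rejected by this check, such as `_name` or `loop_`, would have to be written as a quoted data value.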

The string between a set of double quotes can consist of any character that is not a double quote, or it can be a double quote as long as it is immediately followed by a non-blank character or any number of double quotes at the end of the string. This final rule picks up cases of double-quote delimited strings that end in one or more double quotes, like "ABC"".[Scheme scheme43]

The string between a set of single quotes can consist of any character that is not a single quote, or it can be a single quote as long as it is immediately followed by a non-blank character or any number of single quotes at the end of the string. This final rule picks up cases of single-quote delimited strings that end in one or more single quotes, like 'ABC''.[Scheme scheme44]
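Both quoting rules reduce to the same scan: a quote character closes the string only when it is followed by white space or by the end of the input. The sketch below illustrates this; `read_quoted` is a hypothetical name, 'non-blank' is approximated with `str.isspace`, and no restriction on line breaks inside the string is enforced because the prose above does not address one.

```python
def read_quoted(text, start=0):
    """Read a quote-delimited value starting at text[start].

    Returns (value, index_after_closing_quote).
    """
    quote = text[start]
    if quote not in "\"'":
        raise ValueError("not a quoted string")
    i = start + 1
    while i < len(text):
        # A matching quote ends the string only at end of input or when
        # the next character is white space; otherwise it is content.
        if text[i] == quote and (i + 1 == len(text) or text[i + 1].isspace()):
            return text[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated quoted string")
```

With this scan the input `"ABC""` yields the value `ABC"`, and `'it's ok'` yields `it's ok`, because the embedded quote is immediately followed by a non-blank character.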

The string bounded by semicolons can begin with any number of characters (including those in the 〈blank〉 production) but is necessarily terminated by a line break. This forces a line break on the line that contains the 'opening' semicolon. After the first line, one can have any number of 〈line_of_text〉 lines. Note that we treat the first line as special, since it can contain a leading semicolon, which is not true of 〈line_of_text〉. A 〈line_of_text〉 is always terminated with a line break, thus ensuring that the closing semicolon is in column 1.[Scheme scheme45]
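A line-oriented sketch of the semicolon-delimited value is given below. It assumes the input has already been split into lines with the line breaks removed, and that any line beginning with a semicolon after the first one closes the value; `read_text_field` is a hypothetical name.

```python
def read_text_field(lines, start=0):
    """Read a semicolon-delimited value from a list of lines.

    lines[start] must begin with the opening semicolon. Returns
    (value, index_of_closing_line).
    """
    first = lines[start]
    if not first.startswith(";"):
        raise ValueError("no opening semicolon in column 1")
    # The first line is special: what follows the opening semicolon may
    # itself begin with a semicolon.
    body = [first[1:]]
    for i in range(start + 1, len(lines)):
        # Later lines may not begin with a semicolon, so one that does
        # is taken as the closing delimiter in column 1.
        if lines[i].startswith(";"):
            return "\n".join(body), i
        body.append(lines[i])
    raise ValueError("unterminated semicolon-delimited value")
```

For instance, the lines `;first`, `second`, `;` yield the two-line value `first` / `second`, with the closing semicolon found on the third line.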

The string bounded by square brackets can consist of any character including 〈terminate〉 and 〈blank〉, and excluding the characters [ and ] unless they are escaped or are balanced.[Scheme scheme46]
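The balancing rule for bracketed strings can be sketched with a depth counter. The escape character is not named in the prose above, so the backslash used here is an assumption, as is the function name `read_bracketed`; escape sequences are kept verbatim in the returned value.

```python
def read_bracketed(text, start=0, escape="\\"):
    """Read a square-bracket-delimited value starting at text[start].

    Returns (value, index_after_closing_bracket). The escape character
    defaults to a backslash, which is an assumption of this sketch.
    """
    if text[start] != "[":
        raise ValueError("not a bracketed string")
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            i += 2  # skip the escaped character, whatever it is
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return text[start + 1:i], i + 1
        i += 1
    raise ValueError("unbalanced brackets")
```

Nested brackets are accepted as long as they balance, so `[a [b] c]` yields `a [b] c`, and line breaks or blanks inside the brackets are treated as ordinary content.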