International
Tables for
Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G. ch. 5.3, pp. 512-514

Section 5.3.5.3.1. Basic operation of ciftex

B. McMahon^a*

^a International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
Correspondence e-mail: bm@iucr.org

The program is designed to act as a filter, typically in a Unix-style environment, reading a CIF on the standard input channel and writing a modified data stream to standard output. The output is a file of TeX code that is processed by the TeX program to produce a device-independent file describing the content of a formatted typeset document. Further post-processing allows the formatted document to be viewed on screen or printed.

Each input token (number, character or text string; data name; loop_ or data_ keywords) is transformed as it is identified; there is no lookahead and minimal retention of context. The data stream is treated purely syntactically; no transformations are applied on the basis of the supposed meaning of any of the file contents.

5.3.5.3.1.1. Non-looped data

For portions of the CIF that are not contained in looped lists, the transformations are trivial. A (data name, data value) pair is transformed to a TeX macro and its argument. The macro name is determined from an external `map' file which the program reads at run time; this file associates CIF data names with the corresponding TeX macros through a simple lookup table.

A CIF data value is in most cases passed as the argument to the corresponding TeX macro with few modifications. If the data value is a character string beginning with a digit, full point, hyphen or plus character, it is assumed to be of type `numb'. A space is introduced ahead of an embedded open parenthesis (to separate a standard uncertainty from its parent value). A leading zero is printed before any bare decimal point. An embedded E is taken to indicate exponential notation and the format of the number is modified accordingly.
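The numeric rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the ciftex source: the function names `is_numb` and `format_numb` are hypothetical, and the rendering of exponential notation as `$\times 10^{n}$` is an assumption, since the text says only that the format is modified.

```python
import re

LEADING_NUMB_CHARS = "0123456789.-+"


def is_numb(value: str) -> bool:
    """True if the value starts with a digit, full point, hyphen or plus."""
    return bool(value) and value[0] in LEADING_NUMB_CHARS


def format_numb(value: str) -> str:
    """Apply the three numeric transformations described in the text."""
    # An embedded E separates the mantissa from the exponent, e.g. 1.2E-3.
    mantissa, sep, exponent = value.partition("E")
    # Space ahead of an embedded open parenthesis: 5.959(1) -> 5.959 (1)
    mantissa = re.sub(r"(?<=\S)\(", " (", mantissa)
    # Leading zero before a bare decimal point: .25 -> 0.25, -.5 -> -0.5
    mantissa = re.sub(r"^([+-]?)\.", r"\g<1>0.", mantissa)
    if sep:
        try:
            power = int(exponent)
        except ValueError:
            return value  # not a recognizable exponent; leave untouched
        # ASSUMPTION: this TeX rendering is illustrative, not ciftex's own.
        return f"{mantissa} $\\times 10^{{{power}}}$"
    return mantissa
```

With these definitions, `format_numb("5.959(1)")` gives `5.959 (1)` and `format_numb(".25")` gives `0.25`.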

If the input data value is of type `char' (i.e. it is a single token beginning with a character other than those recognized as leading characters for numerical data, or it contains multiple tokens delimited by quote marks or semicolons), the program searches the map file for key values exactly matching each token and, where one is found, replaces the token with its replacement word or text. If no replacement is specified in the map file, the token is passed unchanged to the standard output channel. This facility has proved useful for making global substitutions of individual words during file processing, but it must be used with care because the substitutions are unconditional and take no account of context.
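A minimal Python sketch of this word-by-word substitution follows. The function name `substitute_words` is hypothetical, and splitting the value on white space (which also collapses runs of spaces) is an assumption about how tokens are delimited.

```python
def substitute_words(text: str, replacements: dict[str, str]) -> str:
    """Replace each token that exactly matches a key; pass others unchanged."""
    # ASSUMPTION: tokens are separated by white space. Matching is exact,
    # so "sulphates" or "sulphate," would not match the key "sulphate".
    return " ".join(replacements.get(token, token) for token in text.split())
```

For example, with the replacement table `{"sulphate": "sulfate"}`, the value `copper sulphate pentahydrate` becomes `copper sulfate pentahydrate`. Because the lookup ignores context, the same substitution would also be applied inside a title or an author's address.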

Some small examples of typical non-looped data items are shown in Fig. 5.3.5.4, and the corresponding ciftex translation, based on a map file used for typesetting Acta Crystallographica Section C, is shown in Fig. 5.3.5.5.

Figure 5.3.5.4. Sample CIF data input to ciftex.

Figure 5.3.5.5. Output from ciftex run on the data of Fig. 5.3.5.4.

Note the transformations of the numerical arguments and the translation of `sulphate' to `sulfate'.

5.3.5.3.1.2. Looped data

If the input token is a loop_ keyword, the program enters a different mode of operation. Looped data may be represented in print either as repetitive lists or in tabular format. There is no indication in a CIF dictionary of the appropriate representation (nor should there be, for what is essentially a matter of presentation) and the choice is made based on a flag associated with each data name in the map file. For non-tabular lists, the structure [Scheme 5] is translated to a sequence of TeX codes of the form [Scheme 6].

In the case of tabulated data, the loop_ header is translated into a set of table headings and typographic codes are introduced to lay out in columnar format the values in the body of the list. The number of different data names in the loop header is counted and the data values are identified by their position in the loop modulo the total number of data names in the header (in effect, by their `phase' in the loop). In the simplest case, a TeX command is emitted that builds a table with n columns, where n is the number of different data names. Then the data values are counted as they are processed. After every nth data value, a TeX code is emitted indicating `end of table row' and a further code is emitted before the next value (if there is one) that means `beginning of new table row'. In all other cases, a code is emitted signifying `move to next column'.
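The phase-counting scheme can be sketched as follows. The macro names `\tablebegin`, `\rowbegin`, `\rowend`, `\nextcol` and `\tableend` are hypothetical placeholders for whatever TeX codes a real map file and macro package would supply, and the generation of column headings is omitted. The values are handled one at a time, with no lookahead, in keeping with the streaming operation described earlier.

```python
def tabulate_loop(names: list[str], values: list[str]) -> list[str]:
    """Lay out looped values in n columns by counting their phase."""
    n = len(names)  # number of data names in the loop_ header
    out = [rf"\tablebegin{{{n}}}"]  # hypothetical 'build an n-column table'
    count = 0
    for value in values:
        # A new row begins before the value that follows a completed row.
        if count > 0 and count % n == 0:
            out.append(r"\rowbegin")
        out.append(value)
        count += 1
        if count % n == 0:
            out.append(r"\rowend")   # every nth value closes a row
        else:
            out.append(r"\nextcol")  # otherwise move to the next column
    out.append(r"\tableend")  # hypothetical table terminator
    return out
```

Applied to a two-name loop header and the values `O1 0.1 C2 0.2`, the sketch emits a row end after `0.1` and after `0.2`, and a row begin before `C2`.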

Fig. 5.3.5.6 is a simplified extract from a table of atomic coordinates derived from the _atom_site_ loop in a CIF.

Figure 5.3.5.6. TeX markup for typesetting a table of atomic coordinates.

5.3.5.3.1.3. The ancillary map file

The translation between a CIF data name and its replacement text in the TeX output file is defined in the external map file. The format of the translation is very simple, as illustrated in Fig. 5.3.5.7.

Figure 5.3.5.7. Example map file for use with ciftex.

Each line starts with a CIF data name, which is terminated by a space character. The next character is either `T' or `N' to indicate whether or not the output should be tabulated. The character after that is an arbitrary character from the ASCII character set, chosen to collect together data that will appear in the same logical section of the output file. This locator character may be associated, in another ancillary file described below, with additional text for output. The remainder of the line is the replacement text.

In the example supplied, the cell-length parameters map to the TeX macros \cella, \cellb and \cellc (each preceded by a standard TeX macro forbidding a page break immediately before the contents are printed). The details of the publication authors are described by a set of TeX macros that will occur in two different locations in the output file (the authors' names and addresses may be looped together in the location labelled by the character a; any explanatory footnotes and email addresses will be printed elsewhere in the paper, at the location labelled X). The anisotropic displacement parameters Uij will be printed in a table and the replacement text consists of the TeX codes that will be printed at the head of each column in the table.

The initial text on the line need not be a CIF data name; it may be any other single word. In this case, every occurrence of that word in the input CIF will be replaced by the replacement text.

If the initial character of the line is a hash mark #, the line is treated as a comment and discarded.
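A reader for this format might look like the following Python sketch. The names `MapEntry` and `parse_map` are hypothetical. Skipping blank lines, stripping white space between the locator character and the replacement text, and applying the same layout to plain-word substitution lines are all assumptions that the description above does not settle.

```python
from typing import Iterable, NamedTuple


class MapEntry(NamedTuple):
    tabulate: bool     # True for 'T', False for 'N'
    locator: str       # single character grouping related output
    replacement: str   # TeX macro, column heading or replacement word


def parse_map(lines: Iterable[str]) -> dict[str, MapEntry]:
    """Build a lookup table from the lines of a map file."""
    entries: dict[str, MapEntry] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        # Lines starting with a hash mark are comments.
        # ASSUMPTION: blank lines are skipped as well.
        if not line.strip() or line.startswith("#"):
            continue
        key, _, rest = line.partition(" ")  # key ends at the first space
        if len(rest) < 2 or rest[0] not in "TN":
            raise ValueError(f"malformed map line: {line!r}")
        # ASSUMPTION: white space before the replacement text is dropped.
        entries[key] = MapEntry(rest[0] == "T", rest[1], rest[2:].lstrip())
    return entries
```

Under these assumptions, a line such as `_cell_length_a Ng \cella` yields a non-tabulated entry with locator `g` and replacement text `\cella`.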

5.3.5.3.1.4. The ancillary format file

Because a printed paper may be more verbose than its parent CIF data file, it is necessary to add text to the output from ciftex to represent section headings, line spaces or other formatting instructions. The program reads an ancillary file, known as the format file, for such additional text.

Each line in the format file begins with a hash mark #, a single ASCII character and a colon. The second character is chosen to match the corresponding locator character associated with data names in the map file. The rest of the line is text to be output. When the locator character associated with the data name currently being processed differs from the previous one, the text from all lines in the format file with the new locator character is output.

The special strings #[: and #]: indicate text to be emitted at the beginning and end of the output stream, respectively.
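The following Python sketch shows one way to read such a file and to interleave its text with translated data. The names `parse_format` and `emit_with_headings` are hypothetical, and the input to the second function, a sequence of (locator, text) pairs, is an assumed simplification of the program's internal state.

```python
from typing import Iterable


def parse_format(lines: Iterable[str]) -> dict[str, list[str]]:
    """Collect the output text of a format file by locator character."""
    sections: dict[str, list[str]] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        # A directive is '#', one locator character, ':' and then text.
        if len(line) >= 3 and line[0] == "#" and line[2] == ":":
            sections.setdefault(line[1], []).append(line[3:])
    return sections


def emit_with_headings(items, sections):
    """Insert format-file text whenever the locator character changes."""
    out = list(sections.get("[", []))  # '#[:' text opens the output
    previous = None
    for locator, text in items:
        if locator != previous:
            out.extend(sections.get(locator, []))
            previous = locator
        out.append(text)
    out.extend(sections.get("]", []))  # '#]:' text closes the output
    return out
```

A directive with no text after the colon contributes an empty string, which corresponds to a blank line in the output.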

Fig. 5.3.5.8 is an example of a simplified format file. The first line is printed at the start of the output TeX file; the second line at the end. The next line will be printed on the first occurrence of a data name flagged with the locator code a in the map file. In this example, that will be the name or address of an author of the paper; some typographic directives are emitted immediately before the authors' names and addresses, including the introduction of a blank line (`vertical skip', or `vskip') of height 10 typographic points.

Figure 5.3.5.8. Example format file for ciftex.

The lines beginning #g: are emitted immediately before the first data name in the group that is associated with locator code g. In this example, the effect is to output a heading and subheading before printing the cell-length parameters and to switch to double-column format. The line containing only the characters #g: provides for the introduction of a blank line into the TeX file, with the sole purpose of making the file more readable by human editors.

The lines beginning #U: are emitted at the beginning of the table of anisotropic U values.

The mechanism looks complicated at first sight, but addresses the need to generate headings at standard locations in a printed paper when the exact content of the paper is not known in advance.

The different format for directives in the map and format files means that the same file can be used for both purposes, if required. In practice it is often easier to maintain different files: the same mapping between CIF data names and TeX macros might be common to different journals, while each journal uses its own format file.
