Tables for
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G. ch. 5.7, p. 563

Section Including CIF data in an article

P. R. Strickland,a M. A. Hoylanda and B. McMahona*

aInternational Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
Correspondence e-mail: Including CIF data in an article

| top | pdf |

For journals other than those specializing in full-scale structure reports, including CIF data in tables or reports of structures within general articles is rather more problematic. The translation of CIF data into XML seems to be a promising route to explore, as journals and reference volumes are increasingly being typeset from XML files. Traditionally, publishing has emphasized content markup that leads to a particular typographic representation. Modern trends are towards markup that tags the content by purpose, with the representation directed by external `style files'. Consider Fig.[link], which shows the typeset representation of a set of data items in a CIF for a structural paper.


Figure | top | pdf |

Typesetting of structural data. The contents of the CIF (a) are transformed into a typeset representation (b) that omits, annotates or reorders the incoming data according to context and the style rules of the journal.

First, it can be seen that several CIF data items are omitted from the printed representation, such as the International Tables space-group number and the Hall symbol for the space group. For compactness, the printed data value does not have a legend or annotation if the meaning of an item is clear from the context; thus, the crystal system and Hermann–Mauguin space-group symbol are printed without any accompanying text. The journal may also omit information that is implicit given other data; thus the cell angles are not printed for an orthorhombic cell. On the other hand, units, which are implicit in the definition of a CIF data item, are printed. Related items are grouped together in a single expression, as in the case of the [\theta] range or the crystal dimensions. In some cases, numerical values have been rounded to meet the journal's policy.

All of these transformations are matters of style, but it can be seen that they are not always trivial mappings to single data names. The style files determining the transformation from a detailed explicit data tabulation in the initial CIF may need to implement complex logical tests to suit the requirements of the journal.

Fig.[link] shows the same extract in [\hbox{\TeX}], the markup and typesetting language that was used for several years to produce Acta Cryst. C. It can be seen from this extract that the actual markup maps very closely to the initial CIF. All the cell parameters, including the cell angles, are present in the source file. The expansion of the macros (e.g. \cellalpha) executes the logic required to determine whether the value is to be printed and generates the additional text surrounding the value. Each data name is mapped to a distinct macro (even if the macros themselves have identical or near-identical internal structure), which preserves the semantic labelling of the original CIF. These macros are maintained in a separate file referenced and executed by every invocation of the typesetting program.


Figure | top | pdf |

Part of a [\hbox{\TeX}] file used to print the article shown in Fig.[link](b).

In contrast, Fig.[link] shows part of the SGML now used to typeset Acta Cryst. C and to generate HTML versions of the articles online. It is immediately seen that the markup emphasizes typographic style and positioning, and there is no explicit labelling by semantic element. Additional labelling is now found in the document structure; the individual items are marked up as `list items' (〈li〉), but the arrangement of this list into a tabular form is a feature of the typesetting engine, not the SGML.


Figure | top | pdf |

Part of the SGML file used to print the article shown in Fig.[link](b).

It is clear that the [\hbox{\TeX}] macros provide a representation of the contents of the CIF that could easily be converted back to the initial input CIF. At present, such bidirectional translation is not possible from the SGML file.

Clearly, therefore, a mapping to SGML that preserved semantic markup would be preferable. It is most likely that suitable bidirectional translations would be based on XML.

to end of page
to top of page