Tables for
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G. ch. 5.7, pp. 557-562

Section 5.7.2. Case study: the fully automated reporting of small-unit-cell crystal structures

P. R. Strickland,a M. A. Hoylanda and B. McMahona*

aInternational Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
Correspondence e-mail:

This section describes the route to publication of a small-molecule or inorganic single-crystal structure in Acta Cryst. C or E from the perspective of an author.

Assembling the complete article

For many authors the generation of a CIF suitable for publication is quite straightforward, since diffractometer software and structure solution and refinement packages have all been capable of writing or reading the CIF format for some time. In some highly integrated systems, the entire experimental, analysis and report-generating pathway may be controlled through a common user interface.

In other cases, different components must be collected from different sources and merged together, either by software utilities or, in the worst case, by hand-editing. It is a useful feature of the text-based CIF format that it can be modified by text editors or in certain word-processing modes; indeed, this was the only way in which the earliest CIF-based papers could be constructed. However, significant expertise and understanding of the technical details of the file format are needed to produce hand-edited files that are totally free from error. Authors are now encouraged to use software designed to help them create complete and error-free files (e.g. the enCIFer and CIFEDIT editors described in Chapter 5.3).

A complete structure communication comprises the following components.

(a) Material common to the article as a whole:

  • (i) title and authors;

  • (ii) synopsis and/or abstract;

  • (iii) comment section;

  • (iv) acknowledgements;

  • (v) references.

(b) Material relevant to each structure:

  • (i) description of the experimental apparatus;

  • (ii) description of the settings and environmental conditions for the experiment;

  • (iii) experimental data, typically a list of measured and calculated structure factors for a single-crystal X-ray structure determination, or powder diffraction data with measured and calculated powder diffraction profiles;

  • (iv) information about the compound, including source, preparation and formula;

  • (v) summary of structure solution and refinement;

  • (vi) coordinates of atomic sites, their elemental composition, occupancy, anisotropic displacement parameters, whether they are in part of the structure affected by positional disorder, and information about their refinement restraints;

  • (vii) selected geometrical data.

(c) Graphical illustrations:

  • (i) chemical structural diagrams;

  • (ii) chemical diagrams of reaction pathways, tautomerism, bond properties etc.;

  • (iii) crystallographic displacement-ellipsoid diagrams;

  • (iv) crystallographic packing diagrams;

  • (v) other graphs, plots or images.

Different journals will have different requirements for the arrangement of these items. For example, at the time of publication (2005), Acta Crystallographica requires that diffraction data (structure factors or Rietveld refinement profiles) are provided as supplementary information in separate files from that containing the body of the paper. This policy originated in the early days of network file transfer, when relatively large files of experimental data could be transferred only with difficulty. This is less of a practical constraint now, and a case could be made for including the experimental results as an integral part of a single submission file, especially since there is still no formal mechanism in the core CIF dictionary to enforce an unambiguous connection between separate data blocks containing related data.

There is at present also no standard way to include graphics within a CIF. The mechanisms of the imgCIF dictionary (Chapter 3.7) offer a possible approach to this problem. It is also possible to envisage the automated generation of views of the structure directly from the numerical data in the CIF. Three-dimensional ellipsoid plots are routinely generated from CIFs submitted to Acta Crystallographica for use in the review process, and incomplete categories of data names exist in the core dictionary for the representation of two-dimensional diagrams of chemical connectivity. At present, however, neither of these is sufficiently well developed to generate publication-quality graphics in different orientations and styles as preferred by an author.

A journal may provide a request list of the data items that it considers recommended or mandatory. The request list for Acta Cryst. C and E is given in Appendix 5.7.1. An author can test a file intended for publication against a request list with a general-purpose CIF parsing tool such as cif2cif (Bernstein, 1998) or QUASAR (Hall & Sievers, 1993) (Chapter 5.3). Different request lists may be provided for different kinds of experiments, such as for powder diffraction experiments or for single-crystal studies using area detectors.
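
The idea of testing a file against a request list can be illustrated with a short Python sketch. It is not a substitute for cif2cif or QUASAR: the scan below is deliberately naive (it skips semicolon-delimited text fields but does not handle every quoting rule), and the request list shown is a hypothetical, much-shortened stand-in for the real list in Appendix 5.7.1.

```python
# Hypothetical, heavily abbreviated request list (the real one is in Appendix 5.7.1).
REQUEST_LIST = {
    "_cell_length_a": "mandatory",
    "_cell_length_b": "mandatory",
    "_cell_length_c": "mandatory",
    "_exptl_crystal_colour": "recommended",
}


def data_names(cif_text):
    """Collect the data names used in a CIF (naive scan, not a full parser)."""
    names = set()
    in_text_field = False
    for line in cif_text.splitlines():
        if line.startswith(";"):
            # A semicolon in column 1 opens or closes a multi-line text field.
            in_text_field = not in_text_field
            continue
        if in_text_field:
            continue
        for token in line.split():
            if token.startswith("_"):
                names.add(token.lower())  # data names are compared without regard to case
    return names


def missing_items(cif_text, request_list=REQUEST_LIST):
    """Return the requested data names that are absent, grouped by status."""
    present = data_names(cif_text)
    report = {}
    for name, status in request_list.items():
        if name.lower() not in present:
            report.setdefault(status, []).append(name)
    return {status: sorted(names) for status, names in report.items()}


sample = "data_I\n_cell_length_a 10.0\n_cell_length_b 11.0\n"
print(missing_items(sample))
```

A journal-specific list for, say, powder diffraction would simply be passed as a different `request_list` argument.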

Note that an author always has the freedom to include additional data items in a CIF; the journal will exercise its own policy for the handling of data items not specified in its public request lists. The PUBL_MANUSCRIPT_INCL category available in the CIF core dictionary provides a mechanism for requesting the publication of data items that are not normally published by the journal (see Sections[link] and[link] ).

Reporting multiple structures and using templates

In CIF format, a data name cannot be repeated within a data block. Therefore, each structure reported in a CIF must occupy a separate data block. A journal might request a separate file for each structure; in the case of Acta Cryst. C, however, a single file for the entire submission is required. This file therefore contains several data blocks if the article reports several structures. The data-block codes (i.e. the changeable label part of a data-block header data_label) have no particular significance and are usually chosen by the authors as meaningful identifiers within their own collection of structures. However, each block code may be used once only in any individual file.
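
The rule that each block code may be used only once in a file is easy to check mechanically. The following Python sketch lists the data-block codes in a file and reports any that are repeated. It is a line-based illustration, not a full CIF parser, and it compares block codes without regard to case, which is an assumption of this sketch.

```python
from collections import Counter


def block_codes(cif_text):
    """Return the data-block codes in file order (naive line-based scan)."""
    codes = []
    in_text_field = False
    for line in cif_text.splitlines():
        if line.startswith(";"):
            # Skip the contents of semicolon-delimited text fields.
            in_text_field = not in_text_field
            continue
        if in_text_field:
            continue
        stripped = line.strip()
        if stripped.lower().startswith("data_"):
            # The block code is whatever follows the 'data_' keyword.
            codes.append(stripped.split()[0][5:])
    return codes


def duplicate_codes(cif_text):
    """Return block codes that occur more than once (case-insensitive)."""
    counts = Counter(code.lower() for code in block_codes(cif_text))
    return sorted(code for code, n in counts.items() if n > 1)


example = (
    "data_global\n_publ_section_title 'Two structures'\n"
    "data_I\n_cell_length_a 10.1\n"
    "data_II\n_cell_length_a 12.3\n"
)
print(block_codes(example))      # one block for the text, one per structure
print(duplicate_codes(example))  # empty list: every code is unique
```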

If an article reports only one structure, the author can include the general text of the article in the same data block that records the structure or in a separate data block. If the file already contains several data blocks (because it reports multiple structures), using a distinct data block for the text of the article is the most natural way of organizing the contents of the file. Fig.[link] shows the structure of a CIF that describes several structures.


Figure. Structure of a CIF describing several crystal structures.

Authors often have one or more local template data blocks that already include standard information about their contact details and details of the experiment. These templates may then be added or merged into the data blocks reporting the structures. Several standard crystallographic software packages include programs for merging CIF templates; one of the best known and most widespread is SHELX97 (Sheldrick, 1997).

Some authors also use programmable macro facilities within commercial word-processing packages to achieve the same purpose. The IUCr application printCIF for Word (Westrip, 2004) extends this approach by creating a custom editing and formatting environment within Microsoft Word. These are very helpful utilities for authors who are not CIF experts. However, they are restricted to particular operating systems or software environments and are thus not universally available.

The program enCIFer (Allen et al., 2004) provides facilities for importing templates and external files, and for adding and maintaining standard information about the authors of a CIF. It provides alternative representations of a CIF as a text file and as a collection of containers and object fields, and provides a great deal of support for authors who are not familiar with the technical details of the CIF format. enCIFer and other useful text-editing programs are described in Chapter 5.3.

Adding extra information to an article

An article for publication in Acta Cryst. C or E is built from a standard request list of CIF data items. Among the items included in this list are ones that describe molecular geometry: bond and contact distances, bond angles and torsion angles. In most cases, unexceptional values of these are not worth displaying (particularly as Acta Cryst. C and E make the original CIF data available as supplementary material). Authors can choose which values are to be displayed using a `publication flag'. For example, the category of data items that describes bond lengths includes the data name _geom_bond_publ_flag, which may be assigned the value `yes' or `no' for any particular bond length depending on whether it should or should not be displayed.
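
The effect of the publication flag can be sketched in Python as follows. The bond lengths in the embedded loop are invented for illustration, and the loop reader handles only simple whitespace-separated values; it is not how the journal software is implemented.

```python
# Invented bond lengths; only the data names come from the core dictionary.
BOND_LOOP = """
loop_
_geom_bond_atom_site_label_1
_geom_bond_atom_site_label_2
_geom_bond_distance
_geom_bond_publ_flag
C1 C2 1.512(3) yes
C2 C3 1.498(3) no
C3 O1 1.215(2) yes
"""


def parse_simple_loop(loop_text):
    """Read a loop of unquoted, whitespace-separated values into a list of dicts."""
    tokens = loop_text.split()
    if not tokens or tokens[0] != "loop_":
        raise ValueError("expected text beginning with loop_")
    names = []
    index = 1
    while index < len(tokens) and tokens[index].startswith("_"):
        names.append(tokens[index])
        index += 1
    values = tokens[index:]
    width = len(names)
    if width == 0 or len(values) % width:
        raise ValueError("number of values is not a multiple of the number of names")
    return [dict(zip(names, values[i:i + width])) for i in range(0, len(values), width)]


def bonds_to_publish(rows):
    """Keep only rows flagged for display; an absent flag is treated as 'no'."""
    return [
        row for row in rows
        if row.get("_geom_bond_publ_flag", "no").lower() in ("yes", "y")
    ]


for bond in bonds_to_publish(parse_simple_loop(BOND_LOOP)):
    print(bond["_geom_bond_atom_site_label_1"],
          bond["_geom_bond_atom_site_label_2"],
          bond["_geom_bond_distance"])
```

All three bonds remain in the CIF and hence in the supplementary material; only the flagged rows would reach the printed table.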

The other items in the request list comprise the complete set of items that are by default extracted for publication from a CIF if they are present. An author may of course add more detail to an article within standard free-text fields (such as _publ_section_comment). However, if the additional information is present as a data item that is not in the standard request list, the typesetting software can be told to add this item dynamically to the request list, thus including the extra information in the published article. The way to do this is to list the additional data name or names as values of `_publ_manuscript_incl_extra_item'. The example below shows how to request that atom-site multiplicities and Wyckoff symbols are included in the table of atomic positions. These are data names defined in the core dictionary; this is indicated by the value `yes' of _publ_manuscript_incl_extra_defn. [Scheme scheme1]

In this example, the author has also requested the publication of the value of the magnetic permeability of the crystal, which does not have a standard dictionary definition, but which has been recorded under a local data name, _Smith_crystal_magnetic_perm. Note that for this item, _publ_manuscript_incl_extra_defn takes the value `no'. The journal typesetting software has no procedure for handling arbitrary additional content, but it may be configured to recognize such a data name and typeset it in the desired style. Once the software is aware of this new item, it will automatically extract and format it in future submissions, as long as the author continues to list it under _publ_manuscript_incl_extra_item. It is best if the informal data name includes a registered reserved prefix (see Section[link]), especially if machine-readable definitions are also provided in an appropriate DDL dictionary format and accessible through the IUCr register of CIF dictionaries (Section[link] ).
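
The scheme referred to in the preceding paragraphs is not reproduced in this text. The Python sketch below embeds a hypothetical reconstruction of such a loop and shows how the requested names might be read from it. The three _publ_manuscript_incl_extra_* names belong to the PUBL_MANUSCRIPT_INCL category; the two atom-site names are given as they are believed to appear in the core dictionary, and the explanatory strings are invented. The standard-library function `shlex.split` is used as a convenient approximation to CIF quoting rules, not as a faithful implementation of them.

```python
import shlex

# Hypothetical reconstruction of the author's request loop.
EXTRA_ITEMS_LOOP = """
loop_
_publ_manuscript_incl_extra_item
_publ_manuscript_incl_extra_info
_publ_manuscript_incl_extra_defn
'_atom_site_symmetry_multiplicity' 'to emphasise special sites' yes
'_atom_site_Wyckoff_symbol'        'to emphasise special sites' yes
'_Smith_crystal_magnetic_perm'     'novel measurement'          no
"""


def extra_requests(loop_text):
    """Return (data name, explanation, defined-in-dictionary) for each request."""
    tokens = shlex.split(loop_text)  # honours the quote marks around each value
    if not tokens or tokens[0] != "loop_":
        raise ValueError("expected text beginning with loop_")
    values = tokens[4:]  # skip 'loop_' and the three column names
    if len(values) % 3:
        raise ValueError("each request needs an item, an explanation and a flag")
    return [
        (values[i], values[i + 1], values[i + 2].lower() in ("yes", "y"))
        for i in range(0, len(values), 3)
    ]


requests = extra_requests(EXTRA_ITEMS_LOOP)
standard = [name for name, _info, defined in requests if defined]
local = [name for name, _info, defined in requests if not defined]
print("dictionary items:", standard)
print("local items needing special handling:", local)
```

Separating the two groups mirrors the distinction drawn above: items with a dictionary definition can be handled by existing rules, whereas a local name has to be made known to the typesetting software first.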

Care is needed when using _publ_manuscript_incl_extra_item:

(i) The extra items requested must be surrounded by quote marks, otherwise CIF software will try to interpret them as active data names.

(ii) The list is cumulative: if several _publ_manuscript_incl_extra_item loops appear in the file (one per data block), the request list that is generated will include all the extra items that appear in all of these loops, and that request list will be applied in full to all the data blocks in the file. It is therefore not possible to ask for an extra item from one data block but not another.

(iii) Not all possible terms in the official dictionaries may be recognized and handled appropriately by the journal software. To check this, the author can generate a preview of the formatted paper by using the printcif service, described in Section[link].
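
Point (ii) can be made concrete with a small Python sketch. The function and the block codes are hypothetical; the sketch only models the cumulative behaviour described above, in which extra items requested in any one data block are applied to every data block in the file.

```python
def effective_request_lists(base_request_list, extra_items_by_block):
    """Model the cumulative rule: extras from every block apply to all blocks."""
    combined = list(base_request_list)
    for extras in extra_items_by_block.values():
        for name in extras:
            if name not in combined:
                combined.append(name)
    # Every block receives the same, fully extended request list.
    return {block: list(combined) for block in extra_items_by_block}


base = ["_cell_length_a", "_atom_site_label"]
extras = {
    "I": ["_atom_site_Wyckoff_symbol"],
    "II": ["_Smith_crystal_magnetic_perm"],
}
for block, names in effective_request_lists(base, extras).items():
    print(block, names)
```

Although the Wyckoff symbol was requested only in block I, it is also extracted from block II if present there, which is why an extra item cannot be requested for one structure alone.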

Two examples of this approach are shown in Fig.[link]. Atom-site positions and displacement parameters are often displayed without the associated Wyckoff symbols or multiplicities (to save space). In the first example, the author indicates that the Wyckoff symbols should be displayed.


Figure. Examples of authors' request-list extensions for items not normally printed in a paper. (a) Printing additional standard data items. The data are listed as normal in the ATOM_SITE loop. (b) A complete table of non-standard quantities associated with contact distances is generated, complete with table caption and footnote.

In the second example, the author wishes to publish a table of a set of items not defined in the core CIF dictionary (in this example, contact distances with associated charge density and Laplacian functions). Here, utility data names are used to extract regularly tabulated data of arbitrary content from the CIF to create a table in the published article.

Previewing the article

The appearance of the plain-text ordered arrangement of content in a CIF differs a great deal from its typeset representation in a journal article. It can help authors, therefore, if they can see how their article will appear in print (or as an online article) before they formally submit their article to a journal. Acta Cryst. C and E provide an online web service for this called printcif ( ).

When an author uploads a CIF to the service, the data within it are extracted (using a dynamically enhanced request list if the publication of extra items has been requested) and translated through a sequence of software filters to TeX (Knuth, 1986). The TeX file is processed and a final document representation (a `preprint') in PostScript or Portable Document Format (Adobe Systems Incorporated, 1999, 2004) is generated. The preprint is then downloaded to the author. The primary translation engine is the program ciftex (Section[link] ). However, printcif has additional content filters which are not distributed with ciftex; these are modified frequently to make additional pattern-based text substitutions or to make changes to the typographic style of the preprint to match any changes in the style of Acta Cryst. C or E.

A new approach to document formatting is being explored in the development of printCIF for Word (Westrip, 2004), an embedded Visual Basic application suitable for CIF editing and formatting within Word (Section[link] ). This allows users to preview their article as they work on it. However, printCIF for Word does not have access to the constantly updated translation filters used by printcif.

Data validation

The highly structured format of a CIF allows automated validation of the self-consistency and integrity of the structural data reported in it. What was traditionally a part of the referee's task in checking crystal structure papers can now be handled by software. Acta Cryst. C and E require authors to check their structures before submitting them for publication. The same checks are run on each CIF after submission and a report of the results is made available to the referees for use during the peer-review process.

The routine checking of submissions for errors was introduced by the IUCr journals in the early 1990s, initially as a manual procedure. When CIF was introduced, the new format was readily adopted as a standard interchange format from which the input files for different checking programs could be generated automatically. The development of a workflow based on CIF proved worthwhile, as CIF increasingly became the format for submission in the first place. Over time, too, much of the checking software became capable of reading CIFs directly, so that the intermediate data-conversion processes could be avoided.

Over several years, a great deal of experience was gained in the types of error that could most easily be detected using checking software. A major component of the checking suite was UNIMOL, which had been developed by the Cambridge Crystallographic Data Centre for checking the molecular geometry of database entries (Allen et al., 1974). Other types of checks could be performed by running other general-purpose crystallographic packages under the direction of pre-defined scripts designed to exploit their particular strengths. Among the programs used in this way were NRCVAX (Gabe et al., 1989), which incorporated the powerful MISSYM algorithm of Le Page (1988), PARST (Nardelli, 1983), an early version of PLATON (Spek, 1990) and the BUNYIP routine for detecting additional symmetry (Hester & Hall, 1996) within the Xtal program system (Hall et al., 2000).

As experience grew in running these processes in increasingly automated ways, and in collecting, parsing and reformatting the most relevant diagnostic output, it became apparent that a modular system could be designed to perform most of the data checking entirely automatically. Preliminary work on the set of tests developed for the PREPUB component of the Xtal system (du Boulay & Hall, 1996) led, through close cooperation with the IUCr editorial office and Ton Spek, the author of PLATON (Spek, 2003), to the implementation of checkcif, which is described in Section[link] below.

Automated data validation: checkcif

The current service for checking structural data submitted to IUCr journals is known as checkcif and is available at . Versions of this service have been made available to other publishers for some time. In 2003, a general service was introduced at to provide structural checks on CIF data sets destined for publication in non-IUCr journals or database deposition, or indeed to allow authors to assess the quality of their structure determinations whether they wish to publish them or not.

The tests carried out by checkcif include:

(i) a simple file syntax check: essential in the early days of manual CIF construction, but of less importance now as syntax-preserving editing programs have become more widespread;

(ii) tests for the self-consistency of mutually dependent data items present in the CIF;

(iii) a large collection of analytic tests on structural chemistry and molecular geometry based on the program PLATON (Spek, 2003).
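
As an illustration of the second kind of test, the Python sketch below checks that a reported unit-cell volume agrees with the volume calculated from the cell lengths and angles, using the standard formula V = abc(1 - cos²α - cos²β - cos²γ + 2 cos α cos β cos γ)^(1/2). This is offered only as an example of a self-consistency check between mutually dependent data items; it is not taken from the checkcif code, and the 1% tolerance is an arbitrary assumption.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume from cell lengths and angles (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def volume_is_consistent(cell, tolerance=0.01):
    """Compare _cell_volume with the value implied by the other cell items.

    'tolerance' is a relative tolerance chosen arbitrarily for this sketch.
    """
    calculated = cell_volume(
        cell["_cell_length_a"], cell["_cell_length_b"], cell["_cell_length_c"],
        cell["_cell_angle_alpha"], cell["_cell_angle_beta"], cell["_cell_angle_gamma"],
    )
    return abs(calculated - cell["_cell_volume"]) <= tolerance * calculated


# Invented monoclinic cell in which the reported volume was mistyped.
cell = {
    "_cell_length_a": 5.0, "_cell_length_b": 6.0, "_cell_length_c": 7.0,
    "_cell_angle_alpha": 90.0, "_cell_angle_beta": 100.0, "_cell_angle_gamma": 90.0,
    "_cell_volume": 260.8,
}
print(volume_is_consistent(cell))  # the calculated volume is close to 206.8
```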

The checks carried out at the time of publication (2005) are listed in Appendix 5.7.2 and on the CD-ROM accompanying this volume. The current list is available from .

Although the results from checkcif provide valuable indications of possible inconsistencies or data errors, an article for publication is not accepted or rejected on the basis of the checkcif report alone. The report is always read by a reviewer as part of a considered critical appraisal of the article.

Sometimes, particular data values are so far from the expected values that some response is required from the author to explain them. The unusual values may be a consequence of poor experimental conditions that the author was unable to improve, or of poor crystal quality; they may indicate an uncertainty in part of the structure determination that the author considers acceptable, particularly if the purpose of the study is to concentrate on a different part of the structure; or they may genuinely indicate novel chemical features. Whatever the case, anomalous values usually need to be discussed by the author and the reviewer or editor, and often need to be commented on in the article. For Acta Cryst. C and E, checkcif generates in CIF format a list of the tests that have highlighted unusual values in the author's CIF (called `A alerts'), together with a text field for each of these tests in which the author may justify or discuss the apparently anomalous results (see Fig.[link]). Together these comprise a `validation reply form'. The author can complete this form and paste it into the final version of the CIF submitted for publication. The editor handling the paper can then read the comments in the validation reply form and decide whether to accept the paper for publication. The submission system will automatically return to the author any CIF which generates an A alert but does not contain a completed validation reply form.
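
The automatic return of files with unanswered alerts can be sketched in Python. The layout assumed here for a validation reply form (a data name beginning with _vrf_, followed by a text field containing PROBLEM: and RESPONSE: lines) and the alert codes in the example are assumptions made for illustration; the figure showing the actual form is not reproduced in this text, and the sketch does not represent the submission system's own code.

```python
import re

# Assumed layout: '_vrf_<test>_<block>' followed by a semicolon-delimited text field.
VRF_ITEM = re.compile(r"^(_vrf_\S+)\s*\n;(.*?)\n;", re.MULTILINE | re.DOTALL)


def unanswered_alerts(cif_text):
    """Return the reply-form items whose RESPONSE has been left empty."""
    unanswered = []
    for name, body in VRF_ITEM.findall(cif_text):
        _before, _marker, response = body.partition("RESPONSE:")
        if response.strip() in ("", "..."):
            unanswered.append(name)
    return unanswered


# Invented example with one completed and one uncompleted reply.
form = """\
_vrf_PLAT029_I
;
PROBLEM: _diffrn_measured_fraction_theta_full Low
RESPONSE: ...
;
_vrf_THETM01_I
;
PROBLEM: The value of sine(theta_max)/wavelength is less than 0.550
RESPONSE: The crystal diffracted very weakly at high angle.
;
"""
print(unanswered_alerts(form))
```

A file for which this list is not empty would be a candidate for return to the author; whether a given response is adequate remains a matter for the editor.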


Figure. Extracts from a checkcif report for a `publication check' on a CIF to be submitted to an IUCr journal. (a) Alerts of various levels of severity are listed. (b) The journal policy on the handling of alerts is summarized and a validation reply form listing the A alerts is supplied for the author to fill in.

Every article published in Acta Cryst. E has as part of its supplementary material a summary of the checkcif report for the structure described in it. This summary includes any validation reply that the author has supplied. It also includes selected numerical data items identified by the journal editors as characterizing the overall quality and completeness of the structure determination.

The characterization of the `quality' of a structure is a contentious issue. For journals, where there is active selection of articles for publication, it can be difficult to assign criteria for assessing the quality of the structure determination without these being seen as judging the quality or worth of the scientific work giving rise to the result. Thus journals rely upon the experience and discernment of referees to identify structures `worth' publishing. However, in a comprehensive collection of structural data sets, such as in a public structural database, it might be possible to identify particular data items that could be used for weighting individual data sets when the database is being `mined' for particular patterns or characteristic values. It will be interesting to see whether a consensus emerges on what items would be suitable. It is clear that reliance on a single indicator will not be appropriate for sophisticated studies. The old idea that a structure could be classed as `good' or `bad' on the basis of its final residual R factor alone has long been abandoned, but it may be possible to stipulate criteria for a set of interrelated data items and use these to filter specific information from a database.

Submission and review

When an author has previewed and checked the contents of the CIF and has made the changes suggested by a careful study of the preprint and the checkcif report, the article may finally be submitted to Acta Cryst. C or E by file upload over the web. Other files completing or supporting the submission are also transferred to the editorial office at this time. These include structure-factor or powder profile listings for each structure, figures and chemical diagrams, and sometimes other supplementary documents. Structure-factor listings are supplied in CIF format. Figures may be in one of a number of standard graphics file formats, and at the moment have to be uploaded as separate files. Future extensions to CIF, perhaps following the imgCIF approach, may allow all the items needed to submit an article, including figures, to be prepared as a single file.

When all the files have arrived at the editorial office, a review document is generated that can be sent to the referees. This document contains: the text and tables of the article that will appear in the final publication, but laid out in a more open style suitable for annotation by hand; tables of atomic positions and geometry (containing all the data in the CIF, not just the subset that has been selected for displaying in the published article); certain fields from the CIF that are not normally printed but which may contain details of the way in which the experiment was carried out (these fields might have been completed manually or by the software controlling the experiment); the figures and other supplementary documents; and a print-out of the report from a final checkcif cycle, including a displacement-ellipsoid plot of the molecule in a minimal-overlap least-squares plane view. This composite document provides the information that a referee will typically want to consider in a compact and convenient form. Because the CIF is so highly structured, producing this review document is in most cases entirely automatic. The complete CIF as submitted by the author and the experimental data are also made available to the reviewer.

If revisions are requested, authors may upload modified files. The generation of revised versions of an article is also largely automatic.

Publication

When the final version of a CIF for Acta Cryst. C or E is approved, the article is ready for publication. Once more, the data fields required for the published article are extracted from the CIF and sorted. If the author has asked for additional items to be printed by using _publ_manuscript_incl_extra_item, these also are extracted. The result is transformed to a file suitable for processing by typesetting software. For Acta Cryst. C this was originally a TeX file; now a further transformation generates an SGML file that conforms to the document type definition (DTD) common to all IUCr journals. This allows not only typesetting and printing, but also the generation of the HTML for the navigable online version of the article, and the extraction of metadata for building online tables of contents and for supplying to bibliographic databases.

The conventional published article then appears in a monthly issue. Each article is still similar in style to the type of structure report published in journals for decades, although tables of atomic positions and geometric data are not usually displayed now, since these data are so readily available from the online article.

The online version of the journal, however, presents a much more information-rich version of the article. Each article is generally available in the form of a PDF file, suitable for downloading and offline printing. There is also an HTML version of the same text, and this version has rich internal links that make it easy to scroll back and forth through the article, jump to specific sections and see figures in low-resolution thumbnail or high-resolution views. The reference list contains links to the articles that are cited. There may also be links to related records in chemical or crystal structure databases. The reader may also download the experimental data and any supplementary documents associated with the article. As mentioned above, for Acta Cryst. E a summary of the check report is also available.

Finally, the structural data may be downloaded directly in CIF format. The CIF is presented in two ways. If a reader follows one link in a web browser, the file is interpreted simply as a text file and appears as a simple listing in the browser window, from which it may be printed or saved to disk. However, if the reader follows the other link, the CIF is transmitted to the browser with a header declaring its MIME type (Freed & Borenstein, 1996) as `chemical/x-cif'. This is one of several MIME types registered for particular presentations of chemistry-related content by Rzepa et al. (1998). The reader may then configure a web browser to respond in a specific way to content tagged with this MIME type; typically a helper application such as a molecular visualizer [e.g. Mercury (Bruno et al., 2002)] will be launched that allows three-dimensional visualization and manipulation of the molecular or crystal structure.
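
The two presentations differ only in the content type declared when the file is served. A minimal Python sketch of that choice, using the standard `mimetypes` module, is given below; the function name and the `as_plain_text` switch are hypothetical, and the sketch says nothing about how the journal's own web server is configured.

```python
import mimetypes

CIF_MIME_TYPE = "chemical/x-cif"


def content_type_for(filename, as_plain_text=False):
    """Choose the Content-Type header value for one of the two download links."""
    if as_plain_text:
        # First link: the browser simply lists the file as text.
        return "text/plain"
    # Second link: declare the chemical MIME type so that a helper
    # application (for example a molecular visualizer) can be launched.
    registry = mimetypes.MimeTypes()
    registry.add_type(CIF_MIME_TYPE, ".cif")
    guessed, _encoding = registry.guess_type(filename)
    return guessed or "application/octet-stream"


print(content_type_for("structure.cif", as_plain_text=True))
print(content_type_for("structure.cif"))
```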

When an article has been published in Acta Cryst. C or E, the CIF is transferred to the relevant public structural databases. Thus, the transcription errors that used to cause so many problems for data harvesters are completely avoided and one of the initial goals of the CIF project is achieved: uncorrupted data transfer from diffractometer, through publication, to a final repository.

Because Acta Cryst. C and E handle almost exclusively the publication of structure reports, the editorial workflow based on CIF lends itself to a very high level of automation and the journals are produced efficiently and on short timescales. Routine refereeing of structures is made very easy by the provision of checking reports, and the universal use of e-mail and web file transfer means that production times can be very fast.


References

Adobe Systems Incorporated (1999). PostScript language reference. 3rd ed. Reading, MA: Addison-Wesley Longman.
Adobe Systems Incorporated (2004). PDF reference. 5th ed. Adobe Portable Document Format. Version 1.6.
Allen, F. H., Johnson, O., Shields, G. P., Smith, B. R. & Towler, M. (2004). CIF applications. XV. enCIFer: a program for viewing, editing and visualizing CIFs. J. Appl. Cryst. 37, 335–338.
Allen, F. H., Kennard, O., Motherwell, W. D. S., Town, W. G., Watson, D. G., Scott, T. J. & Larson, A. C. (1974). The Cambridge Crystallographic Data Centre, part 3. The unique molecule program. J. Appl. Cryst. 7, 73–78.
Bernstein, H. J. (1998). cif2cif. CIF copy program.
Boulay, D. J. du & Hall, S. R. (1996). PREPUB. Pre-publication tests on CIF structural data.
Bruno, I. J., Cole, J. C., Edgington, P. R., Kessler, M., Macrae, C. F., McCabe, P., Pearson, J. & Taylor, R. (2002). New software for searching the Cambridge Structural Database and visualizing crystal structures. Acta Cryst. B58, 389–397.
Freed, N. & Borenstein, N. (1996). Multipurpose Internet Mail Extensions (MIME) part two: media types. Internet Engineering Task Force. Request for comment 2046.
Gabe, E. J., Le Page, Y., Charland, J.-P., Lee, F. L. & White, P. S. (1989). NRCVAX – an interactive program system for structure analysis. J. Appl. Cryst. 22, 384–387.
Hall, S. R., du Boulay, D. J. & Olthof-Hazekamp, R. (2000). Xtal crystallographic software.
Hall, S. R. & Sievers, R. (1993). CIF applications. I. QUASAR: for extracting data from a CIF. J. Appl. Cryst. 26, 469–473.
Hester, J. R. & Hall, S. R. (1996). BUNYIP: in search of errant symmetry. J. Appl. Cryst. 29, 474–478.
Knuth, D. E. (1986). The TeXbook. Computers and typesetting, Vol. A. Reading, MA: Addison-Wesley.
Le Page, Y. (1988). MISSYM1.1 – a flexible new release. J. Appl. Cryst. 21, 983–984.
Nardelli, M. (1983). PARST. A system of FORTRAN routines for calculating molecular structure parameters from results of crystal structure analyses. Comput. Chem. 7, 95–98.
Rzepa, H. S., Murray-Rust, P. & Whitaker, B. J. (1998). The application of chemical Multipurpose Internet Mail Extensions (chemical MIME) internet standards to electronic mail and world-wide web information exchange. J. Chem. Inf. Comput. Sci. 38, 976–982.
Sheldrick, G. M. (1997). SHELX97. Program for the refinement of crystal structures. University of Göttingen, Germany.
Spek, A. L. (1990). PLATON, an integrated tool for the analysis of the results of a single crystal structure determination. Acta Cryst. A46 (Suppl.), C34.
Spek, A. L. (2003). Single-crystal structure validation with the program PLATON. J. Appl. Cryst. 36, 7–13.
Westrip, S. P. (2004). printCIF for Word.
