cif2cif

McMahon, B.

doi:10.1107/97809553602060000753

International
Tables for
Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

pdf | chapter contents | chapter index | related articles

International Tables for Crystallography (2006). Vol. G. ch. 5.3, pp. 511-512

Section 5.3.5.2. cif2cif

B. McMahon^a ^*

^a International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
Correspondence e-mail: bm@iucr.org

5.3.5.2. cif2cif

| top | pdf |

cif2cif (Bernstein, 1998) is a program built with the CIFtbx toolkit (Chapter 5.4 ) to copy a CIF while checking data names against dictionaries, optionally reformatting numbers to maintain standard uncertainties within a specified range. The output CIF may contain a subset of the data in the original CIF according to a request list, in the manner of QUASAR (Hall & Sievers, 1993).

The program was built as a sample application using CIFtbx routines and grew out of requirements from several sources.

5.3.5.2.1. Operation

| top | pdf |

5.3.5.2.1.1. Copying

| top | pdf |

In its simplest application, the program copies a CIF from the standard input channel to standard output. The copy is not verbatim (standard utilities of the computer operating system should be used for that purpose), but the output CIF differs from the input only in the following respects: some comments are deleted; lines in the input longer than 80 characters are wrapped to 80 characters or less; white space between tokens may be altered, especially in an attempt to align entries in looped lists in a cosmetically pleasing manner. While none of these changes should affect robust CIF-parsing applications, they are nevertheless useful in imposing a uniform style of presentation for browsing in a text editor or other human-readable framework.

5.3.5.2.1.2. Constraining standard uncertainties to specified ranges

| top | pdf |

Some journals require that standard uncertainties in experimental values should be quoted within a specified range. Typically the standard uncertainty (s.u.) should be quoted as an integer in parentheses, modifying the last place or two of decimals in the experimental data, and with a value between 2 and 19. cif2cif permits s.u. values in the ranges 1–9, 2–19 or 3–29, selectable by a command-line switch. The effect of applying the `rule of 19' would be to change a value of 1.458(1) in the input CIF to 1.4580(10) in the output.

5.3.5.2.1.3. Dictionary validation

| top | pdf |

cif2cif will open one or more CIF dictionary files as it copies the input CIF and identify certain classes of error against the dictionary definitions. The conditions that will raise an error are an unrecognized data name or a wrong data type. The program will also optionally indicate a warning if a data name has been assigned a category different from the leading portion of the data name – this may indicate an inconsistency within the dictionary itself.

5.3.5.2.1.4. Serving a request list

| top | pdf |

cif2cif will extract a subset of the data items contained in a CIF as specified by a request list, in the manner of QUASAR. The handling of data names specified in the request list is as described in Section 5.3.5.1.3 above, with the following additional feature. The special string data_which_contains: will extract the specified data items from the first data block in which at least one occurs; the block code need not be known in advance.

Some care must be exercised in attempting to extract data from data blocks by context without prior knowledge of the file contents. Consider the following simple example file: [Scheme scheme2]

The loop containing _A1 and _B1 cannot be extracted with a request list of the form [Scheme scheme3] because _A1 occurs in the first data block encountered; the output from cif2cif in this example will be [Scheme scheme4]

The behaviour of the program differs from QUASAR in two other small ways. When the request list forces the output data stream to contain the same data-block header more than once, an error message is posted to the standard error channel and the data-block headers in the output stream are annotated with a comment of the form ` #〈---- duplicate data block'. In this case the output file does not conform to the CIF syntax rules.

When a data name is requested but no matching data item appears in the output file, cif2cif writes an error message to the standard error channel. However, unlike QUASAR, which inserts the requested data name in the output stream with an associated value of ` ?' (for unknown), cif2cif produces no output for the requested data item.

5.3.5.2.1.5. Other features

| top | pdf |

Some additional features are of use in special circumstances.

The user may preserve the layout of the contents of looped lists exactly as in the input file, or may ask the program to adjust the layout to a more visually pleasing tabular form.

The user may enable recognition of data-name aliases in the dictionaries used for validation. When the relevant command-line argument is set to true, user-supplied data names will be transformed to the canonical forms in the validating dictionary. This would permit, for example, a small-molecule CIF using the core dictionary definitions to be converted to mmCIF format.

The user may prefix each line of output with an identical character string. A typical reason for so doing would be to include a fragment of CIF listing within the body of an email message or some other document. Such an output would not conform to the syntax rules for CIF.

5.3.5.2.2. Invocation of the program

| top | pdf |

cif2cif is another application of the CIFtbx library by the same author, and so has a similar user interface to that of CYCLOPS (Section 5.3.4.1.2). Under a Unix-like operating system, the program is typically called with a command such as

cif2cif -i infile -o outfile [ -q reqfile]

where infile is the name of the input file, outfile is the output file and reqfile is an optional file containing a request list for a subset of the original contents.

A more complete set of options available in a Unix-like operating environment is

cif2cif [ -i infile] [ -o outfile] [ -d dictfile] [ -q reqfile] [ -f cmndfile] [ -c catck] [ -a alias] [ -t tab] [ -e sulim] [ -p prefix]

where the options are as follows:

-i specifies the name of the input file, infile.

-o specifies the name of the output file, outfile.

-d specifies the name of a dictionary file, dictfile, against which the existence, type and category of data names are checked. The dictionary file may be either a CIF dictionary or a list of file names. That is, it may contain dictionary definitions in DDL format or (if the file begins with the characters #DICT) it may contain a list of dictionary file names to be entered. Thus, multiple dictionaries may be specified to the program.

-q specifies the name of the request file, reqfile, containing a list of data names (with associated data-block directives) that should be extracted as a subset of the contents of the original file.

-f specifies the name of a command file cmndfile that contains additional directives to the program.

-c is a flag indicating whether an error message should be raised if a data name has been assigned a category different from the leading portion of the data name itself. The Boolean variable catck may take the values `t', `1' or `y' for true, `f', `0' or `n' for false.

-a is a flag indicating whether data-name aliases in the validating dictionary should be used to replace user-supplied names by their canonical forms. The Boolean variable alias may take the same values for true or false as above.

-t is a flag indicating whether the output should be reformatted with tabs to produce a regular table layout within looped lists. The Boolean variable tab takes the same values as above. If true, text is reformatted; if false, the original formatting is retained.

For the flags expecting Boolean values, the default is `f' (false).

-e specifies the precision to retain in rounding standard uncertainty values. The permitted integer values are 9, 19 (the default) and 29.

-p takes a string value which is prefixed to every line of output. Every occurrence of the underscore character ` _' in the prefix is changed to a space on output.

If no input or output file names are specified, the program will read from the standard input channel or write to standard output, respectively. The special character hyphen (` -') may also be supplied as an argument in place of a file name to indicate standard input or standard output as appropriate.

Finally, if the operating system supports the passing of environment variables to a program, the name of the input file may be passed as the value of $cif2cif_INPUT_CIF, and likewise the output file, $cif2cif_OUTPUT_CIF, dictionary file, $cif2cif_CHECK_DICTIONARY, and request file, $cif2cif_REQUEST_LIST, may be specified.

References

Bernstein, H. J. (1998). cif2cif. CIF copy program. http://www.iucr.org/iucr-top/cif/software/ciftbx/cif2cif.src/ .Google Scholar

Hall, S. R. & Sievers, R. (1993). CIF applications. I. QUASAR: for extracting data from a CIF. J. Appl. Cryst. 26, 469–473.Google Scholar

International Tables for Crystallography (2006). Vol. G. ch. 5.3, pp. 511-512