International Tables for Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G. ch. 5.3, pp. 520-522

Section 5.3.7.2. CifSieve: automatic construction of CIF input functions

B. McMahon*

International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
Correspondence e-mail: bm@iucr.org

5.3.7.2. CifSieve: automatic construction of CIF input functions

Among the utilities described in the ZINC package above was a tool to generate Fortran namelist files. Applications developers commonly need to convert existing programs quickly so that they can read CIF data. Libraries such as CIFtbx (Chapter 5.4) and CIFLIB (Westbrook et al., 1997) offer very powerful functions for building CIF applications, but integrating them with existing software can be time-consuming. CifSieve (Hester & Okamura, 1998) aims to enable the rapid creation of new CIF-conversant software by using a CIF dictionary as a template for input data structures.

The CifSieve program runs on Unix systems that have the following software installed: the parser generator bison or yacc, the lexical-analyser generator flex, Perl (Wall et al., 2000) and a C compiler.

5.3.7.2.1. Overview of the process

The data names in a CIF are defined in a dictionary written in the DDL1 or DDL2 formalism. The dictionary therefore already holds the data type and array structure of each item. A software author needs exactly this information to decide how to read CIF data into a program's data structures. CifSieve requires the programmer to augment a copy of the relevant CIF dictionary with a new attribute, _variable_name, added to the definition of each desired item. Its value is the name of the program variable that will hold that item.

A program, BuildSiv, then reads the augmented dictionary and produces a subroutine that reads a CIF and transfers the data items tagged in the augmented dictionary to internal variable storage. The associated data structure is declared in an ancillary file, which must be incorporated into the application program when it is built.
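At its core, the generated reader pairs each tagged CIF data name with the program storage that should receive its value. The fragment below is a minimal sketch of that idea, not the code BuildSiv actually emits. The names `struct binding`, `same_name` and `bind_value` are hypothetical. The only fact it relies on is that CIF data names are case-insensitive.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch: one entry per data name tagged with _variable_name. */
struct binding {
    const char *dataname;  /* CIF data name, e.g. "_cell_length_a" */
    double *target;        /* program variable that receives the value */
};

/* CIF data names are case-insensitive, so compare without regard to case. */
static int same_name(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Store value in the variable bound to name. Returns 1 if stored, or 0 if
   the name was not tagged in the augmented dictionary (value is ignored). */
int bind_value(const struct binding *table, size_t n,
               const char *name, double value)
{
    for (size_t i = 0; i < n; i++) {
        if (same_name(table[i].dataname, name)) {
            *table[i].target = value;
            return 1;
        }
    }
    return 0;
}
```

Names absent from the table are silently skipped in this sketch. This mirrors the rule that only items tagged in the augmented dictionary are transferred to program storage.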

CifSieve can produce input subroutines and header or include files for C and Fortran programs.

For C applications, the input subroutine is called cifsiv_ and is invoked as cifsiv_(CIF, block). Here CIF is the name of the input CIF and block is the name of the data block from which data should be read. The data structure is declared in a header file, cifvars.h, which must be included in any routine that manipulates the data read from the CIF.

For Fortran applications, the input subroutine is also called cifsiv_, but it takes an additional argument, blockbeg. This is the address of the common block holding the input variables, declared in the include file forcif.inc.

5.3.7.2.2. The augmented DDL dictionary

Fig. 5.3.7.4 shows the annotations needed to flag the data items that are to be read from a CIF. The current implementation requires a copy of the relevant DDL dictionary to be edited by hand to include the new _variable_name attribute. Adding this attribute does not affect the use of the CIF dictionary for other purposes or by other software.

[Figure 5.3.7.4]

Figure 5.3.7.4. Extracts from an augmented DDL1 dictionary (version 1.0 of the core CIF dictionary). The additional _variable_name entry is shown in italics.

The definition blocks of data items that are not to be read by the application should be left unchanged.

The value assigned to the _variable_name attribute is the name of the variable declared in the application program to store the input data item. If the items to be input form an array (i.e. they appear in the CIF as a looped list), the variable name should be given as a dimensioned array variable, e.g. atsiteu[1000] in the example of Fig. 5.3.7.4.
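When a _variable_name value carries a dimension, the generator has to separate the variable's name from its declared size. The sketch below shows that step under assumed conventions. The function `split_varname` is hypothetical, not part of CifSieve. It reports a dimension of 1 for scalars and -1 for a malformed value.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split a _variable_name value such as "atsiteu[1000]"
   into its base name and dimension. A value without brackets is a scalar
   and gets dimension 1. Returns the dimension, or -1 if the value is
   malformed or the name does not fit in the caller's buffer. */
long split_varname(const char *value, char *name, size_t namesize)
{
    const char *open = strchr(value, '[');
    size_t len = open ? (size_t)(open - value) : strlen(value);
    long dim = 1;

    if (len == 0 || len >= namesize)
        return -1;
    if (open) {
        char *end;
        dim = strtol(open + 1, &end, 10);
        /* Require digits followed by a closing ']' and nothing after it. */
        if (end == open + 1 || *end != ']' || end[1] != '\0' || dim <= 0)
            return -1;
    }
    memcpy(name, value, len);
    name[len] = '\0';
    return dim;
}
```

For example, `split_varname("atsiteu[1000]", buf, sizeof buf)` would return 1000 and leave "atsiteu" in `buf`.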

The same attribute (_variable_name) may be inserted in DDL1 or DDL2 dictionaries, and a separate parser is supplied for each format. When BuildSiv is invoked, the parser reads the augmented dictionary. It identifies the data items required by the target input subroutine by the presence of a _variable_name attribute in their definition blocks. Each such definition is read, and the values of the type (DDL attribute _type), item name (_name) and variable name are written out in a simple tag–value format and in a standard order. For DDL2 dictionaries, values of _item_aliases.alias_name and _item_linked.parent_name, if present, are also written out. The DDL parser thus transforms and simplifies the dictionary contents.

The item-name attribute may occur inside a loop, i.e. several data names may be defined in a single definition block of the dictionary. In that case CifSieve gives the variable name for that definition block an extra array dimension equal to the number of names in the loop. When a name from this loop is found in a CIF, its value is read into the corresponding array element. If an _item_aliases.alias_name attribute is present (DDL2), the alias is also recognized in CIF input files. If this attribute occurs together with looped item names in the domain dictionary, BuildSiv attempts to determine the parent _item.name in the loop to which the alias refers. It does this by examining the _item_linked.parent_name entries within the same definition block.
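The extra array dimension means that each name in a looped definition block selects one element of the tagged variable. As a hedged illustration, assume a DDL1 block that defines _cell_length_a, _cell_length_b and _cell_length_c together, tagged with a hypothetical variable `cell`. The element index is then the position of the name in the dictionary loop. The names `loop_slot` and `store_looped` are invented for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: the names looped in one DDL1 definition block, in
   dictionary order. The variable tagged for this block (here "cell") gets an
   extra array dimension equal to the number of names. */
static const char *const cell_names[] = {
    "_cell_length_a", "_cell_length_b", "_cell_length_c"
};
#define NCELL (sizeof cell_names / sizeof cell_names[0])

double cell[NCELL];

/* Return the element of cell[] that receives a value for name, or -1 if the
   name is not in this loop. For brevity the comparison is exact; a real CIF
   reader would compare case-insensitively. */
int loop_slot(const char *name)
{
    for (size_t i = 0; i < NCELL; i++)
        if (strcmp(cell_names[i], name) == 0)
            return (int)i;
    return -1;
}

/* Store value in the element selected by name. Returns 1 if stored, else 0. */
int store_looped(const char *name, double value)
{
    int slot = loop_slot(name);
    if (slot < 0)
        return 0;
    cell[slot] = value;
    return 1;
}
```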

Data typing is simplified: the _item_type.code values of DDL2 dictionaries are collapsed onto the primitive types 'numb' and 'char'. Values of type numb are declared and stored as double (C) or REAL*8 (Fortran). Values of type char are stored as character arrays, char[84] (C) or CHARACTER*84 (Fortran). Because each character value is held in a single fixed-length buffer, multi-line text cannot be retrieved with this version of CifSieve. Note in particular that values declared as type 'int' in DDL2 dictionaries are stored as double-precision reals.
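The type collapse and fixed-length storage described above can be sketched in C as follows. The set of DDL2 codes treated as numeric ("int" and "float") is an assumption for illustration; CifSieve's own mapping may differ. The helper names are also hypothetical. The 84-character buffer mirrors the char[84] storage given in the text.

```c
#include <stdlib.h>
#include <string.h>

#define CIF_CHARLEN 84   /* matches the char[84] storage described above */

enum prim { PRIM_NUMB, PRIM_CHAR };

/* Assumed mapping: these DDL2 _item_type.code values are treated as numeric;
   every other code collapses to 'char'. */
enum prim collapse_type(const char *code)
{
    static const char *const numeric[] = { "int", "float" };
    for (size_t i = 0; i < sizeof numeric / sizeof numeric[0]; i++)
        if (strcmp(code, numeric[i]) == 0)
            return PRIM_NUMB;
    return PRIM_CHAR;
}

/* 'numb' values are held as double, so an 'int' such as "42" becomes 42.0. */
double store_numb(const char *text)
{
    return strtod(text, NULL);
}

/* 'char' values are copied into one fixed buffer. Text longer than
   CIF_CHARLEN - 1 characters is truncated, so long or multi-line text
   cannot be held intact. */
void store_char(char dest[CIF_CHARLEN], const char *text)
{
    strncpy(dest, text, CIF_CHARLEN - 1);
    dest[CIF_CHARLEN - 1] = '\0';
}
```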

5.3.7.2.3. Input to a C application program

When a DDL dictionary dictfile has been edited in accordance with the description above, the program BuildSiv may be run under a Unix-like operating system with a command of the form

BuildSiv dictfile ddlversion

where ddlversion takes the value '1' or '2' to select the DDL1 or DDL2 parser. If the option '-e' is given before dictfile, BuildSiv also generates variables for standard uncertainty values and the code to read them. Each standard-uncertainty variable is named by appending the string esd to the name given by the programmer; for example, the uncertainties for atsiteu are held in atsiteuesd.
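In CIF, a standard uncertainty is written as digits in parentheses that apply to the last decimal places of the value, so 0.2345(12) means 0.2345 with s.u. 0.0012. The sketch below shows how a reader built with '-e' might split such a value into the variable and its esd companion. The function name is hypothetical. Exponent forms such as 1.2(3)E4 are deliberately not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: parse a CIF number such as "0.2345(12)" into its
   value (0.2345) and standard uncertainty (0.0012). The digits in
   parentheses are scaled by the number of decimal places before the '('.
   A value without an s.u. gets su = 0. Exponent notation is not handled.
   Returns 1 on success, 0 if the text is not understood. */
int parse_cif_number(const char *text, double *value, double *su)
{
    char *end;
    *value = strtod(text, &end);
    *su = 0.0;
    if (end == text)
        return 0;                    /* not a number at all */
    if (*end != '(')
        return *end == '\0';         /* plain number, or trailing junk */

    /* Count the digits after the decimal point, up to the '('. */
    const char *dot = memchr(text, '.', (size_t)(end - text));
    int decimals = dot ? (int)(end - dot - 1) : 0;

    char *close;
    long digits = strtol(end + 1, &close, 10);
    if (close == end + 1 || *close != ')' || close[1] != '\0')
        return 0;

    double scale = 1.0;
    for (int i = 0; i < decimals; i++)
        scale *= 10.0;
    *su = (double)digits / scale;
    return 1;
}
```

Under the esd naming rule above, a looped value parsed this way would fill one element of atsiteu and the matching element of atsiteuesd.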

An object file, cifsiv.o, is produced together with a header file, cifvars.h. Some source-code files are also produced as intermediate files during the lexical-analysis and parsing phases of the build; these may be deleted. The object file must be linked with the application's other object files when the program is built. The header file must be included (generally through a C preprocessor #include directive) wherever the application code needs access to the imported data structures.

Fig. 5.3.7.5 is an example of the header file cifvars.h built when BuildSiv reads the augmented dictionary of Fig. 5.3.7.4 with the '-e' option to interpret and store standard uncertainties.

[Figure 5.3.7.5]

Figure 5.3.7.5. Header file cifvars.h for a C application built by BuildSiv from the augmented DDL dictionary of Fig. 5.3.7.4.

The integer variable errornum is set to a nonzero value if an error occurs while reading a CIF, and a message indicating the nature of the problem is stored in the character array errormes. Errors raised by the input subroutine cifsiv_ are not fatal to the calling application; at worst, the affected loop or data item is discarded. On encountering an error, the parser discards CIF data until it reaches input it can interpret again. For example, if three numbers appear after an item name instead of one, the error variables are set, the last two numbers are ignored and parsing continues. Similarly, if a serious error occurs within a loop, such as an item name that does not match an array variable, the entire loop is normally ignored. If a new packet of looped data would exceed the declared array limits, all further data in that loop are ignored.
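The overflow rule can be sketched as follows: a loop that overflows its array is truncated, and the error is recorded rather than treated as fatal. The errornum/errormes pair follows the description above. The function `store_loop_values`, its signature and its message text are hypothetical. The error variables are passed as parameters only so that the sketch is self-contained.

```c
#include <stdio.h>

/* Hypothetical sketch: copy the values of one looped column into a fixed
   array of `capacity` elements. Packets beyond the array limit are ignored,
   a nonzero code is recorded in *errornum and a message in errormes, and the
   caller carries on with whatever was stored. Returns the number stored. */
int store_loop_values(double *dest, int capacity,
                      const double *values, int nvalues,
                      int *errornum, char *errormes, size_t messize)
{
    int stored = nvalues < capacity ? nvalues : capacity;

    for (int i = 0; i < stored; i++)
        dest[i] = values[i];

    if (nvalues > capacity) {
        *errornum = 1;   /* any nonzero value signals an error */
        snprintf(errormes, messize,
                 "loop has %d packets; only the first %d were stored",
                 nvalues, capacity);
    }
    return stored;
}
```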

The cifsiv_ function has the prototype

void cifsiv_(char *filename, char *blockname);

and takes pointers to character strings holding the name of the input file and the code of the data block from which input is required.
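A calling program will usually check errornum after each call. The sketch below assumes that errornum and errormes are global variables declared in cifvars.h; this is a hypothetical arrangement. To keep it self-contained, those declarations are written out inline and a stub stands in for the generated cifsiv_. In a real build you would include cifvars.h and link cifsiv.o instead. The wrapper name `read_cif_block` is invented.

```c
#include <stdio.h>
#include <string.h>

/* Stand-ins for declarations assumed to come from the generated cifvars.h;
   the buffer size here is illustrative only. */
int errornum = 0;
char errormes[256];

/* Stub replacing the generated reader so this sketch compiles on its own.
   It reports an error for any data block other than "test". */
void cifsiv_(char *filename, char *blockname)
{
    (void)filename;
    if (strcmp(blockname, "test") != 0) {
        errornum = 1;
        snprintf(errormes, sizeof errormes,
                 "data block %s not found", blockname);
    }
}

/* Hypothetical wrapper: read one block and report, but do not abort on,
   any error flagged by the reader. Returns 1 on success, 0 on error. */
int read_cif_block(const char *file, const char *block)
{
    char fbuf[256], bbuf[80];

    /* cifsiv_ takes non-const char *, so pass modifiable copies. */
    snprintf(fbuf, sizeof fbuf, "%s", file);
    snprintf(bbuf, sizeof bbuf, "%s", block);

    errornum = 0;
    cifsiv_(fbuf, bbuf);
    if (errornum != 0) {
        fprintf(stderr, "CIF input problem: %s\n", errormes);
        return 0;
    }
    return 1;
}
```

Because reader errors are not fatal, a program built this way can log the message and continue with whatever data were stored.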

A simple example C application illustrating the use of the cifsiv_ subroutine is given in Fig. 5.3.7.6.

[Figure 5.3.7.6]

Figure 5.3.7.6. An example C program designed to read CIF data as tagged in the augmented DDL dictionary of Fig. 5.3.7.4.

5.3.7.2.4. Input to a Fortran application program

A Fortran program can use the C input function generated by BuildSiv, provided that the compiler can link C and Fortran modules together. For Fortran applications, the '-f' command-line option is used:

BuildSiv -f dictfile ddlversion

A C structure is defined for use within the cifsiv_ subroutine, and a Fortran common block with an identical layout is built for use within Fortran routines. The first variable within the common block must be passed as an additional argument when the cifsiv_ function is called. In the current implementation, that variable is always called 'BLOCKBEG'. The input subroutine is thus called from within a Fortran program by a statement of the form

CALL CIFSIV(FILE, BLOCK, BLOCKBEG)

where FILE and BLOCK are, respectively, the names of the input file and the data block. (Many Unix Fortran compilers convert external names to lower case and append an underscore, which is how the Fortran reference to CIFSIV resolves to the C function cifsiv_.)
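A single address suffices because the C side can treat the address of the block's first variable as the address of the whole block. C guarantees that a pointer to a structure, suitably converted, points to its initial member, and vice versa. The sketch below illustrates that conversion with a hypothetical structure whose members after blockbeg are invented. Whether a Fortran common block really shares a C structure's layout depends on the compilers (padding, alignment). This is why the generated C header and Fortran include file are built to correspond.

```c
#include <string.h>

/* Hypothetical C view of a common block. The first member plays the role of
   BLOCKBEG; the other members are invented for illustration. */
struct cifblock {
    double blockbeg;      /* first variable: its address is passed to the reader */
    double cella;         /* e.g. a cell length */
    char extmet[84];      /* e.g. an extinction-method string */
};

/* Sketch of what a reader receiving &BLOCKBEG can do. A pointer to a
   structure's first member, converted back to the structure type, points to
   the structure itself, so every member is reachable from that one address. */
void fill_block(double *blockbeg)
{
    struct cifblock *blk = (struct cifblock *)blockbeg;

    blk->cella = 10.25;
    strncpy(blk->extmet, "Zachariasen", sizeof blk->extmet - 1);
    blk->extmet[sizeof blk->extmet - 1] = '\0';
}
```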

Fig. 5.3.7.7 is an example Fortran include file generated by BuildSiv, and Fig. 5.3.7.8 is an example application incorporating this file. As with the C examples, the CIF data to be read are those specified in the dictionary augmented according to Fig. 5.3.7.4.

[Figure 5.3.7.7]

Figure 5.3.7.7. Fortran include file forcif.inc for an application built by BuildSiv from the augmented DDL dictionary of Fig. 5.3.7.4.

[Figure 5.3.7.8]

Figure 5.3.7.8. An example Fortran program designed to read CIF data as tagged in the augmented DDL dictionary of Fig. 5.3.7.4.

Note that the C header file generated by the Fortran implementation of BuildSiv, which the C object file it produces uses directly, can also be included by any other C program or subroutine. The Fortran common block is represented by a C structure referenced through cifcmnptr. The variables are stored within that structure and must be addressed through the C -> operator. For example, an additional C routine compiled together with the Fortran example program of Fig. 5.3.7.8 would refer to the variable holding the value of the input _refine_ls_extinction_method as (char *)cifcmnptr->extmet.
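A C routine that reads a character variable through cifcmnptr may meet a NUL-terminated string or a blank-padded Fortran CHARACTER value, depending on which side wrote it. The helper below is a hedged sketch for turning such a fixed-width field into a trimmed C string. The structure, the definition of cifcmnptr and the helper name `field_to_cstring` are hypothetical stand-ins for what the generated header declares.

```c
#include <string.h>

/* Hypothetical stand-ins for the structure and pointer in the generated header. */
struct cifcmn {
    double blockbeg;
    char extmet[84];
};
static struct cifcmn common_storage;
struct cifcmn *cifcmnptr = &common_storage;

/* Copy a fixed-width character field into a NUL-terminated C string,
   stopping at an embedded NUL and dropping trailing Fortran blank padding.
   The result is truncated if it does not fit in out. */
void field_to_cstring(const char *field, size_t width, char *out, size_t outsize)
{
    size_t n = 0;

    if (outsize == 0)
        return;
    while (n < width && field[n] != '\0')
        n++;
    while (n > 0 && field[n - 1] == ' ')
        n--;
    if (n >= outsize)
        n = outsize - 1;
    memcpy(out, field, n);
    out[n] = '\0';
}
```

A routine might then call `field_to_cstring(cifcmnptr->extmet, sizeof cifcmnptr->extmet, buf, sizeof buf)` before passing the extinction method to ordinary C string functions.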

References

Hester, J. R. & Okamura, F. P. (1998). CIF applications. X. Automatic construction of CIF input functions: CifSieve. J. Appl. Cryst. 31, 965–968.
Wall, L., Schwartz, R. L., Christiansen, T. & Orwant, J. (2000). Programming Perl, 3rd ed. Sebastopol, CA, USA: O'Reilly & Associates, Inc.
Westbrook, J. D., Hsieh, S.-H. & Fitzgerald, P. M. D. (1997). CIF applications. VI. CIFLIB: an application program interface to CIF dictionaries and data files. J. Appl. Cryst. 30, 79–83.