International Tables for Crystallography, Volume G: Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G. ch. 5.3, pp. 504-506

Section 5.3.3.3.  HICCuP

B. McMahon*

International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
* Correspondence e-mail: bm@iucr.org


The program HICCuP (Edgington, 1997) was an early graphical utility developed at the Cambridge Crystallographic Data Centre for interactive editing and validation of a CIF. It is no longer supported, having been replaced by enCIFer (Section 5.3.3.1). Nevertheless, it contained some interesting features and is of potential interest to developers using multiple-platform scripting languages. It was implemented in the Python language (van Rossum, 1991) and required that Tcl/Tk (Ousterhout, 1994) also be available on the host computer. The name of the program is an acronym for `High-Integrity CIF Checking using Python'. HICCuP was designed to allow users of the Cambridge Structural Database (Allen, 2002) to check structures intended for deposition in the database, and it therefore included a range of additional content checks specific to this purpose. These could, however, be disabled by the user.

5.3.3.3.1. Interactive use of the program

5.3.3.3.1.1. The control window

Because HICCuP was designed as an interactive tool, on start-up it presented the user with a control window from which CIFs could be selected for analysis and in which summary results of the program's operations were logged. Fig. 5.3.3.8 shows an example of the control window after a single CIF has been loaded.

Figure 5.3.3.8. Control window of the HICCuP application.

In the large frame below the file-entry field are listed the data blocks found by the program. The names are highlighted in various colours according to the highest level of severity of errors found within the corresponding data block.

Because the utility was designed for processing large amounts of CIF data for structural databases, it was considered useful to supply a compact visual indicator of the program's progress through a large file. This took the form of a grid of rectangular cells, with one column for each data block present. Each column contained three cells, which monitored the checks on file syntax, conformance against a CIF dictionary, and other checks specific to the requirements of the Cambridge Crystallographic Data Centre. As each data block was checked, the corresponding cells were coloured according to the types of error found. Different colours indicated: no errors; structure errors in the initial syntax tests; dictionary errors; or a deviation from certain conventions used by journals and databases in naming data blocks.
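The grid described above can be modelled as a small mapping from data-block name to the outcome of each of the three checks. The following is a minimal sketch only: the `Severity` levels, the function names and the text symbols are assumptions made for illustration and are not taken from the HICCuP source.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Assumed severity scale; HICCuP's own levels are not documented here."""
    OK = 0
    WARNING = 1
    ERROR = 2


# The three rows of the grid, as named in the text.
CHECKS = ("syntax", "dictionary", "other")


def block_severity(cells):
    """Return the highest severity recorded in one column of the grid."""
    return max(cells.values(), default=Severity.OK)


def render_grid(grid):
    """Render one text row per check and one character per data block."""
    symbols = {Severity.OK: ".", Severity.WARNING: "w", Severity.ERROR: "E"}
    rows = []
    for check in CHECKS:
        # A check that has not reported anything is shown as OK.
        row = "".join(symbols[grid[name].get(check, Severity.OK)] for name in grid)
        rows.append(f"{check:<10} {row}")
    return "\n".join(rows)


grid = {
    "data_a": {"syntax": Severity.OK, "dictionary": Severity.WARNING, "other": Severity.OK},
    "data_b": {"syntax": Severity.ERROR},
}
print(render_grid(grid))
```

Taking the maximum over a column mirrors the way the data-block names were highlighted according to the most severe error found in the block.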

The large frame at the bottom of the control window provides a text summary of the same information, listing the number of errors found.

Check boxes and an `Options…' button allowed some configurability of checks by the user.

5.3.3.3.1.2. The report frame and edit window

The user could obtain more details of the reported errors by clicking on the name of the data block of interest in the control window. The text of the CIF would appear in a new window, positioned at the point where the program had detected the first error. A terse statement of the type of error was given, together with a longer explanation of its nature and possible cause.

In the example of Fig. 5.3.3.9, the program has detected that there is a missing text delimiter (a semicolon character), and positions the text in the upper frame at the likely location of the error. The program has attempted to localize the region where the error may have occurred. Because a text field might contain arbitrary contents, including extracts of CIF content, it is impossible to be sure on purely syntactic grounds of the nature of the error. Nonetheless, some heuristic rules serve to identify the author's likely intent in the majority of cases. So, in this example, the user may scan the file contents in the vicinity of the line highlighted by the program and find the error within a few lines (in this example an incorrectly terminated _publ_author_footnote entry beginning `Current address:').
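The heuristic rules that HICCuP applied are not listed here. As an illustration of the general idea only, the sketch below assumes one simple rule: while a semicolon-delimited text field is open, the first line that looks like a data name, a `loop_` keyword or a `data_` header marks the likely position of the missing delimiter. The function name, the pattern and the sample lines are all hypothetical.

```python
import re

# Lines that look like CIF structure rather than free text (an assumed rule).
SUSPICIOUS = re.compile(r"^(_\S+(\s|$)|loop_\s*$|data_\S+\s*$)", re.IGNORECASE)


def locate_missing_delimiter(lines):
    """Return (open_line, suspect_line) for the first doubtful text field.

    Line numbers are 1-based. A text field opens and closes with ';' in the
    first column. None is returned if every field closes without containing
    a structure-like line; suspect_line is None if a field is still open at
    the end of the input and nothing suspicious was seen inside it.
    """
    open_line = None   # line on which the current text field opened
    suspect = None     # first structure-like line seen inside that field
    for number, line in enumerate(lines, start=1):
        if line.startswith(";"):
            if open_line is None:
                open_line = number
            elif suspect is not None:
                return (open_line, suspect)
            else:
                open_line = None
        elif open_line is not None and suspect is None and SUSPICIOUS.match(line):
            suspect = number
    if open_line is not None:
        return (open_line, suspect)
    return None


sample = [
    "data_example",
    "_publ_author_footnote",
    ";",
    "Current address: (text of the footnote)",
    "_publ_section_title",   # the closing ';' is missing before this line
    ";",
    "A title",
    ";",
]
print(locate_missing_delimiter(sample))
```

A legitimate text field that quotes CIF content would also be flagged by such a rule, which is why a result of this kind can only point the user to a likely region rather than give a definitive diagnosis.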

Figure 5.3.3.9. HICCuP edit window and error description.

For this example, the more literal vcif error analysis provides only the message [Scheme scheme1]

The upper frame in this window was editable, so that the user could modify the text and revalidate the current data block. Only when a satisfactorily `clean' data block was obtained were the changes saved and the modified data block written back into the original file.

5.3.3.3.1.3. Dictionary browsing

An additional useful feature of the program was its interactive link to a CIF dictionary file (Fig. 5.3.3.10). The browser window contained the definition section of the dictionary referring to the selected data name, with hyperlinks to the definitions of other data names referred to. Additionally, there was a small text-entry box allowing a specific definition to be retrieved and an `Index' button to list all available definitions.

Figure 5.3.3.10. HICCuP dictionary browser window.

5.3.3.3.2. Options

As already mentioned, the user could modify the detailed mode of operation of the program. Any or all of the `initial', `dictionary' or `other' checks could be disabled.

The `dictionary' checks could be modified by the user through the `Options' button of the main control window. The CIF dictionary for validation could be specified; the dictionary itself had to be translated from a source file in DDL format to a Python data structure.

The types of dictionary-based validation supported by the program were:

(i) List Status (checking whether a data value should be included in a looped list),

(ii) Limited Enumeration Options (checking that a data value is one of the permitted codes where such a constraint exists),

(iii) Incorrect Enumeration Case [a special case of (ii), where a data value matches a permitted code except for incorrect alphanumeric case],

(iv) Enumeration Range (the data value falls outside the range permitted),

(v) Value Type (numb or char) (the data value has the wrong type),

(vi) List Link Parent (a data item is present within the data block, but its mandated parent item is not – for example, the data item _atom_site_aniso_label should not be present without its parent data item _atom_site_label),

(vii) List Reference (the required data name used to reference the loop in which the current data name appears is missing),

(viii) Esd Allowable (a data value appears to have a standard uncertainty value where one is not expected).
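Several of the checks listed above can be expressed compactly once the dictionary is held as a Python data structure. The sketch below is illustrative only: the layout of `DICTIONARY`, its keys (`type`, `range`, `esd`, `enum`, `parent`) and the values shown are hypothetical, hand-written examples, not the structure produced by HICCuP's DDL translator nor entries copied from a released dictionary. It covers checks (ii) to (vi) and (viii).

```python
import re

# Hypothetical dictionary excerpt; keys and values are illustrative only.
DICTIONARY = {
    "_cell_length_a": {"type": "numb", "range": (0.0, None), "esd": True},
    "_cell_formula_units_Z": {"type": "numb", "range": (1.0, None), "esd": False},
    "_atom_site_adp_type": {"type": "char", "enum": ["Uani", "Uiso"]},
    "_atom_site_label": {"type": "char"},
    "_atom_site_aniso_label": {"type": "char", "parent": "_atom_site_label"},
}

# A number with an optional standard uncertainty in parentheses, e.g. 10.234(5).
NUMBER = re.compile(r"^([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(\(\d+\))?$")


def check_item(name, value, present_names):
    """Return a list of (check, message) pairs for one data item."""
    entry = DICTIONARY.get(name)
    if entry is None:
        return []  # names absent from the dictionary are handled separately
    problems = []
    parent = entry.get("parent")
    if parent and parent not in present_names:
        problems.append(("List Link Parent", f"{name} requires {parent}"))
    if value in ("?", "."):  # CIF placeholders for unknown or inapplicable
        return problems
    if entry["type"] == "numb":
        match = NUMBER.match(value)
        if match is None:
            problems.append(("Value Type", f"{value!r} is not a number"))
            return problems
        if match.group(2) and not entry.get("esd", False):
            problems.append(("Esd Allowable", f"{name} should not carry an s.u."))
        low, high = entry.get("range", (None, None))
        number = float(match.group(1))
        if (low is not None and number < low) or (high is not None and number > high):
            problems.append(("Enumeration Range", f"{value} is out of range"))
    codes = entry.get("enum")
    if codes and value not in codes:
        if value.lower() in (code.lower() for code in codes):
            problems.append(("Incorrect Enumeration Case", f"{value!r} has the wrong case"))
        else:
            problems.append(("Limited Enumeration Options", f"{value!r} is not permitted"))
    return problems


print(check_item("_atom_site_adp_type", "uani", {"_atom_site_adp_type"}))
```

Reporting the case mismatch separately from an unrecognized code, as in check (iii), lets an editor offer an automatic correction for the former while leaving the latter to the user.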

The user could also supply the program with a list of data names that do not appear in the validation dictionary but for which no warning message should be raised. The program normally flagged such nonstandard data names as possible errors and suggested the possible form of a standard data name that might have been intended. This was useful in catching misspellings of additional data items entered by hand.
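How HICCuP derived its suggestions is not described here. One plausible approach, sketched below under that assumption, is approximate string matching against the names defined in the dictionary, using `difflib.get_close_matches` from the Python standard library. The function name, the short `KNOWN_NAMES` list and the local name `_local_batch_id` are hypothetical.

```python
import difflib

# A short, illustrative stand-in for the names defined in a dictionary.
KNOWN_NAMES = [
    "_cell_length_a",
    "_cell_length_b",
    "_cell_length_c",
    "_atom_site_label",
    "_publ_author_footnote",
]


def report_unknown_names(names, known=KNOWN_NAMES, accepted_local=()):
    """Map each unrecognized data name to a list of suggested standard names.

    Names in accepted_local are the user-supplied nonstandard names for
    which no warning should be raised.
    """
    known_lower = {name.lower(): name for name in known}
    accepted = {name.lower() for name in accepted_local}
    report = {}
    for name in names:
        key = name.lower()  # CIF data names are case-insensitive
        if key in known_lower or key in accepted:
            continue
        close = difflib.get_close_matches(key, list(known_lower), n=3, cutoff=0.8)
        report[name] = [known_lower[match] for match in close]
    return report


found = report_unknown_names(
    ["_cell_lenght_a", "_local_batch_id", "_cell_length_b", "_zzz"],
    accepted_local=["_local_batch_id"],
)
print(found)
```

The `cutoff` value controls how aggressively suggestions are offered; 0.8 is an arbitrary choice for this sketch.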

The program could also be run in a batch mode when the objective was to work through a large volume of CIF data and identify the data blocks that required attention. This mode of operation was particularly useful for databases or publishing houses. In this mode, input was read from a named file or from the standard input channel; output was written to standard output or redirected to a results file. The operation of the program could be controlled by various command-line flags.
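The batch pattern described above (a named file or standard input in, a per-block report on standard output) can be sketched as follows. This is not a reconstruction of HICCuP's command-line interface: its flags are not documented here, so none are reproduced, and the function names and the placeholder check are assumptions.

```python
import sys


def split_data_blocks(lines):
    """Yield (block_name, block_lines) for each 'data_' header found.

    Simplification: a 'data_' line inside a text field is also treated as
    a header.
    """
    name, block = None, []
    for line in lines:
        if line.lower().startswith("data_"):
            if name is not None:
                yield name, block
            name, block = line.split()[0], []
        elif name is not None:
            block.append(line)
    if name is not None:
        yield name, block


def odd_text_delimiters(block_lines):
    """Placeholder check: an odd count of ';' lines suggests an open text field."""
    return sum(1 for line in block_lines if line.startswith(";")) % 2 == 1


def run_batch(source, check=odd_text_delimiters):
    """Return one (block_name, needs_attention) pair per data block."""
    return [(name, check(block)) for name, block in split_data_blocks(source)]


def main(argv):
    """Read the file named in argv[1], or standard input if none is given."""
    if len(argv) > 1:
        with open(argv[1], encoding="utf-8") as handle:
            results = run_batch(handle)
    else:
        results = run_batch(sys.stdin)
    for name, needs_attention in results:
        sys.stdout.write(f"{name}\t{'ATTENTION' if needs_attention else 'ok'}\n")
    # A non-zero exit status signals that at least one block needs attention.
    return 1 if any(flag for _, flag in results) else 0
```

In a script, `sys.exit(main(sys.argv))` would run the sketch, and the report could be redirected to a results file by the shell in the usual way.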

References

Allen, F. H. (2002). The Cambridge Structural Database: a quarter of a million crystal structures and rising. Acta Cryst. B58, 380–388.
Edgington, P. R. (1997). HICCuP: High-Integrity CIF Checking using Python. Cambridge: Cambridge Crystallographic Data Centre.
Ousterhout, J. K. (1994). Tcl and the Tk toolkit. Reading, MA: Addison-Wesley.
Rossum, G. van (1991). Python programming language. http://www.python.org.