International Tables for Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G. ch. 5.3, pp. 515-517

Section 5.3.6.1.  STAR::Parser and related Perl modules

B. McMahon

International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
Correspondence e-mail: bm@iucr.org

A collection of Perl modules has been developed at the San Diego Supercomputer Center (Bluhm, 2000) to provide basic library routines for object-oriented manipulation of STAR files with the restricted syntax appropriate to CIF applications. The module name suggests that a more complete STAR implementation may be considered in future developments, but at present the modules do not handle nested loops or the inheritance of data values from global blocks. They are therefore still rather limited in scope; nevertheless, for the programmer wishing to prototype CIF applications in Perl, they offer a very rapid route to parsing CIFs and constructing useful data structures that can be manipulated with standard Perl tools.

The use of some of the modules is illustrated in Fig. 5.3.6.1, which is a simplified version of the main program loop in the application written to typeset the CIF dictionaries printed in Part 4 of this volume.

Figure 5.3.6.1. Skeleton version of an application to format a CIF dictionary for publication. Only the main program fragment is shown. Line numbering is provided for referencing in the text. format_category, format_item and the print_document_* commands are calls to external subroutines not included in this extract.

5.3.6.1.1. STAR::Parser

STAR::Parser is the basic parsing module and may parse either data files or dictionaries (which may include save frames). It contains a single class method, parse, which returns an array of DataBlock objects. Each DataBlock object contains all the data items within an individual data block of the file. Even if the file contains only a single data block, the resulting object is returned in an array.

The contents of the data blocks may be accessed and manipulated by the methods provided by the STAR::DataBlock and STAR::Dictionary modules. They are stored internally as a multi-dimensional hash (the Perl term for an associative array with keys and associated values, which may themselves be complex data objects). Keys are provided for data blocks, save blocks, categories and data items identified during the parse. The module provides no error checking of files or objects, against the dictionary or otherwise – limited checking functionality is available through other modules in this collection.
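
The nested layout described above can be pictured with a small sketch. It is written in Python purely for illustration and does not use the Perl modules; the order of the key levels (data block, save block, category, item), the use of "-" for material outside any save frame and the sample names are all assumptions of this sketch, and the internal layout used by the STAR modules may differ.

```python
# Illustrative nested mapping: data block -> save block -> category -> item -> values.
# The "-" key standing for "outside any save frame" is an assumption of this sketch.
parsed = {
    "example_dict": {
        "-": {
            "_dictionary": {
                "_dictionary.title": ["example_dict"],
                "_dictionary.version": ["1.0"],
            },
        },
        "atom_site": {
            "_category": {
                "_category.id": ["atom_site"],
            },
        },
    },
}


def lookup(data, block, save, category, item):
    """Walk the four key levels and return the list of values for one item."""
    return data[block][save][category][item]


print(lookup(parsed, "example_dict", "atom_site", "_category", "_category.id"))
```

Because every level is an ordinary mapping, the structure can be traversed and modified with the standard tools of the host language, which is the point made in the text for Perl hashes.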

In the example of Fig. 5.3.6.1, the parse method is called at line 9 to read a DDL2 dictionary file (indicated by the -dict=>1 parameter) and return an array of data blocks. In DDL2 dictionaries (such as the mmCIF dictionary of Chapter 4.5) an entire dictionary is contained within a data block; save frames partition the data block into definitions for separate items. Normally a DDL2 CIF dictionary has only a single data block; nevertheless, the example program can handle multiple data blocks, traversing whatever the array contains with a Perl foreach construct (line 15).
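
The convention of always returning a list of data blocks, even for a single-block file, can be sketched as follows. This is a deliberately naive Python illustration, not the STAR::Parser implementation: the function name split_data_blocks is hypothetical, and the sketch simply treats any line beginning with data_ as a block header, so it would be misled by such a line inside a multi-line text value.

```python
import re


def split_data_blocks(text):
    """Return a list of (title, body) pairs, one per data_ block.

    The result is always a list, even when the input holds a single block.
    """
    blocks = []
    title, lines = None, []
    for line in text.splitlines():
        match = re.match(r"data_(\S+)", line, re.IGNORECASE)
        if match:
            # A new header closes the block collected so far, if any.
            if title is not None:
                blocks.append((title, "\n".join(lines)))
            title, lines = match.group(1), []
        elif title is not None:
            lines.append(line)
    if title is not None:
        blocks.append((title, "\n".join(lines)))
    return blocks


# The caller loops over the result without caring how many blocks there are.
for block_title, body in split_data_blocks("data_one\n_a.b 1\ndata_two\n_c.d 2\n"):
    print(block_title)
```

Returning a list in every case lets the calling code use the same loop for single-block and multi-block input.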

5.3.6.1.2. STAR::Dictionary

The STAR::Dictionary module contains class and object methods for Dictionary objects created by the STAR::Parser module, and is in fact a subclass of STAR::DataBlock (see next section). Since CIF dictionaries are fully compliant STAR files, they require little that is different from the methods developed for handling data files. The method get_save_blocks is provided to return an array of all save frames found in the Dictionary object.

In line 25 of the example, the method is called on each dictionary loaded from the input file (as described above, normally there will only be one). The method is combined with the Perl sort function to create an array of save frames from the dictionary, arranged in alphabetic order. All further manipulations of the contents of these save frames will use the methods of the generic STAR::DataBlock class.
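
The combination of retrieving the save frames and sorting them can be sketched in Python as below. The dictionary content is invented for illustration, the "-" key for material outside any save frame is an assumption, and the function merely borrows the name get_save_blocks; it is not the Perl method.

```python
# Hypothetical dictionary content: save-frame name -> frame contents.
# The "-" key for material outside any save frame is an assumption of this sketch.
dictionary = {
    "-": {"_dictionary.title": ["example_dict"]},
    "cell": {"_category.id": ["cell"]},
    "atom_site": {"_category.id": ["atom_site"]},
    "_cell.length_a": {"_item.name": ["_cell.length_a"]},
}


def get_save_blocks(block):
    """Return the names of all save frames, skipping the outer "-" level."""
    return [name for name in block if name != "-"]


# Sort the frame names by character code before processing them in turn.
ordered = sorted(get_save_blocks(dictionary))
for name in ordered:
    print(name)
```

Note that a plain character-code sort places names with a leading underscore before names that begin with a lower-case letter.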

5.3.6.1.3. STAR::DataBlock

This package provides several useful methods for handling the objects within a data block returned by the STAR::Parser module.

The class has a constructor method new, which can create a completely new DataBlock object if called with no argument. This is of course essential for applications that wish to write new CIFs. Alternatively, it may be called with a $file argument to retrieve an existing object that has previously been written to the file system using the store object method described below.

Table 5.3.6.1 summarizes the object methods provided by the package. The store method allows a DataBlock object to be serialized and written to hard disk for long-term storage. The publicly available Perl Storable module is used for this purpose.
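
The store-and-retrieve cycle can be illustrated with a Python analogue in which the standard pickle module plays the role that Storable plays in Perl. The function names store and retrieve, the file name and the sample data are assumptions of this sketch and do not reproduce the Perl interface.

```python
import os
import pickle
import tempfile


def store(block, path):
    """Serialize a data structure and write it to disk."""
    with open(path, "wb") as handle:
        pickle.dump(block, handle)


def retrieve(path):
    """Read back a structure previously written by store()."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical data block content used only to exercise the round trip.
block = {"-": {"_cell": {"_cell.length_a": ["10.5"]}}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "block.stored")
    store(block, path)
    restored = retrieve(path)

print(restored == block)
```

Storing the parsed object avoids re-parsing a large file each time it is needed.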

Table 5.3.6.1. Object methods provided by the STAR::DataBlock Perl module

Method             Description
store              Saves a DataBlock object to disk
get_item_data      Returns all the data for a specified item
get_keys           Returns a string with a hierarchically formatted list of hash keys (data blocks, save blocks, categories and items) found in the data structure of the DataBlock object
get_items          Returns an array with all the items present in the DataBlock
get_categories     Returns an array with all the categories present in the DataBlock
insert_category    Inserts a category into a data structure
insert_item        Inserts an item into a data structure
set_item_data      Sets the data content of an item according to a supplied array

The get_item_data method returns the data values for a named data item. It is used frequently in the example program of Fig. 5.3.6.1; for example, at lines 28–30 the array of categories in the input dictionary is assembled by retrieving the value associated with the data item _category.id as the component save frames are scanned in sequence. Note that for applications within dictionaries, the method takes a -save=>$saveblock parameter to allow the extraction of items from specific save frames. For manipulations of data files, this parameter is omitted. In lines 17–20, the method is called without this parameter because the name and version of the dictionary are expected to be found in the outer part of the file, not within any save frame.
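
The effect of the optional save-frame parameter can be sketched in Python as follows. The function borrows the name get_item_data but is not the Perl method; the nested layout, the "-" key for the outer part of the file and the derivation of the category key from the item name are assumptions made for illustration.

```python
# Hypothetical parsed dictionary: save block -> category -> item -> values.
dictionary = {
    "-": {
        "_dictionary": {
            "_dictionary.title": ["example_dict"],
            "_dictionary.version": ["1.0"],
        },
    },
    "atom_site": {"_category": {"_category.id": ["atom_site"]}},
    "cell": {"_category": {"_category.id": ["cell"]}},
}


def get_item_data(block, item, save="-"):
    """Return the values of an item, looking in the named save frame.

    With no save argument the outer part of the block ("-") is searched.
    """
    category = item.split(".", 1)[0]  # assumed: category key is the part before the dot
    return block[save][category][item]


# Outside any save frame: name and version of the dictionary.
title = get_item_data(dictionary, "_dictionary.title")

# Inside save frames: collect the category identifiers frame by frame.
categories = []
for save in sorted(name for name in dictionary if name != "-"):
    categories.extend(get_item_data(dictionary, "_category.id", save=save))

print(title, categories)
```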

get_keys allows a user to display the structure of a CIF or dictionary file and can be used to analyse the content of an unknown input file. When written to a terminal, the string that is returned by this method appears as a tabulation of the items present at the different levels in the data structure hierarchy, each level in the hierarchy being indicated by the amount of indentation (Fig. 5.3.6.2).

Figure 5.3.6.2. Structure of the imgCIF dictionary (Chapter 4.6) as described by the get_keys method of the STAR::DataBlock module. Only the high-order file structure and the contents of the first category are included in this extract.
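
The idea of rendering the key hierarchy as an indented string can be sketched in Python as below. The function borrows the name get_keys, but the indentation width, the sorting of keys and the sample data are assumptions; the layout produced by the Perl method may differ.

```python
def get_keys(data, indent=4):
    """Return one string listing every key, indented by its depth in the hierarchy."""
    lines = []

    def walk(node, depth):
        for key in sorted(node):
            lines.append(" " * (indent * depth) + key)
            # Only mappings have further key levels; value lists end the descent.
            if isinstance(node[key], dict):
                walk(node[key], depth + 1)

    walk(data, 0)
    return "\n".join(lines)


# Hypothetical structure: data block -> save block -> category -> item -> values.
sample = {"blk": {"-": {"_cat": {"_cat.id": ["x"]}}}}
print(get_keys(sample))
```

Each additional level of indentation in the output corresponds to one level further down the nested structure.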

The get_items and get_categories methods are largely self-explanatory. The items or categories in the currently active DataBlock object are returned in array context.

insert_item and insert_category are the complements of these methods, designed to allow the insertion of new items or categories. Where appropriate (i.e. in dictionary applications), the save frame into which the insertion is to be made can be specified.

The remaining method, set_item_data, is called to set the data of item $item to an array of data referenced by $dataref. As usual, an optional parameter -save=>$save may be included for dictionary applications where a save frame needs to be identified; the value of the variable $save is the save-frame name.
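
A Python sketch of the same operation is given below. It borrows the name set_item_data but is not the Perl method; in particular, the automatic creation of a missing save frame or category, the "-" key for the outer part of the block and the copying of the supplied list are assumptions of this illustration.

```python
def set_item_data(block, item, data, save="-"):
    """Set the values of an item from a supplied list, optionally within a save frame."""
    category = item.split(".", 1)[0]  # assumed: category key is the part before the dot
    # Missing levels are created on demand (an assumption of this sketch).
    block.setdefault(save, {}).setdefault(category, {})[item] = list(data)


block = {}
set_item_data(block, "_cell.length_a", ["10.5"])
set_item_data(block, "_category.id", ["cell"], save="cell")
print(block)
```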

Note that the current version of the module does not support the creation and manipulation of data loops, although the get_item_data method will correctly retrieve arrays of data values from a looped list.

There are five methods available to set or retrieve attributes of a DataBlock object, namely: file_name for the name of the file in which the DataBlock object was found; title for the title of the DataBlock object (i.e. the name of the CIF data block with the leading data_ string omitted); type for the type of data contained, which is 'data' for a DataBlock object but 'dictionary' for an object in the STAR::Dictionary subclass; and starting_line and ending_line for the start and end line numbers in the file where the data block is located. The method get_attributes returns a string containing a descriptive list of attributes of the DataBlock object.

5.3.6.1.4. STAR::Checker

This module implements a set of checks on a data block against a dictionary object and returns a value of '1' if the check was successful, '0' otherwise. The check tests a specific set of criteria:

(i) Are all items in the DataBlock object defined in the dictionary?

(ii) Are mandatory items present in the data block?

(iii) Are dependent items present in the data block?

(iv) Are parent items present?

(v) Do the item values conform to item type definitions in the dictionary?

Obviously, these criteria will not be appropriate for all purposes, and are in any case fully developed only for DDL2 dictionaries. An optional parameter -options=>'1' may be set to write a list of specific problems to the standard error output channel.
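
The general shape of such a check can be sketched in Python as follows. Only criteria (i), (ii) and (v) are illustrated, and the dictionary representation (a mapping from item name to a definition with hypothetical "mandatory" and "pattern" entries), the function name check and the log argument are all assumptions of this sketch rather than features of STAR::Checker.

```python
import re
import sys


def check(block, dictionary, log=False):
    """Return 1 if the block passes all tests, 0 otherwise."""
    problems = []
    for item, values in block.items():
        definition = dictionary.get(item)
        if definition is None:
            # Criterion (i): every item must be defined in the dictionary.
            problems.append(f"{item}: not defined in dictionary")
            continue
        pattern = definition.get("pattern")
        if pattern:
            # Criterion (v): every value must match the type pattern.
            for value in values:
                if not re.fullmatch(pattern, value):
                    problems.append(f"{item}: value {value!r} does not match type")
    for item, definition in dictionary.items():
        # Criterion (ii): mandatory items must be present.
        if definition.get("mandatory") and item not in block:
            problems.append(f"{item}: mandatory item missing")
    if log:
        for problem in problems:
            print(problem, file=sys.stderr)
    return 0 if problems else 1


# Hypothetical dictionary definitions and data used to exercise the check.
definitions = {
    "_cell.length_a": {"mandatory": True, "pattern": r"[0-9]+(\.[0-9]+)?"},
    "_cell.details": {"mandatory": False},
}
print(check({"_cell.length_a": ["10.5"]}, definitions))
print(check({"_cell.length_a": ["ten"], "_cell.unknown": ["?"]}, definitions, log=True))
```

Collecting the problems in a list before deciding the return value keeps the pass/fail result and the optional diagnostic listing consistent with each other.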

5.3.6.1.5. STAR::Writer and STAR::Filter

Two other modules are supplied by this package. STAR::Writer is a prototype module that can write STAR::DataBlock objects out as files in different formats; currently only the write_cif method exists to output a conformant CIF. STAR::Filter is an interactive module that prompts the user to select or reject individual categories from a STAR::Dictionary object when building a subset of the larger dictionary.
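
The subsetting performed by a filter of this kind can be sketched in Python as below. The real module prompts the user interactively; here the decision is delegated to a function argument so that the example runs unattended. The function name filter_frames, the "category" entry in each frame and the sample data are assumptions of this sketch.

```python
def filter_frames(frames, accept):
    """Keep only the save frames whose category is accepted.

    accept is any callable taking a category name and returning True or False;
    an interactive program could supply one that asks the user.
    """
    return {
        name: frame
        for name, frame in frames.items()
        if accept(frame["category"])
    }


# Hypothetical dictionary: save-frame name -> frame with its owning category.
frames = {
    "atom_site": {"category": "atom_site"},
    "_atom_site.id": {"category": "atom_site"},
    "cell": {"category": "cell"},
    "_cell.length_a": {"category": "cell"},
}

subset = filter_frames(frames, lambda category: category == "cell")
print(sorted(subset))
```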
