International Tables for Crystallography, Volume G: Definition and exchange of crystallographic data. Edited by S. R. Hall and B. McMahon. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. G. ch. 5.6, pp. 546-549
Section 5.6.2.1. Low-level CBFlib functions
aStanford Linear Accelerator Center, 2575 Sand Hill Road, Menlo Park, CA 94025, USA, and bDepartment of Mathematics and Computer Science, Kramer Science Center, Dowling College, Idle Hour Blvd, Oakdale, NY 11769, USA
The prototypes for low-level CBFlib functions are defined in the header file cbf.h, which should be included in any program that uses CBFlib. As noted previously, every function returns an integer equal to 0 to indicate success or an error code on failure (Table 5.6.1.1).
The arguments to CBFlib functions are based on a view of a CBF/imgCIF data set as a tree (Fig. 5.6.1.1). The root of the tree is the data set and is identified by a handle that points to the data structures representing that tree. The main branches of the tree are the data blocks, identified by name or by number. Within each data block, the tree branches into categories, each of which branches into columns. Categories and columns also are identified by name or by number. Within each column is an array of values, the rows of which are identified by number. The current data block, category, column and row are stored in the data structures of a data set.
The following function descriptions include the formal parameters. When a `*' appears before a formal parameter, the parameter is a pointer to the relevant value rather than the value itself. The formal parameters for the low-level CBFlib functions are given in Table 5.6.2.1.
Before working with a CBF (or CIF), it is necessary to create a handle. When work with the CBF is completed, the handle and associated data structures should be released:
int cbf_make_handle (cbf_handle *handle );
int cbf_free_handle (cbf_handle handle );
Normally, processing cannot continue if a handle is not created. Typical code to create a handle is:
Once a handle has been created, the data structures can be loaded with all the information held in a CBF file:
int cbf_read_file (cbf_handle handle, FILE *file, int headers );
Conceptually, all data values are associated with the handle at the cbf_read_file call. In practice, however, only the non-binary data are actually stored in memory. To work with potentially large binary sections most efficiently, these are skipped until explicitly referenced. For this reason, file must be a random-access file opened in binary mode [fopen (..., "rb")] and must not be closed by the calling program. CBFlib will call fclose when the file is no longer required.
The headers parameter controls the handling of any message digests embedded in the binary sections (Table 5.6.2.2). A headers value of MSG_DIGEST will cause the code to compare the digest of the binary section with any header message digest value. To maximize processing efficiency, this comparison will be delayed until the binary section is actually read into memory or copied (a `lazy' evaluation). If immediate evaluation is required, use MSG_DIGESTNOW. In either case, if the digests do not match, the function in which the evaluation is taking place will return the error CBF_FORMAT. To ignore any digests, use the headers value MSG_NODIGEST.
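The cbf_write_file call writes out the data associated with a CBF handle:
int cbf_write_file (cbf_handle handle, FILE *file, int readable, int ciforcbf, int headers, int encoding );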
This call has several options controlling whether binary sections are written unencoded (CBF) or encoded in ASCII to conform to the CIF syntax (imgCIF), the type of headers in the binary sections, and the type of ASCII encoding and line termination used. The acceptable values for ciforcbf are CIF for ASCII-encoded binary sections or CBF for unencoded binary sections. The headers parameter (Table 5.6.2.3) can take the value MIME_HEADERS to select MIME-type binary section headers or MIME_NOHEADERS for simple ASCII headers. The value MSG_DIGEST will generate digests for validation of the binary data and the value MSG_NODIGEST will skip digest evaluation. The header and digest flags may be combined using the bitwise OR operator (|).
Similarly, there are several combinable flags for the parameter encoding (Table 5.6.2.4). ENC_BASE64 selects BASE64 encoding, ENC_QP selects quoted-printable encoding, and ENC_BASE8, ENC_BASE10 and ENC_BASE16 select octal, decimal and hexadecimal, respectively. ENC_FORWARD maps bytes to words forward (1234) for BASE8, BASE10 or BASE16 encoding and ENC_BACKWARD maps bytes to words backward (4321). Finally, ENC_CRTERM terminates lines with carriage return (CR) and ENC_LFTERM terminates lines with line feed (LF) (thus ENC_CRTERM|ENC_LFTERM will use CR LF).
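Because each flag occupies its own bit, a combination is built with the bitwise OR operator and examined with bitwise AND. The following minimal sketch illustrates the idea for the line-termination flags. The DEMO_ENC_* constants and the function demo_line_terminator are hypothetical names introduced for this example; the values are placeholders and are not the ones defined in cbf.h.

```c
/* Hypothetical flag values for illustration only; the real ENC_*
   constants are defined in cbf.h and may have different values. */
enum {
    DEMO_ENC_BASE64 = 0x01,
    DEMO_ENC_QP     = 0x02,
    DEMO_ENC_CRTERM = 0x04,
    DEMO_ENC_LFTERM = 0x08
};

/* Return the line terminator selected by a combination of flags.
   An empty string is returned if neither termination flag is set. */
static const char *demo_line_terminator(int encoding)
{
    int cr = (encoding & DEMO_ENC_CRTERM) != 0;
    int lf = (encoding & DEMO_ENC_LFTERM) != 0;

    if (cr && lf)
        return "\r\n";   /* both flags set: CR LF */
    if (cr)
        return "\r";
    if (lf)
        return "\n";
    return "";
}
```

With these placeholder values, demo_line_terminator (DEMO_ENC_BASE64 | DEMO_ENC_CRTERM | DEMO_ENC_LFTERM) selects CR LF, and the unrelated encoding bit does not affect the result.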
CBFlib maintains temporary storage on disk as necessary for files to be written, so that file does not have to be random-access. However, if it is random-access and readable, resources can be conserved by setting readable nonzero.
The remaining low-level functions are involved in navigating the tree structure, creating and deleting data blocks, categories and table columns and rows, and retrieving or modifying data values.
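The navigation functions are:
int cbf_find_datablock (cbf_handle handle, const char *datablockname );
int cbf_find_category (cbf_handle handle, const char *categoryname );
int cbf_find_column (cbf_handle handle, const char *columnname );
int cbf_find_row (cbf_handle handle, const char *value );
int cbf_find_nextrow (cbf_handle handle, const char *value );
int cbf_rewind_datablock (cbf_handle handle );
int cbf_rewind_category (cbf_handle handle );
int cbf_rewind_column (cbf_handle handle );
int cbf_rewind_row (cbf_handle handle );
int cbf_next_datablock (cbf_handle handle );
int cbf_next_category (cbf_handle handle );
int cbf_next_column (cbf_handle handle );
int cbf_next_row (cbf_handle handle );
int cbf_select_datablock (cbf_handle handle, unsigned int datablock );
int cbf_select_category (cbf_handle handle, unsigned int category );
int cbf_select_column (cbf_handle handle, unsigned int column );
int cbf_select_row (cbf_handle handle, unsigned int row );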
The function cbf_find_datablock selects the first data block with name datablockname as the current data block. Similarly, cbf_find_category selects the category within the current data block with name categoryname and cbf_find_column selects the corresponding column within the current category. The function cbf_find_row differs slightly in that it selects the first row in the current column with the corresponding value and cbf_find_nextrow selects the row with the corresponding value following the current row. Note that selecting a new data block makes the current category, column and row undefined and that selecting a new category similarly makes the column and row undefined. In contrast, repositioning by column does not change the current row and repositioning by row does not change the current column.
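The rules for which parts of the current position survive a repositioning can be summarized in a small model. The sketch below is not CBFlib code: demo_cursor and the demo_select_* functions are hypothetical, and the use of -1 to mark an undefined position is an assumption made for this example only.

```c
#define DEMO_UNDEFINED (-1)

/* Hypothetical model of the current position within a data set.
   CBFlib's internal representation is not described in the text. */
typedef struct {
    int datablock;
    int category;
    int column;
    int row;
} demo_cursor;

/* Selecting a data block makes category, column and row undefined. */
static void demo_select_datablock(demo_cursor *c, int datablock)
{
    c->datablock = datablock;
    c->category  = DEMO_UNDEFINED;
    c->column    = DEMO_UNDEFINED;
    c->row       = DEMO_UNDEFINED;
}

/* Selecting a category makes column and row undefined. */
static void demo_select_category(demo_cursor *c, int category)
{
    c->category = category;
    c->column   = DEMO_UNDEFINED;
    c->row      = DEMO_UNDEFINED;
}

/* Repositioning by column leaves the current row unchanged. */
static void demo_select_column(demo_cursor *c, int column)
{
    c->column = column;
}

/* Repositioning by row leaves the current column unchanged. */
static void demo_select_row(demo_cursor *c, int row)
{
    c->row = row;
}
```

In practice this means that a program should select the data block first, then the category, and only then the column and row, in either order.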
The remaining functions navigate on the basis of the order of the data blocks, categories, columns and rows. Thus, cbf_select_datablock selects data-block number datablock, counting from 0, cbf_rewind_datablock selects the first data block and cbf_next_datablock selects the data block following the current data block.
All of these functions return CBF_NOTFOUND if the requested object does not exist.
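The `count' functions evaluate the number of data blocks in the data set, the number of categories in the current data block and the number of columns or rows in the current category:
int cbf_count_datablocks (cbf_handle handle, unsigned int *datablocks );
int cbf_count_categories (cbf_handle handle, unsigned int *categories );
int cbf_count_columns (cbf_handle handle, unsigned int *columns );
int cbf_count_rows (cbf_handle handle, unsigned int *rows );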
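The `name' functions retrieve the current data block, category or column names:
int cbf_datablock_name (cbf_handle handle, const char **datablockname );
int cbf_category_name (cbf_handle handle, const char **categoryname );
int cbf_column_name (cbf_handle handle, const char **columnname );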
As rows do not have names, the corresponding function is:
int cbf_row_number (cbf_handle handle, unsigned int *row );
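To create new entities within the tree, CBFlib provides the functions:
int cbf_new_datablock (cbf_handle handle, const char *datablockname );
int cbf_force_new_datablock (cbf_handle handle, const char *datablockname );
int cbf_new_category (cbf_handle handle, const char *categoryname );
int cbf_force_new_category (cbf_handle handle, const char *categoryname );
int cbf_new_column (cbf_handle handle, const char *columnname );
int cbf_new_row (cbf_handle handle );
int cbf_insert_row (cbf_handle handle, unsigned int row );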
The `new' functions add a new data block within the data set, a new category in the current data block, or a new column or row within the current category, and make it the current data block, category, column or row, respectively. If the data block, category or column already exists, then the function simply makes it the current data block, category or column. The function cbf_new_row adds the row to the end of the current category. cbf_insert_row provides the ability to insert a row before ordinal row, starting from 0. The newly inserted row gets the row ordinal row and the row that originally had that ordinal and all rows with higher ordinals are pushed downwards.
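The renumbering caused by an insertion can be illustrated with a toy model. The sketch below is a simplification under stated assumptions: it models a single column of string values with a fixed capacity, whereas an insertion in CBFlib applies to every column of the current category. The names demo_column, demo_insert_row and demo_append_row are hypothetical.

```c
#include <string.h>

#define DEMO_MAX_ROWS 16

/* Hypothetical single-column model of a category. */
typedef struct {
    const char  *values[DEMO_MAX_ROWS];
    unsigned int rows;               /* number of rows in use */
} demo_column;

/* Insert value so that it receives ordinal `row` (counting from 0).
   The row that had that ordinal, and all rows after it, move down by one.
   Returns 0 on success or 1 if row is out of range or the column is full. */
static int demo_insert_row(demo_column *col, unsigned int row,
                           const char *value)
{
    if (row > col->rows || col->rows >= DEMO_MAX_ROWS)
        return 1;

    memmove(&col->values[row + 1], &col->values[row],
            (col->rows - row) * sizeof col->values[0]);
    col->values[row] = value;
    col->rows++;
    return 0;
}

/* Appending is insertion at the ordinal one past the last row. */
static int demo_append_row(demo_column *col, const char *value)
{
    return demo_insert_row(col, col->rows, value);
}
```

Starting from rows "a", "b", "c", inserting "x" at ordinal 1 gives "a", "x", "b", "c": the new row has ordinal 1 and the former rows 1 and 2 become rows 2 and 3.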
In general, CIF does not permit duplication of the names of data blocks or categories. In practice, however, duplications do occur. CBFlib provides `force' variants to allow creation of duplicate data-block and category names. Because, in this case, the program analysing the resulting file can only distinguish the duplicates by ordinal, these variants are not recommended for general use.
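The following functions are used to remove entities from the tree:
int cbf_remove_datablock (cbf_handle handle );
int cbf_remove_category (cbf_handle handle );
int cbf_remove_column (cbf_handle handle );
int cbf_remove_row (cbf_handle handle );
int cbf_delete_row (cbf_handle handle, unsigned int row );
int cbf_reset_datablocks (cbf_handle handle );
int cbf_reset_datablock (cbf_handle handle );
int cbf_reset_category (cbf_handle handle );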
The basic `remove' functions delete the current data block, category, column or row. Note that removing a data block makes the current data block, category, column and row undefined; removing a category makes the current category, column and row undefined. Removing a column makes the current column undefined, but leaves the current row intact, and removing a row leaves the current column intact. The function cbf_delete_row is similar to cbf_remove_row except that it removes the specified row in the current category. If the current row is not the deleted row, then it will remain valid.
All the categories in all data blocks, all the categories in the current data block or all the entries in the current category may be removed using the `reset' functions.
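When a column and row within a category have been selected, the entry value may be examined or modified:
int cbf_get_value (cbf_handle handle, const char **value );
int cbf_set_value (cbf_handle handle, const char *value );
int cbf_get_integervalue (cbf_handle handle, int *number );
int cbf_get_doublevalue (cbf_handle handle, double *number );
int cbf_set_integervalue (cbf_handle handle, int number );
int cbf_set_doublevalue (cbf_handle handle, const char *format, double number );
int cbf_get_integerarrayparameters (cbf_handle handle, unsigned int *compression, int *binary_id, size_t *elsize, int *elsigned, int *elunsigned, size_t *elements, int *minelement, int *maxelement );
int cbf_get_integerarray (cbf_handle handle, int *binary_id, void *array, size_t elsize, int elsigned, size_t elements, size_t *elements_read );
int cbf_set_integerarray (cbf_handle handle, unsigned int compression, int binary_id, void *array, size_t elsize, int elsigned, size_t elements );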
A value within a CBF/imgCIF data set may be a simple character string, an integer or real number, or an array of integers. The functions cbf_get_value and cbf_set_value provide the basic functionality for normal CIF values, retrieving and modifying the current entry as a string. The functions cbf_get_integervalue and cbf_get_doublevalue interpret the retrieved string as an integer or real value and the functions cbf_set_integervalue and cbf_set_doublevalue convert the number argument into a string before setting the entry.
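Since numeric entries are held as strings, reading or writing a number amounts to a string conversion. The sketch below shows one way such conversions might be done with the standard C library. It is not taken from CBFlib: the functions demo_parse_integer and demo_format_double, their return codes and the precision argument are assumptions made for this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Interpret text as a decimal integer.
   Returns 0 on success, 1 if text is not a number,
   2 if the value does not fit in an int. */
static int demo_parse_integer(const char *text, int *number)
{
    char *end;
    long  v;

    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return 1;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 2;
    *number = (int) v;
    return 0;
}

/* Convert number to a string with the given number of decimal places.
   Returns 0 on success or 1 if the buffer is too small. */
static int demo_format_double(char *buffer, size_t size,
                              int precision, double number)
{
    int n = snprintf(buffer, size, "%.*f", precision, number);

    return (n < 0 || (size_t) n >= size) ? 1 : 0;
}
```

A caller would check the return code before using the result, in the same way that the return value of every CBFlib function should be checked.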
The functions for working with binary sections are more complicated as they must take into account compression, array size and the variety of different integer types available on different systems: signed/unsigned and various sizes.
The function cbf_get_integerarrayparameters retrieves the parameters of the current, binary, entry. The compression argument is set to the compression type used (Table 5.6.2.5). At present, this may take one of three values: CBF_CANONICAL, for canonical-code compression (see Section 5.6.3.1 below); CBF_PACKED, for CCP4-style packing (see Section 5.6.3.2 below); or CBF_NONE, for no compression. [Note: CBF_NONE is by far the slowest scheme of the three and uses much more disk space. It is intended for routine use with small arrays only. With large arrays (like images) it should be used only for debugging.] The binary_id value is a unique integer identifier for each binary section, elsize is the size in bytes of the array entries, elsigned and elunsigned are nonzero if the array can be read as signed or unsigned, respectively, elements is the number of entries in the array, and minelement and maxelement are the lowest and highest elements. If a destination argument is too small to hold a value, it will be set to the nearest value and the function will return CBF_OVERFLOW. If the current entry is not binary, cbf_get_integerarrayparameters will return CBF_ASCII.
cbf_get_integerarray reads the current binary entry into an integer array. The parameter array points to an array of elements interpreted as integers. Each element in the array is signed if elsigned is nonzero and unsigned otherwise, and each element occupies elsize bytes. The argument elements_read is set to the number of elements actually obtained. If the binary section does not contain sufficient entries to fill the array, the function returns CBF_ENDOFDATA. As before, the function will return CBF_OVERFLOW on overflow and CBF_ASCII if the entry is not binary.
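The role of elsize and elsigned can be made concrete by showing how a program might read back one element from the untyped array once it has been filled. The following sketch is an illustration only: demo_get_element is a hypothetical helper, it assumes the array is in the machine's native byte order, and it supports only element sizes of 1, 2 and 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fetch element `index` from an array whose elements occupy elsize bytes
   and are signed if elsigned is nonzero, unsigned otherwise.
   Returns 0 on success or 1 if elsize is not supported by this sketch. */
static int demo_get_element(const void *array, size_t elsize, int elsigned,
                            size_t index, long long *value)
{
    const unsigned char *p = (const unsigned char *) array + index * elsize;

    if (elsize == 1) {
        if (elsigned) { int8_t  s; memcpy(&s, p, sizeof s); *value = s; }
        else          { uint8_t u; memcpy(&u, p, sizeof u); *value = u; }
    } else if (elsize == 2) {
        if (elsigned) { int16_t  s; memcpy(&s, p, sizeof s); *value = s; }
        else          { uint16_t u; memcpy(&u, p, sizeof u); *value = u; }
    } else if (elsize == 4) {
        if (elsigned) { int32_t  s; memcpy(&s, p, sizeof s); *value = s; }
        else          { uint32_t u; memcpy(&u, p, sizeof u); *value = u; }
    } else {
        return 1;
    }
    return 0;
}
```

The same byte pattern yields different values depending on elsigned: a single byte 0xFF reads as 255 when unsigned and as -1 when signed, which is why the element size and signedness passed when reading must describe the destination array correctly.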
cbf_set_integerarray sets the current binary or ASCII entry to the binary value of an integer array. As before, the acceptable values for compression are CBF_PACKED, CBF_CANONICAL and CBF_NONE. Each binary section should be given a unique integer identifier binary_id.
Two macros are provided to facilitate processing and propagation of error returns: one to return from the current function immediately and one to execute a given command first:
If the symbol CBFDEBUG is defined, alternative definitions that print out the error number as given in Table 5.6.1.1 are used:
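The error-propagation pattern described here can be sketched as follows. The macros demo_failnez and demo_onfailnez, the symbol DEMO_DEBUG and the helper functions are hypothetical names written for this illustration; they follow the behaviour described in the text (return immediately, or execute a command and then return) but are not the definitions from cbf.h.

```c
#include <stdio.h>

/* Hypothetical macros modelled on the description in the text. */
#ifdef DEMO_DEBUG
/* Debugging variant: report the error number before returning. */
#define demo_failnez(f) \
    { int err = (f); \
      if (err) { fprintf(stderr, "error %d in \"%s\"\n", err, #f); \
                 return err; } }
#else
/* Return from the current function immediately on a nonzero code. */
#define demo_failnez(f) \
    { int err = (f); if (err) return err; }
#endif

/* Execute command c first, then return the nonzero code. */
#define demo_onfailnez(f, c) \
    { int err = (f); if (err) { { c; } return err; } }

/* Stand-in for a library call: simply returns the code it is given. */
static int demo_step(int code)
{
    return code;
}

/* Two calls in sequence; the second performs a clean-up action on failure. */
static int demo_run(int first, int second, int *cleaned_up)
{
    demo_failnez(demo_step(first))
    demo_onfailnez(demo_step(second), *cleaned_up = 1)
    return 0;
}
```

Because the macros expand to a return statement, they can only be used inside functions that themselves return an integer error code, which matches the convention that every CBFlib function returns 0 on success.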