International Tables for Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G. ch. 3.3, pp. 123-126

Section 3.3.7. File metadata

B. H. Toby^a*

^a NIST Center for Neutron Research, National Institute of Standards and Technology, Gaithersburg, Maryland 20899-8562, USA
Correspondence e-mail: brian.toby@nist.gov

3.3.7. File metadata


The many data items in the core dictionary that describe file auditing and history cover most of the metadata requirements of a pdCIF, but two new data items in the pdCIF category PD_BLOCK are introduced to provide a specific mechanism for identifying and relating individual data blocks.

Data items in this category are as follows:

PD_BLOCK [Scheme scheme22]

The data item _pd_block_id is used to define a unique name for each data block. This name is used so that one data block may reference another data block. Since CIF blocks may be separated into different files, or many CIFs from different sources may be grouped into a single file, the block ID provides a robust mechanism for maintaining references between blocks, independent of how CIF blocks have been arranged between files. The intent is that a site that archives pdCIFs will construct an index to _pd_block_id names that can be used to resolve block ID references.
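The kind of index an archive site might build can be sketched as follows. This is a minimal illustration, not a prescribed implementation: it assumes that the CIFs have already been parsed into plain dictionaries (file name, then block name, then data name), and the function names, file names and block ID values are all hypothetical.

```python
from collections import defaultdict


def as_list(value):
    """Return a CIF value as a list (assumed: looped items arrive as lists, single items as strings)."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def build_block_index(files):
    """Map each _pd_block_id value to the (file name, block name) pairs that carry it."""
    index = defaultdict(list)
    for file_name, blocks in files.items():
        for block_name, items in blocks.items():
            for block_id in as_list(items.get("_pd_block_id")):
                index[block_id].append((file_name, block_name))
    return dict(index)


# Hypothetical archive: two files holding three blocks, with illustrative block IDs.
archive = {
    "neutron.cif": {
        "ni_tof": {"_pd_block_id": "2001-01-01T10:00|NI_TOF|A.User|InstrA"},
    },
    "xray.cif": {
        "ni_xray": {"_pd_block_id": "2001-01-02T09:30|NI_XRAY|A.User|InstrB"},
        "ni_model": {"_pd_block_id": "2001-02-01T12:00|NI_MODEL|A.User|none"},
    },
}

index = build_block_index(archive)
# Resolving a block ID reference is then a dictionary lookup.
location = index["2001-01-02T09:30|NI_XRAY|A.User|InstrB"]
```

Each ID maps to a list of locations rather than a single one, because the same block ID may legitimately appear in more than one stored block, for example when a later version of a block retains its original ID.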

The definition for _pd_block_id gives a procedure for creating a _pd_block_id name that is extremely unlikely to be duplicated. Other mechanisms for creating unique names can also be used: for example, using a web page name (URL) could be appropriate if care is taken never to reuse the URL.
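One way to generate such a name is sketched below. It assumes the pattern suggested in the dictionary definition, namely a date-time, a block name, a creator name and an instrument name joined by `|`; the dictionary itself should be consulted for the authoritative wording. The function name and the decision to reject whitespace are illustrative assumptions.

```python
from datetime import datetime


def make_block_id(block_name, creator, instrument, when=None):
    """Join date-time, block name, creator and instrument into one block ID string."""
    when = when or datetime.now()
    fields = [when.strftime("%Y-%m-%dT%H:%M"), block_name, creator, instrument]
    for field in fields:
        # Assumption: fields must not contain the separator or whitespace,
        # so that the ID can be written as a single unquoted CIF value.
        if "|" in field or any(ch.isspace() for ch in field):
            raise ValueError(f"unsuitable character in block ID field: {field!r}")
    return "|".join(fields)


# Illustrative values only.
block_id = make_block_id("Si-std", "B.Toby", "D500#1234-987",
                         when=datetime(1991, 9, 15, 16, 54))
```

The resulting string is intended to be compared as a whole, not parsed back into its parts; the concatenation only serves to make accidental duplication unlikely.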

The need for the block ID/block pointer mechanism is demonstrated by the following example. Consider a case where a neutron powder diffraction data set and an X-ray powder diffraction data set have been used together to determine a single structural model for a single crystalline phase. CIF does not allow the two data sets to be placed in a single block, since this would require two independent loops of observations where each loop uses some of the same data names. One can create a CIF with two blocks and include the structural model in the block that contains either of the two data sets. However, if this is done, a logical link is needed between the two blocks to make it clear that the structural model was derived from both data sets. It is better practice to place the structural model in a third data block, as this emphasizes the fact that the model is derived from both data sets. Again, logical links to the data sets are needed.

In both these cases, the data item _pd_block_diffractogram_id is included in the data block containing the structural model and points to the _pd_block_id values assigned in the data blocks containing the diffraction data, establishing the connection between the data sets and the structural model. The presence of more than one value for _pd_block_diffractogram_id, through use of a loop, indicates that multiple data sets were used and thus that these structural results are from a combined refinement. Sometimes, powder and single-crystal diffraction data are used together (most commonly to pair X-ray single-crystal diffraction data with neutron powder diffraction data). In this case, _pd_block_diffractogram_id will point to two _pd_block_id values, where one is assigned to the single-crystal data set.
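The rule that more than one _pd_block_diffractogram_id value signals a combined refinement can be expressed as a short check. The sketch assumes a block that has already been parsed into a dictionary, with looped items held as lists and single items as strings; the function names and the short block IDs are hypothetical.

```python
def diffractogram_ids(model_block):
    """Return the _pd_block_diffractogram_id values of a block as a list."""
    value = model_block.get("_pd_block_diffractogram_id", [])
    # Assumption: a single (non-looped) item arrives as a plain string.
    return [value] if isinstance(value, str) else list(value)


def is_combined_refinement(model_block):
    """True when the structural model points to more than one data set."""
    return len(diffractogram_ids(model_block)) > 1


# Illustrative blocks: one model refined against two data sets, one against a single data set.
joint_model = {
    "_pd_block_id": "MODEL-1",
    "_pd_block_diffractogram_id": ["NEUTRON-1", "XRAY-1"],
}
single_model = {
    "_pd_block_id": "MODEL-2",
    "_pd_block_diffractogram_id": "XRAY-1",
}

combined = is_combined_refinement(joint_model)
not_combined = is_combined_refinement(single_model)
```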

In contrast to the example above, in which block pointers are used to link a single structural model to multiple data sets, another application for these pointers is for describing materials that contain more than one phase. In this case, _pd_phase_block_id is placed in the data block containing the data set to link it to the blocks defining the phases.

In summary, three types of links between data blocks are defined.

(i) _pd_block_diffractogram_id connects a phase to one or more data-set blocks;

(ii) _pd_phase_block_id connects a data set to one or more phase blocks;

(iii) _pd_calib_std_external_block_id connects a block to measurements used to provide calibration constants used in the block.

It is good practice to use both _pd_block_diffractogram_id and _pd_phase_block_id in a pdCIF with multiple blocks.
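A simple consistency check for this practice is sketched below: every _pd_block_diffractogram_id pointer from a phase block should be matched by a _pd_phase_block_id pointer in the target data-set block, and vice versa. The sketch assumes blocks parsed into dictionaries; the function names, the short block IDs and the wording of the reported problems are hypothetical.

```python
def as_list(value):
    """Return a CIF value as a list (assumed: looped items are lists, single items strings)."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


# Each pointer item and the item expected to point back from the target block.
RECIPROCAL = {
    "_pd_block_diffractogram_id": "_pd_phase_block_id",
    "_pd_phase_block_id": "_pd_block_diffractogram_id",
}


def check_reciprocal_links(blocks):
    """Return (source ID, target ID, problem) for every pointer lacking its reverse link."""
    by_id = {bid: items for items in blocks for bid in as_list(items.get("_pd_block_id"))}
    problems = []
    for items in blocks:
        own_ids = as_list(items.get("_pd_block_id"))
        if not own_ids:
            continue  # a block without an ID cannot be pointed back to
        for forward, reverse in RECIPROCAL.items():
            for target_id in as_list(items.get(forward)):
                target = by_id.get(target_id)
                if target is None:
                    problems.append((own_ids[0], target_id, "target not found"))
                elif not set(own_ids) & set(as_list(target.get(reverse))):
                    problems.append((own_ids[0], target_id, reverse + " missing"))
    return problems


# Illustrative two-phase, two-data-set file in which BANK2 omits its pointer to SI.
blocks = [
    {"_pd_block_id": "NI", "_pd_block_diffractogram_id": ["BANK1", "BANK2"]},
    {"_pd_block_id": "SI", "_pd_block_diffractogram_id": ["BANK1", "BANK2"]},
    {"_pd_block_id": "BANK1", "_pd_phase_block_id": ["NI", "SI"]},
    {"_pd_block_id": "BANK2", "_pd_phase_block_id": "NI"},
]

problems = check_reciprocal_links(blocks)
```

A missing reverse link is reported rather than treated as an error, since using both pointers is described as good practice and a block such as a single-crystal data set may reasonably carry no phase pointer.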

3.3.7.1. Use of block pointers


More complex link structures will be needed when multiple data sets and multiple phases occur together. Example 3.3.7.1 outlines a pdCIF reporting the results of a TOF powder-diffraction study of a physical mixture of nickel and silicon powders in which two separate diffraction banks, measured at two different Bragg angles, were used. In this case, five CIF blocks are used. The first CIF block reports the overall and publication details. The next two CIF blocks report crystallographic information for each phase and the last two blocks report the observed, processed and calculated diffraction intensities and reflection tables.

Example 3.3.7.1. A CIF with multiple data blocks, demonstrating a suitable construction when multiple data sets and multiple phases occur together.

[Scheme scheme23] [Scheme scheme24] [Scheme scheme25] [Scheme scheme26]

A second purpose for _pd_block_id is to provide a mechanism for tracking successive modifications to a CIF. Consider the case where a data set is obtained at a user facility and the resulting measurements are distributed as a CIF. In this file, a value is supplied for _pd_block_id based on the time when the measurements were made. At a later time, when these observations are analysed, a new CIF is created, containing both the original measurements and the results from the analysis. Rather than replace the original value for _pd_block_id, the data item can be placed in a loop and another value, defining a second block ID, can be added. This will indicate the connection to the initial CIF, since the original block ID is retained.
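The retention of the original block ID when a CIF is revised can be sketched as follows. The example assumes a block parsed into a dictionary in which a looped _pd_block_id is held as a list; the function names and the block ID values are hypothetical.

```python
def block_ids(items):
    """Return the _pd_block_id values of a block as a list."""
    value = items.get("_pd_block_id", [])
    # Assumption: a single (non-looped) item arrives as a plain string.
    return [value] if isinstance(value, str) else list(value)


def add_block_id(items, new_id):
    """Return a copy of the block whose _pd_block_id loop keeps the old IDs and gains new_id."""
    ids = block_ids(items)
    if new_id in ids:
        raise ValueError("block ID already present: " + new_id)
    updated = dict(items)
    updated["_pd_block_id"] = ids + [new_id]
    return updated


def shares_history(block_a, block_b):
    """True when two blocks have at least one block ID in common."""
    return bool(set(block_ids(block_a)) & set(block_ids(block_b)))


# Illustrative IDs: the facility's measurement block, then the block written after analysis.
measured = {"_pd_block_id": "2001-01-01T10:00|SAMPLE|A.User|InstrA"}
analysed = add_block_id(measured, "2001-03-01T15:20|SAMPLE|B.Analyst|InstrA")
related = shares_history(measured, analysed)
```

Because the original ID is kept in the loop, any reference made to the first CIF still resolves to the later one.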

A potential future use for block pointers may be to reference non-CIF data files that contain large two- and three-dimensional data structures. This is expected to become increasingly important as neutron and synchrotron instruments are constructed that cover increasing ranges of solid angle. As mentioned in Section 3.3.2, CIF is not well suited to these complex, large and possibly irregular measurement arrays. The NeXus format has been developed by a consortium of synchrotron and neutron laboratories to address these concerns and is currently being used for a variety of scattering applications (NeXus, 1999). The NeXus format is based on the platform-independent HDF binary standard (HDF, 1998). The use of block pointers to resolve references to non-CIF documents will require additional definitions.

References

HDF (1998). NCSA HDF home page. http://www.hdfgroup.org/
NeXus (1999). NeXus data format. http://www.nexusformat.org