International Tables for Crystallography Volume G: Definition and exchange of crystallographic data. Edited by S. R. Hall and B. McMahon. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. G. ch. 5.6, p. 552
Section 5.6.3. Compression schemes
(a) Stanford Linear Accelerator Center, 2575 Sand Hill Road, Menlo Park, CA 94025, USA, and (b) Department of Mathematics and Computer Science, Kramer Science Center, Dowling College, Idle Hour Blvd, Oakdale, NY 11769, USA
Two schemes for lossless compression of integer arrays (such as images) have been implemented in this version of CBFlib:
(i) an entropy-encoding scheme using canonical coding;
(ii) a CCP4-style packing scheme.
Both encode the difference (or error) between the current element in the array and the prior element. Parameters required for more sophisticated predictors have been included in the compression functions and will be used in a future version of the library.
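The difference predictor described above can be sketched in a few lines of Python. This is a minimal illustration, not the CBFlib implementation: the function names are hypothetical, and the sketch assumes that the element "before" the first one is taken to be 0.

```python
def delta_encode(values):
    """Return the errors: each element minus the prior element."""
    errors = []
    prior = 0  # assumption: the predecessor of the first element is 0
    for v in values:
        errors.append(v - prior)
        prior = v
    return errors


def delta_decode(errors):
    """Rebuild the original array by accumulating the errors."""
    values = []
    prior = 0  # must match the starting value used by delta_encode
    for e in errors:
        prior += e
        values.append(prior)
    return values


pixels = [10, 12, 11, 11, 15]
errors = delta_encode(pixels)
restored = delta_decode(errors)
```

For slowly varying data such as neighbouring image pixels, the errors cluster around zero, which is what makes the subsequent coding stage effective.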
The canonical-code compression scheme encodes errors in two ways: directly or indirectly. Errors are coded directly using a symbol corresponding to the error value. Errors are coded indirectly using a symbol for the number of bits in the (signed) error, followed by the error itself.
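As a rough illustration of the direct/indirect distinction, the following Python sketch picks a symbol for a single error. It is not taken from CBFlib: the function names and the tuple representation of symbols are hypothetical, and the sketch assumes that an error is coded directly whenever it fits in n signed (two's-complement) bits.

```python
def signed_bit_length(error):
    """Smallest number of bits that holds `error` in two's complement."""
    if error >= 0:
        return error.bit_length() + 1  # one extra bit for the sign
    return (~error).bit_length() + 1


def choose_symbol(error, n):
    """Return a (kind, value) pair for one error.

    Assumption: errors that fit in n signed bits get a direct symbol;
    all others get an indirect symbol naming the bit count, after which
    the error itself would be written using that many bits.
    """
    bits = signed_bit_length(error)
    if bits <= n:
        return ("direct", error)
    return ("indirect", bits)


small = choose_symbol(3, n=3)  # fits in 3 signed bits
large = choose_symbol(4, n=3)  # needs 4 signed bits
```

With n = 3, the errors −4 to 3 would each receive their own symbol, while larger errors share one symbol per bit count.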
At the start of the compression, CBFlib constructs a table containing a set of symbols: one for each of the $2^n$ direct codes from $-2^{n-1}$ to $2^{n-1}-1$, one for a stop code and one for each of the maxbits − n indirect codes, where n is chosen at compression time and maxbits is the maximum number of bits in an error. CBFlib then assigns a bit code to each symbol, using shorter bit codes for the more common symbols and longer bit codes for the less common ones. The bit-code lengths are calculated using a Huffman-type algorithm and the actual bit codes are constructed using the canonical-code algorithm described by Moffat et al. (1997).
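The two-stage construction (code lengths first, then canonical bit codes) can be illustrated with a short Python sketch. This is a generic textbook version under stated assumptions, not the CBFlib code: symbols are plain integers, the function names are hypothetical, and the canonical ordering used here (shortest codes first, ties broken by symbol value) is one common convention that may differ from the one CBFlib applies.

```python
import heapq


def huffman_code_lengths(freqs):
    """Return {symbol: code length} from a standard Huffman merge."""
    if len(freqs) == 1:
        return {s: 1 for s in freqs}
    # Heap entries: (weight, tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # each merge adds one bit to every member
        heapq.heappush(heap, (w1 + w2, counter, s1 + s2))
        counter += 1
    return lengths


def canonical_codes(lengths):
    """Assign canonical bit codes (as strings) from code lengths alone."""
    codes = {}
    code = 0
    prev_len = 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len  # widen the code when the length grows
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


# Hypothetical symbol frequencies: small errors are the most common.
freqs = {0: 5, 1: 2, -1: 2, 2: 1}
lengths = huffman_code_lengths(freqs)
codes = canonical_codes(lengths)
```

The practical benefit of a canonical code is that a decoder can rebuild the full code table from the code lengths alone, so only the lengths need to be stored with the compressed data.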
The structure of the compressed data is described in Table 5.6.3.1.
The CCP4-style compression writes the errors in blocks. Each block begins with a 6-bit code. The number of errors in the block is $2^n$, where n is the value in bits 0 to 2. Bits 3 to 5 encode the number of bits in each error. The data structure is summarized in Table 5.6.3.2.
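Splitting the 6-bit block header into its two fields can be sketched as follows. The function name is hypothetical, and the sketch assumes that bit 0 is the least-significant bit of the header. The mapping from the value in bits 3 to 5 to an actual bit count is not given in the text above, so the sketch returns that field unchanged rather than guessing at the mapping.

```python
def parse_block_header(code):
    """Split a 6-bit block header into (errors_in_block, size_field).

    Assumption: bit 0 is the least-significant bit of the header.
    """
    if not 0 <= code < 64:
        raise ValueError("block header must fit in 6 bits")
    n = code & 0b111                  # bits 0 to 2
    size_field = (code >> 3) & 0b111  # bits 3 to 5, returned undecoded
    return 2 ** n, size_field


# Hypothetical header: size field 0b101, n = 0b011.
count, size_field = parse_block_header(0b101011)
```

Because n occupies three bits, a block under this layout holds between 1 and 128 errors.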
References
Moffat, A., Bell, T. C. & Witten, I. H. (1997). Lossless compression for text and images. Int. J. High Speed Electron. Syst. 8, 179–231.