International Tables for Crystallography, Volume G: Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G, ch. 5.6, p. 552

P. J. Ellis (a) and H. J. Bernstein (b)*

(a) Stanford Linear Accelerator Center, 2575 Sand Hill Road, Menlo Park, CA 94025, USA, and (b) Department of Mathematics and Computer Science, Kramer Science Center, Dowling College, Idle Hour Blvd, Oakdale, NY 11769, USA
Correspondence e-mail: yaya@bernstein-plus-sons.com

5.6.3.1. Canonical-code compression


The canonical-code compression scheme encodes errors in two ways: directly or indirectly. Errors are coded directly using a symbol corresponding to the error value. Errors are coded indirectly using a symbol for the number of bits in the (signed) error, followed by the error itself.
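As an illustration of this distinction, the following C fragment is a minimal sketch, not CBFlib source code: using the direct range −2^(n−1) to 2^(n−1) − 1 for a parameter n (chosen at compression time, as described in the next paragraph), it decides whether a signed error can be represented by a direct symbol and, if not, computes the number of bits in its signed representation, which is the quantity that selects the indirect symbol written before the error itself.

    #include <stdio.h>

    /* Minimal sketch, not CBFlib source: classify a signed error as direct or
       indirect.  An error in the direct range -2^(n-1) to 2^(n-1) - 1 has its
       own symbol; any other error is represented by a symbol giving the number
       of bits in its signed value, followed by the error bits themselves. */

    /* Bits needed to hold 'error' as a signed value (sign bit included). */
    static int signed_bit_count(long error)
    {
        long v = (error < 0) ? ~error : error;   /* fold negatives onto non-negatives */
        int bits = 1;                            /* sign bit                          */
        while (v) {
            bits++;
            v >>= 1;
        }
        return bits;
    }

    /* Direct coding applies when the error lies in [-2^(n-1), 2^(n-1) - 1]. */
    static int is_direct(long error, int n)
    {
        return error >= -(1L << (n - 1)) && error <= (1L << (n - 1)) - 1;
    }

    int main(void)
    {
        const int n = 4;                         /* 2^4 = 16 direct codes */
        long examples[] = { 0, -3, 7, -9, 1000 };
        size_t i;

        for (i = 0; i < sizeof examples / sizeof examples[0]; i++) {
            if (is_direct(examples[i], n))
                printf("error %5ld: direct symbol\n", examples[i]);
            else
                printf("error %5ld: indirect symbol for %d bits, then the error\n",
                       examples[i], signed_bit_count(examples[i]));
        }
        return 0;
    }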

At the start of the compression, CBFlib constructs a table containing a set of symbols, one for each of the 2^n direct codes from −2^(n−1) to 2^(n−1) − 1, one for a stop code and one for each of the maxbits − n indirect codes, where n is chosen at compression time and maxbits is the maximum number of bits in an error. CBFlib then assigns to each symbol a bit code, using a shorter bit code for the more common symbols and a longer bit code for the less common symbols. The bit-code lengths are calculated using a Huffman-type algorithm and the actual bit codes are constructed using the canonical-code algorithm described by Moffat et al. (1997).
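The canonical-code step can be sketched independently of CBFlib. The fragment below uses a hypothetical symbol numbering and is not the library's internal routine; it assigns bit codes from a set of code lengths in the canonical manner described by Moffat et al. (1997), in which symbols of equal length receive consecutive integer codes. Only the lengths need to be stored, because the decoder can repeat the same assignment.

    #include <stdio.h>

    /* Minimal sketch, not CBFlib source: rebuild canonical codes from code
       lengths.  The compressed header stores only the length of each symbol's
       code; the compressor and decompressor derive identical bit codes from
       those lengths with the recurrence below. */

    #define NSYMBOLS 6
    #define MAXLEN   8

    static void print_bits(unsigned code, int len)
    {
        int i;
        for (i = len - 1; i >= 0; i--)
            putchar(((code >> i) & 1) ? '1' : '0');
    }

    int main(void)
    {
        /* Hypothetical Huffman-style code lengths for six symbols. */
        int length[NSYMBOLS] = { 2, 3, 3, 2, 4, 4 };
        unsigned code[NSYMBOLS];

        int count[MAXLEN + 1] = { 0 };      /* number of codes of each length  */
        unsigned next[MAXLEN + 1] = { 0 };  /* next code to assign per length  */
        unsigned c = 0;
        int s, len;

        for (s = 0; s < NSYMBOLS; s++)
            count[length[s]]++;

        /* First code of each length: (first code of the previous length plus
           the number of codes of that length), shifted left by one bit. */
        for (len = 1; len <= MAXLEN; len++) {
            c = (c + count[len - 1]) << 1;
            next[len] = c;
        }

        /* Symbols of equal length receive consecutive codes in symbol order. */
        for (s = 0; s < NSYMBOLS; s++)
            code[s] = next[length[s]]++;

        for (s = 0; s < NSYMBOLS; s++) {
            printf("symbol %d: length %d, code ", s, length[s]);
            print_bits(code[s], length[s]);
            putchar('\n');
        }
        return 0;
    }

With the example lengths {2, 3, 3, 2, 4, 4} this prints the prefix-free codes 00, 100, 101, 01, 1100 and 1101.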

The structure of the compressed data is described in Table 5.6.3.1.

Table 5.6.3.1. Structure of compressed data using the canonical-code scheme

Byte                                        Value
1 to 8                                      Number of elements (64-bit little-endian number)
9 to 16                                     Minimum element
17 to 24                                    Maximum element
25 to 32                                    (Reserved for future use)
33                                          Number of bits directly coded, n
34                                          Maximum number of bits encoded, maxbits
35 to 35 + 2^n − 1                          Number of bits in each direct code
35 + 2^n                                    Number of bits in the stop code
35 + 2^n + 1 to 35 + 2^n + maxbits − n      Number of bits in each indirect code
35 + 2^n + maxbits − n + 1 onwards          Coded data
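As a usage sketch of this layout, the following C fragment locates each part of a compressed block. It is illustrative only, not the CBFlib reader: the structure and field names are hypothetical, each code-length entry is taken to occupy a single byte (as the byte ranges in the table imply), and the minimum and maximum elements are read as 64-bit little-endian fields like the element count.

    #include <stdint.h>
    #include <stddef.h>

    /* Minimal sketch, not the CBFlib reader: locate the fields of a compressed
       block laid out as in Table 5.6.3.1.  Offsets are 0-based, so "byte 1" in
       the table is buf[0]. */

    struct canonical_header {
        uint64_t nelements;               /* bytes 1-8: element count (little-endian)  */
        uint64_t minimum;                 /* bytes 9-16: minimum element                */
        uint64_t maximum;                 /* bytes 17-24: maximum element               */
        int      n;                       /* byte 33: number of bits directly coded     */
        int      maxbits;                 /* byte 34: maximum number of bits encoded    */
        const uint8_t *direct_lengths;    /* 2^n direct-code lengths                    */
        uint8_t        stop_length;       /* stop-code length                           */
        const uint8_t *indirect_lengths;  /* maxbits - n indirect-code lengths          */
        const uint8_t *coded_data;        /* start of the coded bit stream              */
    };

    static uint64_t read_le64(const uint8_t *p)
    {
        uint64_t v = 0;
        int i;
        for (i = 7; i >= 0; i--)
            v = (v << 8) | p[i];
        return v;
    }

    static int parse_header(const uint8_t *buf, size_t len,
                            struct canonical_header *h)
    {
        size_t ndirect, offset;

        if (len < 34)
            return -1;                            /* fixed part missing           */

        h->nelements = read_le64(buf);
        h->minimum   = read_le64(buf + 8);        /* read like the element count  */
        h->maximum   = read_le64(buf + 16);
        /* bytes 25-32 are reserved for future use */
        h->n         = buf[32];
        h->maxbits   = buf[33];

        ndirect = (size_t) 1 << h->n;             /* 2^n direct codes             */
        offset  = 34;                             /* byte 35: first direct code   */

        if (h->maxbits < h->n ||
            len < offset + ndirect + 1 + (size_t) (h->maxbits - h->n))
            return -1;                            /* code-length tables missing   */

        h->direct_lengths   = buf + offset;
        h->stop_length      = buf[offset + ndirect];
        h->indirect_lengths = buf + offset + ndirect + 1;
        h->coded_data       = h->indirect_lengths + (h->maxbits - h->n);
        return 0;
    }

A decoder would then rebuild the canonical bit codes from the stored lengths, as in the earlier sketch, before reading the coded data.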

References

Moffat, A., Bell, T. C. & Witten, I. H. (1997). Lossless compression for text and images. Int. J. High Speed Electron. Syst. 8, 179–231.
