Section 16.2.2.1. Sources of random symbols and the notion of source entropy
Statistical communication theory uses as its basic modelling device a discrete source of random symbols which, at discrete times $t = 1, 2, \ldots$, randomly emits a `symbol' taken out of a finite alphabet $\{s_1, s_2, \ldots, s_n\}$. Sequences of such randomly produced symbols are called `messages'.
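As a purely illustrative aside (not part of the original text), such a source is straightforward to simulate; the alphabet, the emission probabilities and the message length below are arbitrary assumptions:

```python
import random

# Hypothetical finite alphabet and emission probabilities q_i (assumed here
# purely for illustration; nothing in the chapter fixes these values).
alphabet = ["s1", "s2", "s3", "s4"]
q = [0.5, 0.25, 0.15, 0.1]

# At each discrete time t the source independently emits one symbol;
# the resulting sequence is a `message'.
message = random.choices(alphabet, weights=q, k=20)
print(" ".join(message))
```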
An important numerical quantity associated with such a discrete source is its entropy per symbol $H$, which gives a measure of the amount of uncertainty involved in the choice of a symbol. Suppose that successive symbols are independent and that symbol $i$ has probability $q_i$. Then the general requirements that $H$ should be a continuous function of the $q_i$, should increase with increasing uncertainty, and should be additive for independent sources of uncertainty, suffice to define $H$ uniquely as
$$H = -k \sum_{i=1}^{n} q_i \log q_i,$$
where $k$ is an arbitrary positive constant [Shannon & Weaver (1949), Appendix 2] whose value depends on the unit of entropy chosen. In the following we use a unit such that $k = 1$.
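For concreteness (this sketch is not in the original text), the definition with $k = 1$ and natural logarithms can be evaluated directly; the distributions below are arbitrary examples:

```python
import math

def entropy(q):
    """Entropy per symbol, H = -sum_i q_i log q_i, taking k = 1 (natural log)."""
    return -sum(qi * math.log(qi) for qi in q if qi > 0.0)

print(entropy([0.5, 0.25, 0.15, 0.1]))  # approx. 1.208 (a biased source)
print(entropy([0.25] * 4))              # log 4 approx. 1.386 (a uniform source)
```

The second call illustrates the `increase with increasing uncertainty' requirement: among all distributions over four symbols, the uniform one maximizes $H$.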
These definitions may be extended to the case where the alphabet is a continuous space endowed with a uniform measure $\mu$: in this case the entropy per symbol is defined as
$$H = -\int q(s) \log q(s) \, \mu(\mathrm{d}s),$$
where $q$ is the probability density of the distribution of symbols with respect to the measure $\mu$.
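Again as an illustration only (not from the original text), taking $\mu$ to be Lebesgue measure on the real line and $q$ a Gaussian density, the integral can be checked numerically against the known closed form $H = \frac{1}{2}\log(2\pi e \sigma^2)$:

```python
import math

sigma = 2.0  # assumed standard deviation, chosen for illustration

def q(x):
    # Gaussian probability density with mean 0 and standard deviation sigma
    return math.exp(-x * x / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Riemann-sum approximation of H = -integral q(x) log q(x) dx over [-10s, 10s]
dx = 1e-3
n = int(20 * sigma / dx)
h_numeric = -sum(q(-10 * sigma + i * dx) * math.log(q(-10 * sigma + i * dx)) * dx
                 for i in range(n))
h_exact = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
print(h_numeric, h_exact)  # both approx. 2.112
```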
References
Shannon, C. E. & Weaver, W. (1949). The Mathematical Theory of Communication. Urbana: University of Illinois Press.