The statistical theory of phase determination

Bricogne, G.

doi:10.1107/97809553602060000551

International Tables for Crystallography, Volume B: Reciprocal space
Edited by U. Shmueli

International Tables for Crystallography (2006). Vol. B. ch. 1.3, pp. 96-98

G. Bricogne^a

^a MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, England, and LURE, Bâtiment 209D, Université Paris-Sud, 91405 Orsay, France

1.3.4.5.2.2. The statistical theory of phase determination

The methods of probability theory just surveyed were applied to various problems formally similar to the crystallographic phase problem [e.g. the `problem of the random walk' of Pearson (1905)] by Rayleigh (1880, 1899, 1905, 1918, 1919) and Kluyver (1906). They became the basis of the statistical theory of communication with the classic papers of Rice (1944, 1945).

The Gram–Charlier and Edgeworth series were introduced into crystallography by Bertaut (1955a,b,c, 1956a) and by Klug (1958), respectively, who showed them to constitute the mathematical basis of numerous formulae derived by Hauptman & Karle (1953). The saddlepoint approximation was introduced by Bricogne (1984) and was shown to be related to variational methods involving the maximization of certain entropy criteria. This connection exhibits most of the properties of the Fourier transform at play simultaneously, and will now be described as a final illustration.

(a) Definitions and conventions

Let H be a set of unique non-origin reflections h for a crystal with lattice Λ and space group G. Let H contain $[n_{a}]$ acentric and $[n_{c}]$ centric reflections. Structure-factor values attached to all reflections in H will comprise $[n = 2n_{a} + n_{c}]$ real numbers. For h acentric, $[\alpha_{{\bf h}}]$ and $[\beta_{{\bf h}}]$ will be the real and imaginary parts of the complex structure factor; for h centric, $[\gamma_{{\bf h}}]$ will be the real coordinate of the (possibly complex) structure factor measured along a real axis rotated by one of the two angles $[\theta_{{\bf h}}]$ , π apart, to which the phase is restricted modulo $[2\pi]$ (Section 1.3.4.2.2.5). These n real coordinates will be arranged as a column vector containing the acentric then the centric data, i.e. in the order $[\alpha_{1}, \beta_{1}, \alpha_{2}, \beta_{2}, \ldots, \alpha_{n_{a}}, \beta_{n_{a}}, \gamma_{1}, \gamma_{2}, \ldots, \gamma_{n_{c}}.]$

(b) Vectors of trigonometric structure-factor expressions

Let $[\boldxi({\bf x})]$ denote the vector of trigonometric structure-factor expressions associated with $[{\bf x} \in D]$ , where D denotes the asymmetric unit. These are defined as follows: $[\let\normalbaselines\relax\openup2pt\matrix{\alpha_{{\bf h}} ({\bf x}) + i\beta_{{\bf h}} ({\bf x}) = \Xi ({\bf h},{\bf x})\hfill & \hbox{for } {\bf h} \hbox{ acentric}\hfill\cr \gamma_{{\bf h}} ({\bf x}) = \exp(- i\theta_{{\bf h}}) \Xi ({\bf h},{\bf x})\hfill & \hbox{for } {\bf h} \hbox{ centric},\hfill}]$ where $[\Xi ({\bf h},{\bf x}) = {1 \over |G_{{\bf x}}|} \sum\limits_{g \in G} \exp \{2 \pi i{\bf h} \cdot [S_{g} ({\bf x})]\}.]$

According to the convention above, the coordinates of $[\boldxi ({\bf x})]$ in $[{\bb R}^{n}]$ will be arranged in a column vector as follows: $[\eqalign{\boldxi_{2r - 1} ({\bf x}) &= \alpha_{{\bf h}_{r}} ({\bf x}) \quad \hbox{for } r = 1, \ldots, n_{a},\cr \boldxi_{2r} ({\bf x}) &= \beta_{{\bf h}_{r}} ({\bf x}) \quad \hbox{for } r = 1, \ldots, n_{a},\cr \boldxi_{n_{a} + r} ({\bf x}) &= \gamma_{{\bf h}_{r}} ({\bf x}) \quad \hbox{for } r = n_{a} + 1, \ldots, n_{a} + n_{c}.}]$
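
The construction of the vector ξ(x) can be illustrated with a short sketch. It is not taken from the chapter: the choice of space group (P2_1 with unique axis b), the helper names `apply_symop`, `trig_sf` and `xi_vector`, and the assumption that x is a general position (so that the stabilizer order |G_x| is 1) are all made here for illustration. The restricted phase angle θ_h of each centric reflection is supplied by the caller rather than derived.

```python
import cmath
import math

# Assumed example: the two symmetry operations of P2_1 (unique axis b),
# each written as (rotation matrix, translation vector).
SYMOPS_P21 = [
    (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0)),
    (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (0.0, 0.5, 0.0)),
]

def apply_symop(op, x):
    """Return S_g(x) = R x + t for a fractional position x."""
    rot, trans = op
    return tuple(sum(rot[i][j] * x[j] for j in range(3)) + trans[i] for i in range(3))

def trig_sf(h, x, symops, stabilizer_order=1):
    """Xi(h, x) = (1/|G_x|) * sum over g in G of exp(2 pi i h . S_g(x))."""
    total = 0j
    for g in symops:
        y = apply_symop(g, x)
        total += cmath.exp(2j * math.pi * sum(hi * yi for hi, yi in zip(h, y)))
    return total / stabilizer_order

def xi_vector(x, acentric, centric, symops):
    """Coordinates of xi(x): (alpha, beta) pairs first, then the gammas.

    `centric` is a list of (h, theta_h) pairs; theta_h is supplied by the caller.
    """
    coords = []
    for h in acentric:
        value = trig_sf(h, x, symops)
        coords.extend([value.real, value.imag])
    for h, theta in centric:
        value = cmath.exp(-1j * theta) * trig_sf(h, x, symops)
        coords.append(value.real)
    return coords

# One acentric reflection (1 1 0) and one centric reflection (1 0 1) with theta = 0.
xi = xi_vector((0.1, 0.2, 0.3), [(1, 1, 0)], [((1, 0, 1), 0.0)], SYMOPS_P21)
print(xi)
```

In this example n_a = 1 and n_c = 1, so ξ(x) has n = 3 real coordinates. For the h0l reflection the two symmetry-related terms are complex conjugates, so Ξ is real and equals 2 cos[2π(hx + lz)].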

(c) Distributions of random atoms and moment-generating functions

Let position x in D now become a random vector with probability density $[m({\bf x})]$ . Then $[\boldxi ({\bf x})]$ becomes itself a random vector in $[{\bb R}^{n}]$ , whose distribution $[p (\boldxi)]$ is the image of distribution $[m ({\bf x})]$ through the mapping $[{\bf x} \rightarrow \boldxi ({\bf x})]$ just defined. The locus of $[\boldxi ({\bf x})]$ in $[{\bb R}^{n}]$ is a compact algebraic manifold $[{\scr L}]$ (the multidimensional analogue of a Lissajous curve), so that p is a singular measure (a distribution of order 0, Section 1.3.2.3.4, concentrated on that manifold) with compact support. The average with respect to p of any function Ω over $[{\bb R}^{n}]$ which is infinitely differentiable in a neighbourhood of $[{\scr L}]$ may be calculated as an average with respect to m over D by the `induction formula': $[\langle p, \Omega \rangle = {\textstyle\int\limits_{D}} m ({\bf x}) \Omega [\boldxi ({\bf x})] \hbox{ d}^{3} {\bf x}.]$

In particular, one can calculate the moment-generating function M for distribution p as $[M ({\bf t}) \equiv \langle p_{\boldxi}, \exp ({\bf t} \cdot {\boldxi})\rangle = {\textstyle\int\limits_{D}} m ({\bf x}) \exp [{\bf t} \cdot {\boldxi} ({\bf x})] \hbox{ d}^{3} {\bf x}]$ and hence calculate the moments μ (respectively cumulants κ) of p by differentiation of M (respectively log M) at $[{\bf t} = {\bf 0}]$ : $[\eqalign{\mu_{r_{1} r_{2} \ldots r_{n}} &\equiv {\int\limits_{D}} m({\bf x}) \boldxi_{1}^{r_{1}} ({\bf x}) \boldxi_{2}^{r_{2}} ({\bf x}) \ldots \boldxi_{n}^{r_{n}} ({\bf x}) \hbox{ d}^{3} {\bf x}\cr &= {\partial^{r_{1} + \ldots + r_{n}} (M) \over \partial t_{1}^{r_{1}} \ldots \partial t_{n}^{r_{n}}}\cr \kappa_{r_{1} r_{2} \ldots r_{n}} &= {\partial^{r_{1} + \ldots + r_{n}} (\log M) \over \partial t_{1}^{r_{1}} \ldots \partial t_{n}^{r_{n}}}.}]$ The structure-factor algebra for group G (Section 1.3.4.2.2.9) then allows one to express products of $[\boldxi]$ 's as linear combinations of other $[\boldxi]$ 's, and hence to express all moments and cumulants of distribution $[p(\boldxi)]$ as linear combinations of real and imaginary parts of Fourier coefficients of the prior distribution of atoms $[m({\bf x})]$ . This plays a key role in the use of non-uniform distributions of atoms.
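
As a minimal numerical sketch of the induction formula and of cumulant extraction, consider a one-dimensional analogue of P-1 with a single centric reflection h = 1, so that ξ(x) = 2 cos(2πx) on D = [0, 1/2]. The prior m(x) = 2[1 + a cos(2πx)], the quadrature scheme and the finite-difference step are assumptions chosen here for illustration, not part of the chapter.

```python
import math

def quad_points(k=256):
    """Midpoint-rule nodes and common weight on the assumed asymmetric unit D = [0, 1/2]."""
    return [(j + 0.5) / (2 * k) for j in range(k)], 1.0 / (2 * k)

def m_density(x, a=0.3):
    """Assumed non-uniform prior on D, normalised so that its integral over D is 1."""
    return 2.0 * (1.0 + a * math.cos(2 * math.pi * x))

def xi(x):
    """Centric trigonometric structure-factor expression for h = 1 (1D toy model)."""
    return 2.0 * math.cos(2 * math.pi * x)

def mgf(t, a=0.3):
    """M(t) = integral over D of m(x) exp[t xi(x)] dx, i.e. the induction formula."""
    nodes, w = quad_points()
    return w * sum(m_density(x, a) * math.exp(t * xi(x)) for x in nodes)

def first_two_cumulants(a=0.3, step=1e-3):
    """Central finite differences of log M at t = 0."""
    lo, mid, hi = (math.log(mgf(s, a)) for s in (-step, 0.0, step))
    kappa1 = (hi - lo) / (2 * step)
    kappa2 = (hi - 2 * mid + lo) / step ** 2
    return kappa1, kappa2

kappa1, kappa2 = first_two_cumulants()
print(kappa1, kappa2)
```

For this toy prior the integrals can be done by hand: the first cumulant is a and the second is 2 − a², so both are fixed by the single Fourier coefficient of m(x) that appears in the prior. This is a small instance of the statement that moments and cumulants are linear combinations of Fourier coefficients of m(x).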

(d) The joint probability distribution of structure factors

In the random-atom model of an equal-atom structure, N atoms are placed randomly, independently of each other, in the asymmetric unit D of the crystal with probability density $[m({\bf x})]$ . For point atoms of unit weight, the vector F of structure-factor values for reflections $[{\bf h} \in H]$ may be written $[{\bf F} = {\textstyle\sum\limits_{I = 1}^{N}}\; {\boldxi}^{[I]},]$ where the N copies $[\boldxi^{[I]}]$ of random vector ξ are independent and have the same distribution $[p(\boldxi)]$ .
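
The random-atom model can be simulated directly. The following sketch assumes a one-dimensional centric toy case with ξ(x) = 2 cos(2πhx) and a uniform prior, sampling x over the whole cell (which gives the same distribution of ξ as sampling over D = [0, 1/2]); the function names, the seed and the sample sizes are arbitrary choices made here.

```python
import math
import random

def simulate_F(n_atoms, h=1, rng=random):
    """One draw of F = sum over atoms of xi(x), with xi(x) = 2 cos(2 pi h x) and x uniform."""
    return sum(2.0 * math.cos(2 * math.pi * h * rng.random()) for _ in range(n_atoms))

def sample_mean_and_variance(n_atoms, n_trials, seed=0):
    """Empirical mean and variance of F over independent random structures."""
    rng = random.Random(seed)
    draws = [simulate_F(n_atoms, rng=rng) for _ in range(n_trials)]
    mean = sum(draws) / n_trials
    var = sum((f - mean) ** 2 for f in draws) / (n_trials - 1)
    return mean, var

mean_F, var_F = sample_mean_and_variance(n_atoms=8, n_trials=20000)
print(mean_F, var_F)
```

With a uniform prior each copy of ξ has mean 0 and variance 2, so the sample mean should be close to 0 and the sample variance close to 2N, up to sampling noise.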

The joint probability distribution $[{\scr P}]$ of $[{\bf F}]$, evaluated at a point $[{\bf X} \in {\bb R}^{n}]$, is then [Section 1.3.4.5.2.1(e)] $[{\scr P}({\bf X}) = {1 \over (2 \pi)^{n}} \int\limits_{{\bb R}^{n}} \exp [N \log M (i{\bf t}) - i{\bf t} \cdot {\bf X}] \hbox{ d}^{n} {\bf t}.]$

For low dimensionality n it is possible to carry out the Fourier transformation numerically after discretization, provided $[M (i{\bf t})]$ is sampled sufficiently finely that no aliasing results from taking its Nth power (Barakat, 1974). This exact approach can also accommodate heterogeneity, and has been used first in the field of intensity statistics (Shmueli et al., 1984, 1985; Shmueli & Weiss, 1987, 1988), then in the study of the $[\Sigma_{1}]$ and $[\Sigma_{2}]$ relations in triclinic space groups (Shmueli & Weiss, 1985, 1986). Some of these applications are described in Chapter 2.1 of this volume. This method could be extended to the construction of any joint probability distribution (j.p.d.) in any space group by using the generic expression for the moment-generating function (m.g.f.) derived by Bricogne (1984). It is, however, limited to small values of n by the necessity to carry out n-dimensional FFTs on large arrays of sample values.
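
For n = 1 the numerical Fourier inversion can be sketched in a few lines. The example below assumes the same one-dimensional centric toy model as before, ξ(x) = 2 cos(2πx) with a uniform prior, for which M(it) is real and coincides with the Bessel function J0(2t). The truncation `t_max`, the step `dt` and the plain trapezoidal rule (used here instead of an FFT) are illustrative choices, not the procedure of the papers cited above.

```python
import math

def char_fn(t, k=256):
    """M(it) for xi(x) = 2 cos(2 pi x) with x uniform on D = [0, 1/2].

    The imaginary part cancels by symmetry, so only the cosine is summed
    (midpoint rule with k nodes).
    """
    return sum(math.cos(2.0 * t * math.cos(math.pi * (j + 0.5) / k)) for j in range(k)) / k

def exact_pdf(n_atoms, xs, t_max=30.0, dt=0.05):
    """P(X) = (1/pi) * integral from 0 to t_max of M(it)^N cos(t X) dt (trapezoidal rule)."""
    n_t = int(round(t_max / dt))
    ts = [i * dt for i in range(n_t + 1)]
    phi_n = [char_fn(t) ** n_atoms for t in ts]
    pdf = []
    for x in xs:
        terms = [p * math.cos(t * x) for t, p in zip(ts, phi_n)]
        total = sum(terms) - 0.5 * (terms[0] + terms[-1])  # halve the two end points
        pdf.append(total * dt / math.pi)
    return pdf

N_ATOMS = 8
DX = 0.5
# F is confined to [-2N, 2N], so this grid covers the whole support.
xs = [-2 * N_ATOMS + DX * j for j in range(int(4 * N_ATOMS / DX) + 1)]
pdf = exact_pdf(N_ATOMS, xs)
total = DX * sum(pdf)
variance = DX * sum(x * x * p for x, p in zip(xs, pdf))
print(total, variance)
```

Sampling t with step `dt` makes the computed density periodic in X with period 2π/dt, which must exceed the width 4N of the support to avoid aliasing; with dt = 0.05 and N = 8 the margin is comfortable. The cost of the same approach on an n-dimensional grid grows exponentially with n, which is the limitation noted in the text.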

The asymptotic expansions of Gram–Charlier and Edgeworth have good convergence properties only if $[F_{{\bf h}}]$ lies in the vicinity of $[\langle F_{{\bf h}}\rangle = N \bar{{\scr F}}[m]({\bf h})]$ for all $[{\bf h} \in H]$ . Previous work on the j.p.d. of structure factors has used for $[m({\bf x})]$ a uniform distribution, so that $[\langle {\bf F}\rangle = {\bf 0}]$ ; as a result, the corresponding expansions are accurate only if all moduli $[|F_{{\bf h}}|]$ are small, in which case the j.p.d. contains little phase information.

The saddlepoint method [Section 1.3.4.5.2.1(f)] constitutes the method of choice for evaluating the joint probability $[{\scr P}({\bf F}^{*})]$ of structure factors when some of the moduli in $[{\bf F}^{*}]$ are large. As shown previously, this approximation amounts to using the `conjugate distribution' $[p_{{\boldtau}} ({\boldxi}) = p({\boldxi}) {\exp ({\boldtau} \cdot {\boldxi}) \over M (\boldtau)}]$ instead of the original distribution $[p({\boldxi}) = p_{{\bf 0}} ({\boldxi})]$ for the distribution of random vector ξ. This conjugate distribution $[p_{{\boldtau}}]$ is induced from the modified distribution of atoms $[q_{{\boldtau}} ({\bf x}) = m({\bf x}) {\exp [{\boldtau} \cdot {\boldxi} ({\bf x})] \over M (\boldtau)}, \eqno(\hbox{SP}1)]$ where, by the induction formula, $[M (\boldtau)]$ may be written as $[M ({\boldtau}) = {\textstyle\int\limits_{D}} m({\bf x}) \exp [{\boldtau} \cdot {\boldxi} ({\bf x})] \hbox{ d}^{3} {\bf x} \eqno(\hbox{SP}2)]$ and where τ is the unique solution of the saddlepoint equation: $[\nabla_{{\boldtau}} (\log M^{N}) = {\bf F}^{*}. \eqno(\hbox{SP}3)]$ The desired approximation is then $[{\scr P}^{\rm SP} ({\bf F}^{*}) = {\exp ({\sf S}) \over \sqrt{\det (2 \pi {\scr Q})}},]$ where $[{\sf S} = \log M^{N} ({\boldtau}) - {\boldtau} \cdot {\bf F}^{*}]$ and where $[{\scr Q} = \nabla \nabla^{T} (\log M^{N}) = {\bf NQ}.]$

Finally, the elements of the Hessian matrix $[{\bf Q} = \nabla \nabla^{T} (\log M)]$, evaluated at the saddlepoint τ, are just the trigonometric second-order cumulants of the conjugate distribution $[p_{{\boldtau}}]$, and hence can be calculated via structure-factor algebra from the Fourier coefficients of $[q_{{\boldtau}} ({\bf x})]$. All the quantities involved in the expression for $[{\scr P}^{\rm SP} ({\bf F}^{*})]$ are therefore effectively computable from the initial data $[m({\bf x})]$ and $[{\bf F}^{*}]$.
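
The saddlepoint recipe (SP1) to (SP3) can be traced numerically in the one-dimensional centric toy model used for illustration here: ξ(x) = 2 cos(2πx), uniform prior on D = [0, 1/2], n = 1. The Newton iteration, the quadrature and the function names are assumptions of this sketch and are not prescribed by the chapter.

```python
import math

def tilted_stats(tau, k=400):
    """Return log M(tau) and the mean and variance of xi under q_tau (uniform m on D)."""
    xis = [2.0 * math.cos(math.pi * (j + 0.5) / k) for j in range(k)]
    weights = [math.exp(tau * v) for v in xis]
    z = sum(weights) / k                                   # M(tau), as in (SP2)
    mean = sum(w * v for w, v in zip(weights, xis)) / (k * z)
    var = sum(w * v * v for w, v in zip(weights, xis)) / (k * z) - mean ** 2
    return math.log(z), mean, var

def saddlepoint_pdf(f_star, n_atoms, tol=1e-12, max_iter=50):
    """Solve N d(log M)/d(tau) = F* by Newton's method, then evaluate P^SP(F*)."""
    tau = 0.0
    for _ in range(max_iter):
        _, mean, var = tilted_stats(tau)
        residual = n_atoms * mean - f_star                 # left minus right side of (SP3)
        if abs(residual) < tol:
            break
        tau -= residual / (n_atoms * var)                  # N * var is the 1 x 1 Hessian
    log_m, _, var = tilted_stats(tau)
    s = n_atoms * log_m - tau * f_star
    return math.exp(s) / math.sqrt(2 * math.pi * n_atoms * var), tau

p_sp, tau = saddlepoint_pdf(8.0, 8)
print(p_sp, tau)
```

At F* = 0 the iteration stays at τ = 0 and the result reduces to the central-limit Gaussian value 1/√(2π · 2N). For large |F*| the multiplier τ moves away from zero and the variance is that of the tilted distribution q_τ rather than of the prior.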

(e) Maximum-entropy distributions of atoms

One of the main results in Bricogne (1984) is that the modified distribution $[q_{{\boldtau}} ({\bf x})]$ in (SP1) is the unique distribution which has maximum entropy $[{\scr S}_{m} (q)]$ relative to $[m({\bf x})]$ , where $[{\scr S}_{m} (q) = - \int\limits_{D} q({\bf x}) \log \left[{q({\bf x}) \over m({\bf x})}\right] \hbox{d}^{3} {\bf x},]$ under the constraint that $[{\bf F}^{*}]$ be the centroid vector of the corresponding conjugate distribution $[{\scr P}_{{\boldtau}} ({\bf F})]$ . The traditional notation of maximum-entropy (ME) theory (Jaynes, 1957, 1968, 1983) is in this case (Bricogne, 1984) $[\eqalignno{q^{\rm ME} ({\bf x}) &= m({\bf x}) {\exp [\boldlambda \cdot {\boldxi} ({\bf x})] \over Z (\boldlambda)}&(\hbox{ME}1)\cr Z (\boldlambda) &= {\textstyle\int\limits_{D}} m({\bf x}) \exp [\boldlambda \cdot {\boldxi} ({\bf x})] \hbox{ d}^{3} {\bf x} &(\hbox{ME}2)\cr \nabla_{\lambda} (\log Z^{N}) &= {\bf F}^{*} &(\hbox{ME}3)\cr}]$ so that Z is identical to the m.g.f. M, and the coordinates $[\boldtau]$ of the saddlepoint are the Lagrange multipliers λ for the constraints $[{\bf F}^{*}]$ .

Jaynes's ME theory also gives an estimate for $[{\scr P}({\bf F}^{*})]$: $[{\scr P}^{\rm ME} ({\bf F}^{*}) \approx \exp ({\scr S}),]$ where $[{\scr S} = \log Z^{N} - \boldlambda \cdot {\bf F}^{*} = N {\scr S}_{m} (q^{\rm ME})]$ is the total entropy and is the counterpart to $[{\sf S}]$ under the equivalence just established.
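
The equality between $[\log Z^{N} - \boldlambda \cdot {\bf F}^{*}]$ and N times the relative entropy of $[q^{\rm ME}]$ follows in two steps from (ME1) to (ME3). As a brief sketch, write $[\langle \boldxi \rangle_{q}]$ for the average of $[\boldxi ({\bf x})]$ with respect to $[q^{\rm ME}]$ (a notation introduced here only for convenience). Substituting (ME1) into the definition of the relative entropy gives $[{\scr S}_{m} (q^{\rm ME}) = - {\textstyle\int\limits_{D}} q^{\rm ME} ({\bf x}) \{\boldlambda \cdot \boldxi ({\bf x}) - \log Z (\boldlambda)\} \hbox{ d}^{3} {\bf x} = \log Z (\boldlambda) - \boldlambda \cdot \langle \boldxi \rangle_{q},]$ while differentiating (ME2) under the integral sign gives $[\nabla_{\lambda} (\log Z) = {1 \over Z (\boldlambda)} {\textstyle\int\limits_{D}} m({\bf x}) \boldxi ({\bf x}) \exp [\boldlambda \cdot \boldxi ({\bf x})] \hbox{ d}^{3} {\bf x} = \langle \boldxi \rangle_{q}.]$ By (ME3), $[\langle \boldxi \rangle_{q} = {\bf F}^{*} / N]$, so that multiplying the first relation by N yields $[N {\scr S}_{m} (q^{\rm ME}) = \log Z^{N} - \boldlambda \cdot {\bf F}^{*}.]$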

$[{\scr P}^{\rm ME}]$ is identical to $[{\scr P}^{\rm SP}]$ except that it lacks the denominator $[\sqrt{\det (2 \pi {\scr Q})}]$. The latter, which is the normalization factor of a multivariate Gaussian with covariance matrix $[{\scr Q}]$, may easily be seen to arise through Szegö's theorem (Sections 1.3.2.6.9.4, 1.3.4.2.1.10) from the extra logarithmic term in Stirling's formula $[\log (q!) \approx q \log q - q + {\textstyle{1 \over 2}} \log (2 \pi q)]$ (see, for instance, Reif, 1965) beyond the first two terms which serve to define entropy, since $[{1 \over n} \log \det (2 \pi {\bf Q}) \approx {\int\limits_{{\bb R}^{3}/{\bb Z}^{3}}} \log [2 \pi q^{\rm ME} ({\bf x})] \hbox{ d}^{3} {\bf x}.]$ The relative effect of this extra normalization factor depends on the ratio $[{n \over N} = {\hbox{dimension of {\bf F} over {\bb R}} \over \hbox{number of atoms}}.]$

The above relation between entropy maximization and the saddlepoint approximation is the basis of a Bayesian statistical approach to the phase problem (Bricogne, 1988) where the assumptions under which joint distributions of structure factors are sought incorporate many new ingredients (such as molecular boundaries, isomorphous substitutions, known fragments, noncrystallographic symmetries, multiple crystal forms) besides trial phase choices for basis reflections. The ME criterion intervenes in the construction of $[q^{\rm ME} ({\bf x})]$ under these assumptions, and the distribution $[q^{\rm ME} ({\bf x})]$ is a very useful computational intermediate in obtaining the approximate joint probability $[{\scr P}^{\rm SP} ({\bf F}^{*})]$ and the associated conditional distributions and likelihood functions.

(f) Role of the Fourier transformation

The formal developments presented above make use of the following properties of the Fourier transformation:

(i) the convolution theorem, which turns the convolution of probability distributions into the multiplication of their characteristic functions;
(ii) the differentiation property, which confers moment-generating properties to characteristic functions;
(iii) the reciprocity theorem, which allows the retrieval of a probability distribution from its characteristic or moment-generating function;
(iv) the Paley–Wiener theorem, which allows the analytic continuation of characteristic functions associated to probability distributions with compact support, and thus gives rise to conjugate families of distributions;
(v) Bertaut's structure-factor algebra (a discrete symmetrized version of the convolution theorem), which allows the calculation of all necessary moments and cumulants when the dimension n is small;
(vi) Szegö's theorem, which provides an asymptotic approximation of the normalization factor when n is large.

This multi-faceted application seems an appropriate point at which to end this description of the Fourier transformation and of its use in crystallography.

References

Barakat, R. (1974). First-order statistics of combined random sinusoidal waves with applications to laser speckle patterns. Opt. Acta, 21, 903–921.

Bertaut, E. F. (1955a). La méthode statistique en cristallographie. I. Acta Cryst. 8, 537–543.

Bertaut, E. F. (1955b). La méthode statistique en cristallographie. II. Quelques applications. Acta Cryst. 8, 544–548.

Bertaut, E. F. (1955c). Fonction de répartition: application à l'approche directe des structures. Acta Cryst. 8, 823–832.

Bertaut, E. F. (1956a). Les groupes de translation non primitifs et la méthode statistique. Acta Cryst. 9, 322.

Bricogne, G. (1984). Maximum entropy and the foundations of direct methods. Acta Cryst. A40, 410–445.

Bricogne, G. (1988). A Bayesian statistical theory of the phase problem. I. A multichannel maximum entropy formalism for constructing generalised joint probability distributions of structure factors. Acta Cryst. A44, 517–545.

Hauptman, H. & Karle, J. (1953). Solution of the phase problem. I. The centrosymmetric crystal. ACA Monograph No. 3. Pittsburgh: Polycrystal Book Service.

Jaynes, E. T. (1957). Information theory and statistical mechanics. Phys. Rev. 106, 620–630.

Jaynes, E. T. (1968). Prior probabilities. IEEE Trans. SSC, 4, 227–241.

Jaynes, E. T. (1983). Papers on probability, statistics and statistical physics. Dordrecht: Kluwer Academic Publishers.

Klug, A. (1958). Joint probability distributions of structure factors and the phase problem. Acta Cryst. 11, 515–543.

Kluyver, J. C. (1906). A local probability problem. K. Ned. Akad. Wet. Proc. 8, 341–350.

Pearson, K. (1905). The problem of the random walk. Nature (London), 72, 294, 342.

Rayleigh (J. W. Strutt), Lord (1880). On the resultant of a large number of vibrations of the same pitch and arbitrary phase. Philos. Mag. 10, 73–78.

Rayleigh (J. W. Strutt), Lord (1899). On James Bernoulli's theorem in probabilities. Philos. Mag. 47, 246–251.

Rayleigh (J. W. Strutt), Lord (1905). The problem of the random walk. Nature (London), 72, 318.

Rayleigh (J. W. Strutt), Lord (1918). On the light emitted from a random distribution of luminous sources. Philos. Mag. 36, 429–449.

Rayleigh (J. W. Strutt), Lord (1919). On the problem of random flights in one, two or three dimensions. Philos. Mag. 37, 321–347.

Reif, F. (1965). Fundamentals of statistical and thermal physics, Appendix A.6. New York: McGraw-Hill.

Rice, S. O. (1944, 1945). Mathematical analysis of random noise. Bell Syst. Tech. J. 23, 283–332 (parts I and II); 24, 46–156 (parts III and IV). [Reprinted in Selected papers on noise and stochastic processes (1954), edited by N. Wax, pp. 133–294. New York: Dover Publications.]

Shmueli, U. & Weiss, G. H. (1985). Exact joint probability distribution for centrosymmetric structure factors. Derivation and application to the $[\Sigma_{1}]$ relationship in the space group $[P\bar{1}]$. Acta Cryst. A41, 401–408.

Shmueli, U. & Weiss, G. H. (1986). Exact joint distribution of $[E_{\bf h}]$, $[E_{\bf k}]$ and $[E_{\bf h+k}]$, and the probability for the positive sign of the triple product in the space group $[P{\bar {1}}]$. Acta Cryst. A42, 240–246.

Shmueli, U. & Weiss, G. H. (1987). Exact random-walk models in crystallographic statistics. III. Distributions of $[|E|]$ for space groups of low symmetry. Acta Cryst. A43, 93–98.

Shmueli, U. & Weiss, G. H. (1988). Exact random-walk models in crystallographic statistics. IV. P.d.f.'s of $[|E|]$ allowing for atoms in special positions. Acta Cryst. A44, 413–417.

Shmueli, U., Weiss, G. H. & Kiefer, J. E. (1985). Exact random-walk models in crystallographic statistics. II. The bicentric distribution for the space group $[P{\bar {1}}]$. Acta Cryst. A41, 55–59.

Shmueli, U., Weiss, G. H., Kiefer, J. E. & Wilson, A. J. C. (1984). Exact random-walk models in crystallographic statistics. I. Space groups $[P{\bar {\it 1}}]$ and P1. Acta Cryst. A40, 651–660.
