International
Tables for
Crystallography
Volume B
Reciprocal space
Edited by U. Shmueli

International Tables for Crystallography (2006). Vol. B. ch. 2.1, pp. 192-195

Section 2.1.4. Probability density distributions – mathematical preliminaries

U. Shmueli^a* and A. J. C. Wilson^b

^a School of Chemistry, Tel Aviv University, Tel Aviv 69 978, Israel, and ^b St John's College, Cambridge, England
Correspondence e-mail: ushmueli@post.tau.ac.il

For the purpose of this chapter, `ideal' probability distributions or probability density functions are the asymptotic forms obtained by the use of the central-limit theorem when the number of atoms in the unit cell, N, is sufficiently large. In order to derive them it is necessary to outline the properties of characteristic functions and to state alternative conditions for the validity of the central-limit theorem; the distributions themselves are derived in Section 2.1.5.

2.1.4.1. Characteristic functions

The average value of [\exp(itx)] is very important in probability theory; it is called the characteristic function of the distribution [f(x)] and is denoted by [C_{x}(t)] or, when no confusion can arise, by [C(t)]. It exists for all legitimate distributions, whether discrete or continuous. In the continuous case it is given by [C(t) = \textstyle\int\limits_{-\infty}^{\infty}\exp(itx)f(x)\;{\rm d}x, \eqno(2.1.4.1)] and is thus the Fourier transform of [f(x)]. In many cases it can be obtained from known integrals. For example, for the Cauchy distribution, [\eqalignno{C(t)& = {{a}\over{\pi}}\int_{-\infty}^{\infty}{{\exp(itx)}\over{a^{2}+x^{2}}} \;{\rm d}x &(2.1.4.2) \cr & = \exp(-a|t|), &(2.1.4.3)}] and for the normal distribution, [\eqalignno{C(t)& = (2\pi\sigma^{2})^{-1/2}\int_{-\infty}^{\infty}\exp\left(-{{(x-m)^{2}}\over{2\sigma^{2}}}\right)\exp(itx)\;{\rm d}x &(2.1.4.4) \cr & = \exp\left(imt-{{\sigma^{2}t^{2}}\over{2}}\right). &(2.1.4.5)}] Since the characteristic function is the Fourier transform of the distribution function, the converse is true, and if the characteristic function is known the probability density function can be obtained by the use of the Fourier inversion theorem, [f(x) = (1/2\pi)\textstyle\int\limits_{-\infty}^{\infty}\exp(-itx)C(t)\;{\rm d}t. \eqno(2.1.4.6)] An alternative approach to the derivation of the distribution from a known characteristic function will be discussed below.
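
As a concrete illustration, the following short Python sketch (NumPy assumed available; all parameter values are illustrative only) estimates the characteristic function of a normal variable as the sample average [\langle\exp(itx)\rangle] of equation (2.1.4.1) and compares it with the closed form (2.1.4.5):

```python
# Minimal sketch: Monte Carlo estimate of C(t) = <exp(itx)> for a normal
# variable, compared with the exact result exp(imt - sigma^2 t^2 / 2).
import numpy as np

rng = np.random.default_rng(0)         # seed chosen arbitrarily
m, sigma = 1.5, 2.0                    # illustrative mean and standard deviation
x = rng.normal(m, sigma, size=200_000)

for t in (0.2, 0.5, 1.0):
    empirical = np.mean(np.exp(1j * t * x))                 # eq. (2.1.4.1)
    exact = np.exp(1j * m * t - 0.5 * sigma**2 * t**2)      # eq. (2.1.4.5)
    print(t, empirical, exact)
```

The two columns agree to within Monte Carlo error; the same device works for any distribution that can be sampled, which is useful when the integral (2.1.4.1) has no convenient closed form.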

The most important property of characteristic functions in crystallography is the following: if x and y are independent random variables with characteristic functions [C_{x}(t)] and [C_{y}(t)], the characteristic function of their sum [z = x+y \eqno(2.1.4.7)] is the product [C_{z}(t) = C_{x}(t)C_{y}(t). \eqno(2.1.4.8)] Obviously this can be extended to any number of independent random variables.
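
The multiplicative property (2.1.4.8) is easily checked numerically. The sketch below (Python with NumPy assumed; the two parent distributions are chosen arbitrarily for illustration) compares the estimated characteristic function of the sum [z = x+y] with the product of the individual estimates:

```python
# Minimal check of C_z(t) = C_x(t) C_y(t) for independent x and y.
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(1.0, size=200_000)     # independent samples of x ...
y = rng.uniform(-1.0, 1.0, size=200_000)   # ... and of y
z = x + y                                  # eq. (2.1.4.7)

t = 0.7
C_x = np.mean(np.exp(1j * t * x))
C_y = np.mean(np.exp(1j * t * y))
C_z = np.mean(np.exp(1j * t * z))
print(C_z, C_x * C_y)                      # agree to Monte Carlo accuracy
```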

When the moments exist, the characteristic function can be expanded in a power series in which the kth term is [m_{k}'(it)^{k}/k!]. If the power series [\exp(itx) = 1+itx+{{(it)^{2}x^{2}}\over{2!}}+{{(it)^{3}x^{3}}\over{3!}}+\ldots \eqno(2.1.4.9)] is substituted in equation (2.1.4.1), one obtains [C(t) = 1+itm_{1}'+{{(it)^{2}m_{2}'}\over{2!}}+{{(it)^{3}m_{3}'}\over{3!}}+ \ldots. \eqno(2.1.4.10)] The moments are written with primes in order to indicate that equation (2.1.4.10) is valid for moments about an arbitrary origin as well as for moments about the mean. If the random variable is transformed by a change of origin and scale, say [y = {{x-a}\over{b}}, \eqno(2.1.4.11)] the characteristic function for y becomes [C_{y}(t) = \exp(-iat/b)C_{x}(t/b). \eqno(2.1.4.12)]
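
Equation (2.1.4.12) can be verified directly, since [\exp(ity) = \exp(-iat/b)\exp[i(t/b)x]] holds sample by sample. A minimal sketch (Python with NumPy assumed; the gamma parent and the values of a and b are arbitrary):

```python
# Check of eq. (2.1.4.12): C_y(t) = exp(-iat/b) C_x(t/b) for y = (x - a)/b.
import numpy as np

rng = np.random.default_rng(2)
x = rng.gamma(3.0, 1.0, size=200_000)      # arbitrary parent distribution
a, b = 2.0, 0.5
y = (x - a) / b                            # eq. (2.1.4.11)

t = 0.4
C_y = np.mean(np.exp(1j * t * y))
rhs = np.exp(-1j * a * t / b) * np.mean(np.exp(1j * (t / b) * x))
print(C_y, rhs)                            # identical up to rounding
```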

2.1.4.2. The cumulant-generating function

A function that is often more useful than the characteristic function is its logarithm, the cumulant-generating function: [K(t) = \log C(t) = k_{1}(it)+{{k_{2}(it)^{2}}\over{2!}}+{{k_{3}(it)^{3}}\over{3!}}+ \ldots, \eqno(2.1.4.13)] where the k's are called the cumulants and may be regarded as being defined by the equation. They can be evaluated in terms of the moments by combining the series (2.1.4.10) for [C(t)] with the ordinary series for the logarithm and equating the coefficients of [t^{r}]. In most cases the process as described is tedious, but it can be shortened by use of a general method [Stuart & Ord (1994), Section 3.14, pp. 87–88; Exercise 3.19, p. 119]. Obviously, the cumulants exist only if the moments exist. The first few relations are [\eqalignno{k_{0} & = 0 \cr k_{1} & = m_{1}' \cr k_{2} & = m_{2} = m_{2}' - (m_{1}')^{2} &(2.1.4.14)\cr k_{3} & = m_{3} = m_{3}' - 3m_{2}'m_{1}' + 2(m_{1}')^{3} \cr k_{4} & = m_{4} - 3(m_{2})^{2} \cr & = m_{4}'-4m_{3}'m_{1}'-3(m_{2}')^{2}+12m_{2}'(m_{1}')^{2}-6(m_{1}')^{4}. }] Such expressions and their converses up to [k_{10}] are given by Stuart & Ord (1994, pp. 88–91). Since all the cumulants except [k_{1}] can be expressed in terms of the central moments only (i.e., those unprimed), only [k_{1}] is changed by a change of the origin. Because of this property, they are sometimes called the semi-invariants (or seminvariants) of the distribution. Since addition of random variables is equivalent to the multiplication of their characteristic functions [equation (2.1.4.8)] and multiplication of functions is equivalent to the addition of their logarithms, each cumulant of the distribution of the sum of a number of random variables is equal to the sum of the cumulants of the distribution functions of the individual variables – hence the name cumulants. Although the cumulants (except [k_{1}]) are independent of a change of origin, they are not independent of a change of scale. As for the moments, a change of scale simply multiplies them by a power of the scale factor; if [y = x/b] then [(k_{y})_{r} = (k_{x})_{r}/b^{r}. \eqno(2.1.4.15)] The cumulants of the normal distribution are particularly simple. From equation (2.1.4.5), the cumulant-generating function of a normal distribution is [\eqalignno{K(t) & = imt-{{\sigma^{2}t^{2}}/{2}} &(2.1.4.16) \cr k_{1} & = m &(2.1.4.17) \cr k_{2} & = \sigma^{2}, &(2.1.4.18)}] and all cumulants with [r \gt 2] are identically zero.
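
The relations (2.1.4.14) translate directly into a few lines of code. The sketch below (Python with NumPy assumed; sample size and parameters illustrative, primes dropped in the variable names) computes the first four cumulants from raw sample moments of a normal sample, for which [k_{1} = m], [k_{2} = \sigma^{2}] and all higher cumulants vanish:

```python
# Cumulants from raw moments via relations (2.1.4.14).
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(1.0, 2.0, size=500_000)     # m = 1, sigma^2 = 4
m1, m2, m3, m4 = (np.mean(x**r) for r in (1, 2, 3, 4))

k1 = m1
k2 = m2 - m1**2
k3 = m3 - 3*m2*m1 + 2*m1**3
k4 = m4 - 4*m3*m1 - 3*m2**2 + 12*m2*m1**2 - 6*m1**4
print(k1, k2, k3, k4)                      # ~1, ~4, ~0, ~0 for this sample
```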

2.1.4.3. The central-limit theorem

A simple form of this important theorem can be stated as follows:

If [x_{1}, x_{2}, \ldots ,x_{n}] are independent and identically distributed random variables, each of them having the same mean m and variance [\sigma^{2}], then the sum [S_{n}=\textstyle\sum\limits_{j=1}^{n}x_{j} \eqno{(2.1.4.19)}] tends to be normally distributed – independently of the distribution(s) of the individual random variables – with mean [nm] and variance [n\sigma^{2}], provided n is sufficiently large.
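
Before turning to the proof, the statement is easily checked by simulation. The following sketch (Python with NumPy assumed; the uniform parent and n = 30 are illustrative choices) standardizes sums of n uniform deviates as in equation (2.1.4.20) and compares their spread and tail weight with the standard normal values:

```python
# Simulation of the central-limit theorem for sums of uniform deviates.
import numpy as np

rng = np.random.default_rng(4)
n, trials = 30, 100_000
m, sigma2 = 0.5, 1.0 / 12.0                # mean and variance of U(0, 1)

S = rng.uniform(0.0, 1.0, size=(trials, n)).sum(axis=1)   # eq. (2.1.4.19)
S_hat = (S - n * m) / np.sqrt(n * sigma2)                 # eq. (2.1.4.20)

print(S_hat.mean(), S_hat.var())           # ~0 and ~1, as required
print(np.mean(S_hat < -2.0))               # ~0.0228, the normal value Phi(-2)
```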

In order to prove this theorem, let us define a standardized random variable corresponding to the sum [S_n], i.e., such that its mean is zero and its variance is unity: [\hat{S}_n={{S_n-nm}\over{\sigma \sqrt{n}}}={{\textstyle\sum_{j=1}^n(x_j-m)}\over{\sigma \sqrt{n}}}\equiv \sum\limits_{j=1}^n{{W_j}\over{\sqrt{n}}}, \eqno(2.1.4.20)] where [W_j=(x_j-m)/\sigma] is a standardized single random variable. The characteristic function of [\hat{S}_n] is therefore given by [\eqalignno{C_n(\hat{S}_n,t) &=\langle \exp (it\hat{S}_n)\rangle &\cr &=\left\langle \exp \left[ it\sum_{j=1}^n{{W_j}\over{\sqrt{n}}}\right] \right\rangle &(2.1.4.21)\cr &=\prod_{j=1}^n\left\langle \exp \left[ it{{W_j}\over{\sqrt{n}}}\right] \right\rangle &(2.1.4.22)\cr &=\left\{ \left\langle \exp \left[ it{{W_1}\over{\sqrt{n}}}\right] \right\rangle \right\} ^n, &(2.1.4.23)\cr}] where the brackets [\langle\ \rangle] denote the operation of averaging with respect to the appropriate probability density function (p.d.f.) [cf. equation (2.1.4.1)]. Equation (2.1.4.22) follows from equation (2.1.4.21) by the assumption of independence, while the assumption of identically distributed variables leads to the identity of the characteristic functions of the individual variables – as seen in equation (2.1.4.23).

On the assumption that moments of all orders exist – a most plausible assumption in situations usually encountered in structure-factor statistics – we can now expand the characteristic function of a single variable in a power series [cf. equation (2.1.4.10)]: [\eqalignno{\left\langle \exp \left[ it{{W_1}\over{\sqrt{n}}}\right] \right\rangle &=\left\langle \sum_{r=0}^\infty {{(it)^r}\over{r!}}{{W_1^r}\over{n^{r/2}}} \right\rangle &\cr &=\sum_{r=0}^\infty {{(it)^r}\over{r!}}{{\langle W_1^r\rangle }\over{n^{r/2}}} &\cr &\equiv 1-{{t^2}\over{2n}}+{{\zeta (t,n)}\over{n}}, &(2.1.4.24)\cr}] since [\langle W_1\rangle =0] and [\langle W_1^2\rangle =1], and the quantity denoted by [\zeta (t,n)] in (2.1.4.24) is given by [\zeta (t,n)=\sum_{r=3}^\infty {{(it)^r}\over{r!}}{{\langle W_1^r\rangle }\over{n^{(r/2)-1}}}. \eqno(2.1.4.25)] The characteristic function of [\hat{S}_n] is therefore [\langle \exp (it\hat{S}_n)\rangle =\left[ 1-{{t^2}\over{2n}}+{{\zeta (t,n)}\over n}\right] ^n. \eqno(2.1.4.26)] Now, as is seen from (2.1.4.25), for every fixed t the quantity [\zeta (t,n)] tends to zero as n tends to infinity. The cumulant-generating function of the standardized sum then becomes [\log C_n(\hat{S}_n,t)=n\log \left[ 1-{1 \over n}\left( {{t^2} \over 2}-\zeta(t,n)\right) \right] \eqno(2.1.4.27)] and the logarithm on the right-hand side of equation (2.1.4.27) has the form [\log (1-z)] with [|z|\rightarrow 0] as [n\rightarrow \infty]. We may therefore use the expansion [\log(1-z)=-\left(z+{z^2\over 2}+{z^3\over 3} + \ldots\right),] which is valid for [|z| \lt 1]. We then obtain [\eqalign{\log C_n(\hat{S}_n,t) &=-n\left[{1\over n}\left({t^2\over 2}-\zeta (t,n)\right) +{1\over 2n^2}\left( {t^2\over 2}-\zeta (t,n)\right) ^2\right.\cr&\left.\quad+{1\over 3n^3}\left( {t^2\over 2}-\zeta (t,n)\right) ^3+\cdots \right] \cr&=-{t^2\over 2}+\zeta (t,n)-{1\over 2n}\left( {t^2\over 2}-\zeta (t,n)\right) ^2\cr&\quad-{1\over 3n^2}\left( {t^2\over 2}-\zeta (t,n)\right) ^3-\cdots}] and finally, for every fixed t, [\lim_{n\rightarrow \infty }\log C_n(\hat{S}_n,t)=-{{t^2}\over{2}}. \eqno(2.1.4.28)] Since the exponential is a continuous function of its argument, it follows directly that [\lim_{n\rightarrow \infty }C_n(\hat{S}_n,t)=\exp \left( -{{t^2}\over2}\right). \eqno(2.1.4.29)] The right-hand side of (2.1.4.29) is just the characteristic function of a standardized normal p.d.f., i.e., a normal p.d.f. with zero mean and unit variance [cf. equation (2.1.4.5)]. The asymptotic expression for the p.d.f. of the standardized sum is therefore obtained as [p(\hat{S})={1 \over{ \sqrt{2\pi}}}\exp\left(-{{\hat{S}^2}\over 2}\right),] which proves the above version of the central-limit theorem.
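
The limit (2.1.4.29) can also be watched numerically. For an exponential parent with unit mean and unit variance one has [W = x-1] and [\langle\exp(iuW)\rangle = \exp(-iu)/(1-iu)], so equation (2.1.4.23) gives the characteristic function of the standardized sum in closed form. A minimal sketch (Python with NumPy assumed):

```python
# Convergence of [C_W(t/sqrt(n))]^n, eq. (2.1.4.23), to exp(-t^2/2),
# eq. (2.1.4.29), for an exponential parent (mean 1, variance 1).
import numpy as np

def C_sum(t, n):
    u = t / np.sqrt(n)
    return (np.exp(-1j * u) / (1 - 1j * u)) ** n

t = 1.0
for n in (2, 10, 100, 10_000):
    print(n, C_sum(t, n), np.exp(-t**2 / 2))   # approaches 0.6065...
```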

Surprisingly, this theorem has very wide applicability, and values of n as low as 30 are often large enough for the theorem to be useful. Situations in which the normal p.d.f. must be modified or replaced by an altogether different one are dealt with in Sections 2.1.7 and 2.1.8 of this chapter.

2.1.4.4. Conditions of validity

The above outline of a proof of the central-limit theorem depended on the existence of moments of all orders. The components of structure factors always possess finite moments of all orders, but the existence of moments beyond the second is not necessary for the validity of the theorem and it can be proved under much less stringent conditions. In fact, if all the random variables in equation (2.1.4.19) have the same distribution – as in a homoatomic structure – the only requirement is that the second moments of the distributions should exist [the Lindeberg–Lévy theorem (e.g. Cramér, 1951)]. If the distributions are not the same – as in a heteroatomic structure – some further condition is necessary to ensure that no individual random variable dominates the sum. The Liapounoff proof requires the existence of third absolute moments, but this is regarded as aesthetically displeasing; a theorem that ultimately involves only means and variances should require only means and variances in the proof. The Lindeberg–Cramér conditions meet this aesthetic criterion. Roughly, the conditions are that [S^{2}], the variance of the sum, should tend to infinity and [\sigma_{j}^{2}/S^{2}], where [\sigma_{j}^{2}] is the variance of the jth random variable, should tend to zero for all j as n tends to infinity. The precise formulation is quoted by Kendall & Stuart (1977, p. 207).
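
The rough form of the conditions is easy to illustrate. In the sketch below (Python with NumPy assumed; the variances are invented purely to mimic a few heavy atoms among many light ones), [S^{2}] grows without bound while the largest ratio [\sigma_{j}^{2}/S^{2}] shrinks, so no single term dominates the sum:

```python
# Illustration of the rough Lindeberg-Cramer requirement quoted above.
import numpy as np

for n_light in (10, 100, 10_000):
    var = np.concatenate(([100.0, 100.0],     # two "heavy" contributors
                          np.ones(n_light)))  # many "light" ones
    S2 = var.sum()                            # variance of the sum
    print(n_light, S2, var.max() / S2)        # S2 grows; max ratio -> 0
```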

2.1.4.5. Non-independent variables

The central-limit theorem, under certain conditions, remains valid even when the variables summed in equation (2.1.4.19) are not independent. The conditions have been investigated by Bernstein (1922, 1927); roughly they amount to requiring that the variables should not be too closely correlated. The theorem applies, in particular, when each [x_{r}] is related to a finite number, [f(n)], of its neighbours, in which case the x's are said to be [f(n)] dependent. The [f(n)] dependence seems plausible for crystallographic applications, since the positions of atoms close together in a structure are closely correlated by interatomic forces, whereas those far apart will show little correlation if there is any flexibility in the asymmetric unit when unconstrained. Harker's (1953) idea of `globs' seems equivalent to [f(n)] dependence. Long-range stereochemical effects, as in pseudo-graphitic aromatic hydrocarbons, would presumably produce long-range correlations and make [f(n)] dependence less plausible. If Bernstein's conditions are satisfied, the central-limit theorem would apply, but the actual value of [\langle x^{2} \rangle - \langle x \rangle^{2}] would have to be used for the variance, instead of the sum of the variances of the random variables in (2.1.4.19). Because of the correlations, the two values are no longer equal.

French & Wilson (1978[link]) seem to have been the first to appeal explicitly to the central-limit theorem extended to non-independent variables, but many previous workers [for typical references, see Wilson (1981[link])] tacitly made the replacement – in the X-ray case substituting the local mean intensity for the sum of the squares of the atomic scattering factors.

References

Bernstein, S. (1922). Sur le théorème limite du calcul des probabilités. Math. Ann. 85, 237–241.
Bernstein, S. (1927). Sur l'extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes. Math. Ann. 97, 1–59.
Cramér, H. (1951). Mathematical methods of statistics. Princeton University Press.
French, S. & Wilson, K. (1978). On the treatment of negative intensity observations. Acta Cryst. A34, 517–525.
Harker, D. (1953). The meaning of the average of [|F|^{2}] for large values of interplanar spacing. Acta Cryst. 6, 731–736.
Kendall, M. & Stuart, A. (1977). The advanced theory of statistics, Vol. 1, 4th ed. London: Griffin.
Stuart, A. & Ord, K. (1994). Kendall's advanced theory of statistics. Vol. 1. Distribution theory, 6th ed. London: Edward Arnold.
Wilson, A. J. C. (1981). Can intensity statistics accommodate stereochemistry? Acta Cryst. A37, 808–810.