International Tables for Crystallography
Volume B: Reciprocal space
Edited by U. Shmueli

International Tables for Crystallography (2006). Vol. B. ch. 2.1, pp. 194-195

Section 2.1.4.3. The central-limit theorem

U. Shmueli (a,*) and A. J. C. Wilson (b)

(a) School of Chemistry, Tel Aviv University, Tel Aviv 69 978, Israel, and (b) St John's College, Cambridge, England
* Correspondence e-mail: ushmueli@post.tau.ac.il


A simple form of this important theorem can be stated as follows:

If $x_1, x_2, \ldots, x_n$ are independent and identically distributed random variables, each having the same mean $m$ and variance $\sigma^2$, then the sum
$$S_n=\sum_{j=1}^{n}x_j \tag{2.1.4.19}$$
tends to be normally distributed, with mean $nm$ and variance $n\sigma^2$, provided $n$ is sufficiently large; this holds independently of the distribution of the individual random variables.
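The statement can be checked numerically. The sketch below is an illustration assumed here, not taken from the text: it draws the $x_j$ from an exponential distribution with $m=\sigma=1$, chosen because it is strongly skewed and therefore far from normal, forms the standardized sum $(S_n-nm)/(\sigma\sqrt{n})$ used in the proof that follows, and compares its sample mean, variance and fraction of values with magnitude below 1 with the normal values 0, 1 and about 0.683. The helper name `standardized_sums`, the choice $n=50$, the number of trials and the seed are all arbitrary assumptions.

```python
import random
import statistics


def standardized_sums(n, trials, seed=12345):
    """Hypothetical helper: Monte Carlo samples of (S_n - n*m) / (sigma*sqrt(n)).

    Assumption for illustration: each x_j is exponential with rate 1,
    so its mean m and standard deviation sigma are both 1.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    m, sigma = 1.0, 1.0
    samples = []
    for _ in range(trials):
        s_n = sum(rng.expovariate(1.0) for _ in range(n))  # the sum S_n, eq. (2.1.4.19)
        samples.append((s_n - n * m) / (sigma * n ** 0.5))
    return samples


samples = standardized_sums(n=50, trials=20000)
mean = statistics.fmean(samples)
var = statistics.pvariance(samples)
within_one = sum(abs(s) < 1.0 for s in samples) / len(samples)
print(f"mean = {mean:.3f}, variance = {var:.3f}, fraction with |S| < 1 = {within_one:.3f}")
```

Replacing `expovariate` by another generator with known mean and variance should give similar figures once n is large enough; how large depends on how far the parent distribution is from normal.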

In order to prove this theorem, let us define a standardized random variable corresponding to the sum $S_n$, i.e., one whose mean is zero and whose variance is unity:
$$\hat{S}_n=\frac{S_n-nm}{\sigma\sqrt{n}}=\frac{\sum_{j=1}^n(x_j-m)}{\sigma\sqrt{n}}\equiv\sum_{j=1}^n\frac{W_j}{\sqrt{n}}, \tag{2.1.4.20}$$
where $W_j=(x_j-m)/\sigma$ is a standardized single random variable. The characteristic function of $\hat{S}_n$ is therefore given by
$$
\begin{aligned}
C_n(\hat{S}_n,t) &= \langle \exp(it\hat{S}_n)\rangle \\
&= \left\langle \exp\left[it\sum_{j=1}^n\frac{W_j}{\sqrt{n}}\right]\right\rangle && (2.1.4.21)\\
&= \prod_{j=1}^n\left\langle \exp\left[it\frac{W_j}{\sqrt{n}}\right]\right\rangle && (2.1.4.22)\\
&= \left\{\left\langle \exp\left[it\frac{W_1}{\sqrt{n}}\right]\right\rangle\right\}^n, && (2.1.4.23)
\end{aligned}
$$
where the brackets $\langle\ \rangle$ denote the operation of averaging with respect to the appropriate probability density function (p.d.f.) [cf. equation (2.1.4.1)]. Equation (2.1.4.22) follows from equation (2.1.4.21) by the assumption of independence, since the average of a product of independent random variables equals the product of their averages, while the assumption of identically distributed variables makes the characteristic functions of the individual variables identical, as seen in equation (2.1.4.23).
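The step from (2.1.4.21) to (2.1.4.23) can be verified exactly for a small case. In the following sketch we assume, purely for illustration, that each $W_j$ takes the values $+1$ and $-1$ with probability 1/2, so that it has zero mean, unit variance and the characteristic function $\cos u$. The first helper averages $\exp(it\hat{S}_n)$ over all $2^n$ equally likely outcomes, as in (2.1.4.21); the second raises the single-variable characteristic function to the $n$th power, as in (2.1.4.23). Both helper names are hypothetical. The two agree to rounding error because weighting all $2^n$ outcomes equally is exactly the independent, identically distributed model assumed in the text.

```python
import cmath
import itertools
import math


def cf_by_enumeration(t, n):
    """Hypothetical helper: <exp(i t S_hat_n)> as in eq. (2.1.4.21), averaged over all
    2**n equally likely outcomes of n independent W_j = +1 or -1 (assumed example)."""
    total = 0j
    for ws in itertools.product((-1.0, 1.0), repeat=n):
        s_hat = sum(ws) / math.sqrt(n)  # S_hat_n = sum_j W_j / sqrt(n), eq. (2.1.4.20)
        total += cmath.exp(1j * t * s_hat)
    return total / 2 ** n


def cf_single_power(t, n):
    """Hypothetical helper: {<exp(i t W_1 / sqrt(n))>}^n as in eq. (2.1.4.23);
    for W_1 = +/-1 the single-variable characteristic function is cos(u)."""
    return math.cos(t / math.sqrt(n)) ** n


for t in (0.3, 1.0, 2.5):
    print(t, cf_by_enumeration(t, 8), cf_single_power(t, 8))
```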

On the assumption that moments of all orders exist (a most plausible assumption in situations usually encountered in structure-factor statistics), we can now expand the characteristic function of a single variable in a power series [cf. equation (2.1.4.10)]:
$$
\begin{aligned}
\left\langle \exp\left[it\frac{W_1}{\sqrt{n}}\right]\right\rangle &= \left\langle \sum_{r=0}^\infty \frac{(it)^r}{r!}\frac{W_1^r}{n^{r/2}}\right\rangle \\
&= \sum_{r=0}^\infty \frac{(it)^r}{r!}\frac{\langle W_1^r\rangle}{n^{r/2}} \\
&\equiv 1-\frac{t^2}{2n}+\frac{\zeta(t,n)}{n}, && (2.1.4.24)
\end{aligned}
$$
since $\langle W_1\rangle=0$ and $\langle W_1^2\rangle=1$, so that the terms with $r=0$, $1$ and $2$ contribute $1$, $0$ and $-t^2/(2n)$, respectively. The quantity denoted by $\zeta(t,n)$ in (2.1.4.24) collects all the remaining terms:
$$\zeta(t,n)=\sum_{r=3}^\infty\frac{(it)^r}{r!}\frac{\langle W_1^r\rangle}{n^{(r/2)-1}}. \tag{2.1.4.25}$$
The characteristic function of $\hat{S}_n$ is therefore
$$\langle\exp(it\hat{S}_n)\rangle=\left[1-\frac{t^2}{2n}+\frac{\zeta(t,n)}{n}\right]^n. \tag{2.1.4.26}$$
Now, as is seen from (2.1.4.25), for every fixed $t$ the quantity $\zeta(t,n)$ tends to zero as $n$ tends to infinity, since for $r\ge 3$ every term contains a positive power of $1/\sqrt{n}$. The cumulant-generating function of the standardized sum then becomes
$$\log C_n(\hat{S}_n,t)=n\log\left[1-\frac{1}{n}\left(\frac{t^2}{2}-\zeta(t,n)\right)\right], \tag{2.1.4.27}$$
and the logarithm on the right-hand side of equation (2.1.4.27) has the form $\log(1-z)$ with $|z|\rightarrow 0$ as $n\rightarrow\infty$. We may therefore use the expansion
$$\log(1-z)=-\left(z+\frac{z^2}{2}+\frac{z^3}{3}+\ldots\right),$$
which is valid for $|z|<1$.
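For a concrete distribution, $\zeta(t,n)$ can be evaluated both from the series (2.1.4.25) and by rearranging (2.1.4.24) as $\zeta(t,n)=n\left[\langle\exp(itW_1/\sqrt{n})\rangle-1\right]+t^2/2$. The sketch below assumes, for illustration only, that $W_1=\pm1$ with equal probabilities, so that $\langle W_1^r\rangle$ is 1 for even $r$ and 0 for odd $r$ and the characteristic function is $\cos u$. It truncates the series at an arbitrary $r_{\max}=40$, and the helper names are hypothetical. For this distribution the leading term of (2.1.4.25) is $t^4/(24n)$, so $\zeta(t,n)$ should shrink roughly in proportion to $1/n$.

```python
import math


def zeta_series(t, n, moment, r_max=40):
    """Hypothetical helper: eq. (2.1.4.25) truncated at r = r_max.
    `moment(r)` must return <W_1**r>."""
    total = 0j
    for r in range(3, r_max + 1):
        total += (1j * t) ** r / math.factorial(r) * moment(r) / n ** (r / 2 - 1)
    return total


def zeta_closed_form(t, n, cf):
    """Hypothetical helper: zeta(t, n) from rearranging eq. (2.1.4.24),
    n * (cf(t / sqrt(n)) - 1) + t**2 / 2, where cf is the characteristic function of W_1."""
    return n * (cf(t / math.sqrt(n)) - 1.0) + t ** 2 / 2


def rademacher_moment(r):
    """<W_1**r> for the assumed W_1 = +/-1 with equal probabilities."""
    return 1.0 if r % 2 == 0 else 0.0


rademacher_cf = math.cos  # <exp(i u W_1)> = cos(u) for this assumed distribution

t = 1.5
for n in (1, 10, 100, 1000):
    print(n, zeta_series(t, n, rademacher_moment).real, zeta_closed_form(t, n, rademacher_cf))
```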
We then obtain
$$
\begin{aligned}
\log C_n(\hat{S}_n,t) &= -n\left[\frac{1}{n}\left(\frac{t^2}{2}-\zeta(t,n)\right)+\frac{1}{2n^2}\left(\frac{t^2}{2}-\zeta(t,n)\right)^2\right.\\
&\qquad\left.+\frac{1}{3n^3}\left(\frac{t^2}{2}-\zeta(t,n)\right)^3+\cdots\right]\\
&= -\frac{t^2}{2}+\zeta(t,n)-\frac{1}{2n}\left(\frac{t^2}{2}-\zeta(t,n)\right)^2\\
&\qquad-\frac{1}{3n^2}\left(\frac{t^2}{2}-\zeta(t,n)\right)^3-\cdots
\end{aligned}
$$
and finally, since $\zeta(t,n)\rightarrow 0$ and every later term carries a positive power of $1/n$, for every fixed $t$
$$\lim_{n\rightarrow\infty}\log C_n(\hat{S}_n,t)=-\frac{t^2}{2}. \tag{2.1.4.28}$$
Since the exponential function is continuous, it follows directly that
$$\lim_{n\rightarrow\infty}C_n(\hat{S}_n,t)=\exp\left(-\frac{t^2}{2}\right). \tag{2.1.4.29}$$
The right-hand side of (2.1.4.29) is just the characteristic function of a standardized normal p.d.f., i.e., a normal p.d.f. with zero mean and unit variance [cf. equation (2.1.4.5)]. By the continuity theorem for characteristic functions (Lévy's theorem), pointwise convergence of characteristic functions to a limit that is itself a characteristic function implies convergence of the corresponding distributions. The asymptotic expression for the p.d.f. of the standardized sum is therefore
$$p(\hat{S})=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{\hat{S}^2}{2}\right),$$
which proves the above version of the central-limit theorem.
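The limit (2.1.4.29) can be followed numerically for an assumed example distribution: $W_1$ uniform on $[-\sqrt{3},\sqrt{3}]$, which has zero mean, unit variance and characteristic function $\sin(\sqrt{3}u)/(\sqrt{3}u)$. The sketch evaluates $C_n(\hat{S}_n,t)$ from (2.1.4.23) at a fixed $t=1.5$ (an arbitrary choice) and prints $\log C_n$, which should approach $-t^2/2=-1.125$ as in (2.1.4.28), together with the distance of $C_n$ from $\exp(-t^2/2)$. The helper names are illustrative assumptions.

```python
import math

SQRT3 = math.sqrt(3.0)


def cf_uniform(u):
    """Hypothetical helper: characteristic function of W uniform on [-sqrt(3), sqrt(3)]
    (zero mean, unit variance), sin(sqrt(3) u) / (sqrt(3) u)."""
    if u == 0.0:
        return 1.0
    return math.sin(SQRT3 * u) / (SQRT3 * u)


def cf_standardized_sum(t, n):
    """Hypothetical helper: C_n(S_hat_n, t) = {cf(t / sqrt(n))}^n, eq. (2.1.4.23)."""
    return cf_uniform(t / math.sqrt(n)) ** n


t = 1.5
target = math.exp(-t ** 2 / 2)  # right-hand side of eq. (2.1.4.29)
for n in (1, 10, 100, 1000):
    c_n = cf_standardized_sum(t, n)
    print(n, math.log(c_n), c_n, abs(c_n - target))
```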

Surprisingly, this theorem has very wide applicability, and values of $n$ as low as 30 are often large enough for it to be useful. Situations in which the normal p.d.f. must be modified or replaced by an altogether different one are dealt with in Sections 2.1.7 and 2.1.8 of this chapter.
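As a rough check of the remark about $n=30$, the sketch below compares, for one assumed case, the exact distribution function of a sum of 30 independent Bernoulli variables with $p=1/2$ (a binomial distribution, with $m=1/2$ and $\sigma^2=1/4$) with the normal distribution of mean $nm$ and variance $n\sigma^2$. It applies a continuity correction of 1/2, a standard device for comparing a discrete distribution with a continuous one that is not part of the text above. The symmetric case $p=1/2$ is a favourable one; strongly skewed parent distributions generally need larger n for comparable accuracy. The helper names are hypothetical.

```python
import math


def binomial_cdf(k, n, p=0.5):
    """Hypothetical helper: exact P(S_n <= k) for a sum of n independent Bernoulli(p) variables."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_cdf(x, mean, sd):
    """Hypothetical helper: normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


n, p = 30, 0.5
mean, sd = n * p, math.sqrt(n * p * (1 - p))  # nm and sigma * sqrt(n)
# Continuity correction (k + 0.5): the exact distribution is discrete, the normal one continuous.
max_err = max(abs(binomial_cdf(k, n, p) - normal_cdf(k + 0.5, mean, sd)) for k in range(n + 1))
print(f"largest CDF discrepancy for n = {n}: {max_err:.4f}")
```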