International Tables for Crystallography, Volume C
Mathematical, physical and chemical tables
Edited by E. Prince

International Tables for Crystallography (2006). Vol. C. ch. 8.1, pp. 678-680

Section 8.1.1. Definitions

E. Prince(a) and P. T. Boggs(b)

(a) NIST Center for Neutron Research, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA, and (b) Scientific Computing Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA

8.1.1. Definitions

8.1.1.1. Linear algebra

A matrix is an ordered, rectangular array of numbers, real or complex. Matrices will be denoted by upper-case, bold italic letters, A. Their individual elements will be denoted by upper-case, italic letters with subscripts. Aij denotes the element in the ith row and the jth column of A. A matrix with only one row is a row vector; a matrix with only one column is a column vector. Vectors will be denoted by lower-case, bold roman letters, and their elements will be denoted by lower-case, italic letters with single subscripts. Scalar constants will usually be denoted by lower-case, Greek letters.

A matrix with the same number of rows as columns is square. If Aij = 0 for all [i \, \gt \, j], A is upper triangular. If Aij = 0 for all [i \, \lt \, j], A is lower triangular. If Aij = 0 for all [i \neq j], A is diagonal. If Aij = 0 for all i and j, A is null. A matrix, B, such that Bij = Aji for all i and j is the transpose of A, and is denoted by AT. Matrices with the same dimensions may be added and subtracted: (A+ B)ij = Aij + Bij. A matrix may be multiplied by a scalar: (αA)ij = αAij. Multiplication of matrices is defined by (AB)ij = [\sum _{k=1}^m A_{ik}B_{kj}], where m is the number of columns of A and the number of rows of B (which must be equal). Addition and multiplication of matrices obey the associative law: (A + B) + C = A + (B + C); (AB)C = A(BC). Multiplication of matrices obeys the distributive law: A(B + C) = AB + AC. Addition of matrices obeys the commutative law: A + B = B + A, but multiplication, except in certain (important) special cases, does not: AB [\neq] BA. The transpose of a product is the product of the transposes of the factors in reverse order: (AB)T= BTAT.
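
A minimal numerical illustration of these rules (a sketch only, assuming Python with NumPy, which the text itself does not presuppose):

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
C = np.array([[2.0, 0.0], [1.0, 1.0]])

# Associative and distributive laws hold.
print(np.allclose((A @ B) @ C, A @ (B @ C)))    # True
print(np.allclose(A @ (B + C), A @ B + A @ C))  # True
# Multiplication is not commutative in general.
print(np.allclose(A @ B, B @ A))                # False
# The transpose of a product is the product of the transposes in reverse order.
print(np.allclose((A @ B).T, B.T @ A.T))        # True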

The trace of a square matrix is the sum of its diagonal elements. The determinant of an [n\times n] square matrix, A, denoted by |A|, is the sum of [n!] terms, each of which is a product of the diagonal elements of a matrix derived from A by permuting columns or rows (see Stewart, 1973). The rank of a matrix (not necessarily square) is the dimension of the largest square submatrix that can be formed from it, by selecting rows and columns, whose determinant is not equal to zero. A matrix has full column rank if its rank is equal to its number of columns. A square matrix whose diagonal elements are equal to one and whose off-diagonal elements are equal to zero is an identity matrix, denoted by I. If [|{\bi A}|\neq 0], A is nonsingular, and there exists a matrix A−1, the inverse of A, such that AA−1 = A−1A = I. If |A| = 0, A is singular, and has no inverse. The adjoint, or conjugate transpose, of A is a matrix, [{\bi A}^{\dag }], such that [A_{ij}^{\dag }] = [A_{ji}^{*}], where the asterisk indicates complex conjugate. If [{\bi A}^{\dag }={\bi A}^{-1}], A is unitary. If the elements of a unitary matrix are real, it is orthogonal. From this definition, if A is orthogonal, it follows that [\textstyle\sum\limits_{i=1}^nA_{ij}^2=1] for all j, and [\textstyle\sum\limits _{i=1}^nA_{ij}A_{ik}=0] if [j\neq k]. By analogy, two column vectors, x and y, are said to be orthogonal if xTy = 0.
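
The following sketch checks these definitions numerically (assuming Python with NumPy; the specific matrices are arbitrary examples):

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(np.linalg.det(A))                              # 5.0; nonzero, so A is nonsingular
print(np.linalg.matrix_rank(A))                      # 2; A has full column rank
print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))  # A A^{-1} = I

# A real unitary (i.e. orthogonal) matrix: its columns have unit length and are mutually orthogonal.
Q = np.array([[0.0, 1.0], [-1.0, 0.0]])
print(np.allclose(Q.T @ Q, np.eye(2)))               # True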

For any square matrix, A, there exists a set of vectors, [{\bf x}_i], such that [{\bi A}{\bf x}_i=\lambda _i{\bf x}_i], where [\lambda _i] is a scalar. The values [\lambda _i] are the eigenvalues of A, and the vectors [{\bf x}_i] are the corresponding eigenvectors. If [{\bi A} ={\bi A}^{\dag }], A is Hermitian, and, if the elements are real, A = AT, so that A is symmetric. It can be shown (see, for example, Stewart, 1973) that, if A is Hermitian, all eigenvalues are real, and there exists a unitary matrix, T, such that [{\bi D} = {\bi T}^{\dag }{\bi AT}] is diagonal, with the elements of D equal to the eigenvalues of A, and the columns of T are the eigenvectors. An n × n symmetric matrix therefore has n mutually orthogonal eigenvectors. If the product xTAx is greater than (or equal to) zero for any non-null vector, x, A is positive (semi)definite. Because x may be, in particular, an eigenvector, all eigenvalues of a positive (semi)definite matrix are greater than (or equal to) zero. Any matrix of the form BTB is positive semidefinite, and, if B has full column rank, A = BTB is positive definite. If A is positive definite, there exists an upper triangular matrix, R, or, equivalently, a lower triangular matrix, L, with positive diagonal elements, such that RTR = LLT = A. R, or L, is called the Cholesky factor of A. The magnitude, length or Euclidean norm of a vector, x, denoted by [\left \| {\bf x}\right \|], is defined by [\left \| {\bf x}\right \| =({\bf x}^T{\bf x})^{1/2}]. The induced matrix norm of a matrix, B, denoted [\|{\bi B}\|], is defined as the maximum value of [\|{\bi B}{\bf x}\|/\| {\bf x}\|=({\bf x}^T{\bi B}^T{\bi B}{\bf x}/{\bf x}^T{\bf x})^{1/2}] for [\| {\bf x}\| \gt 0]. Because xTBTBx will have its maximum value for a fixed value of xTx when x is parallel to the eigenvector that corresponds to the largest eigenvalue of BTB, this definition implies that ||B|| is equal to the square root of the largest eigenvalue of BTB. The condition number of B is the square root of the ratio of the largest and smallest eigenvalues of BTB. (Other definitions of norms exist, with corresponding definitions of condition number. We shall not be concerned with any of these.)
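
These properties can be illustrated with a short sketch (assuming Python with NumPy; B below is an arbitrary full-column-rank example):

import numpy as np

B = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, 1.0]])   # full column rank
A = B.T @ B                                          # therefore positive definite

w, T = np.linalg.eigh(A)            # real eigenvalues and orthonormal eigenvectors (A is symmetric)
print(np.all(w > 0))                # True: all eigenvalues positive

L = np.linalg.cholesky(A)           # lower triangular Cholesky factor
print(np.allclose(L @ L.T, A))      # L L^T = A

# Induced 2-norm and condition number of B from the eigenvalues of B^T B.
print(np.isclose(np.linalg.norm(B, 2), np.sqrt(w.max())))         # True
print(np.isclose(np.linalg.cond(B), np.sqrt(w.max() / w.min())))  # True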

We shall make extensive use of the so-called QR decomposition, which is defined as follows: For any [n\times p\ (n\geq p)] real matrix, Z, there exists an n × n orthogonal matrix, Q, such that

[{\bi Q}^T{\bi Z}=\left ({{\bi R} \atop {\bi O}}\right), \eqno (8.1.1.1)]

where R is a p × p upper triangular matrix, and O denotes an (n − p) × p null matrix. Thus, we have

[{\bi Z}={\bi Q}\left ({{\bi R} \atop {\bi O}}\right), \eqno (8.1.1.2)]

which is known as the QR decomposition of Z. If we partition Q as [({\bi Q}_{\bf Z}, {\bi Q}_{\bot })], where QZ has dimensions n × p, and [{\bi Q}_{\bot }] has dimensions n × (n − p), (8.1.1.2) becomes

[{\bi Z}={\bi Q}_{{\bi Z}}{\bi R}, \eqno (8.1.1.3)]

which is known as the QR factorization. We shall make use of the following facts. First, R is nonsingular if and only if the columns of Z are linearly independent; second, the columns of QZ form an orthonormal basis for the range space of Z, that is, they span the same space as Z; and, third, the columns of [{\bi Q}_{\bot }] form an orthonormal basis for the null space of ZT, that is, ZT[{\bi Q}_{\bot }] = O.
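
A sketch of the QR decomposition and of the facts listed above (assuming Python with NumPy; Z is an arbitrary example with linearly independent columns):

import numpy as np

Z = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, 1.0]])   # n = 3, p = 2
n, p = Z.shape

Q, Rfull = np.linalg.qr(Z, mode='complete')          # Q is n x n orthogonal; Rfull is (R over O)
R = Rfull[:p, :]                                     # p x p upper triangular block
QZ, Qperp = Q[:, :p], Q[:, p:]

print(np.allclose(QZ @ R, Z))           # the QR factorization, equation (8.1.1.3)
print(np.allclose(Z.T @ Qperp, 0.0))    # columns of Q_perp span the null space of Z^T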

There are two common procedures for computing the QR factorization. The first makes use of Householder transformations, which are defined by

[{\bi H}={\bi I}-2{\bf x}{\bf x}^T, \eqno (8.1.1.4)]

where xTx = 1. H is symmetric, and H2 = I, so that H is orthogonal. In three dimensions, H corresponds to a reflection in a mirror plane perpendicular to x, because of which Stewart (1973) has suggested the alternative term elementary reflector. A vector v is transformed by Hv into the vector [\left \| {\bf v}\right \| {\bf e}], where e represents a vector with e1 = 1, and ei = 0 for [i\neq 1], if

[{\bf x}=[{\bf v}-\left \| {\bf v}\right \| {\bf e}]\big/\left \| [{\bf v}-\left \| {\bf v}\right \| {\bf e}]\right \|. \eqno (8.1.1.5)]

The factorization procedure for an n × p matrix, A (Stewart, 1973; Anderson et al., 1992), takes as v in the first step the first column of A, and forms A1 = H1A, which has zeros in all elements of the first column below the diagonal. In the second step, v has a zero as the first element and is filled out by those elements of the second column of A1 on or below the diagonal. A2 = H2A1 then has zeros in all elements below the diagonal in the first two columns. This process is repeated (p − 2) more times, after which QT = Hp [\ldots ] H2H1, and the first p rows of Ap form the upper triangular matrix R.
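
A sketch of a single Householder step (assuming Python with NumPy; the helper name householder is introduced here for illustration only):

import numpy as np

def householder(v):
    # Build H = I - 2 x x^T, equation (8.1.1.4), with x chosen as in (8.1.1.5),
    # so that H v is a multiple of the first unit vector e.
    # (Assumes v is not already a positive multiple of e.)
    e = np.zeros_like(v)
    e[0] = 1.0
    x = v - np.linalg.norm(v) * e
    x = x / np.linalg.norm(x)
    return np.eye(len(v)) - 2.0 * np.outer(x, x)

v = np.array([3.0, 4.0, 0.0])
H = householder(v)
print(H @ v)                           # approximately [5, 0, 0], i.e. ||v|| e
print(np.allclose(H @ H, np.eye(3)))   # H^2 = I, so H is orthogonal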

QR factorization by Householder transformations requires for efficiency that the entire n × p matrix be stored in memory, and requires of order np2 operations. A procedure that requires storage of only the upper triangle makes use of Givens rotations, which are 2 × 2 matrices of the form

[{\bi G}=\left (\matrix{ \cos \alpha &\sin \alpha \cr -\sin \alpha &\cos \alpha }\right). \eqno (8.1.1.6)]

Multiplication of a [2\times m] matrix, B, by G will put a zero in the [B_{21}] element if [\alpha =\arctan B_{21}/B_{11}]. The factorization of A involves reading, or computing, the rows of A one at a time. In the first step, matrix B1 consists of the first row of R and the current row of A, from which the first element is eliminated. In the second step, B2 consists of the second row of R and the (p − 1) non-zero elements of the second row of the transformed B1. After the first p rows have been treated, each additional row of A requires 2p(p + 1) multiplications to fill it with zeros. However, because the operation is easily vectorized, the time required may be a small proportion of the total computing time on a vector-oriented computer.
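
A sketch of one Givens rotation step (assuming Python with NumPy; arctan2 is used as a numerically robust form of arctan B21/B11):

import numpy as np

B = np.array([[3.0, 1.0],
              [4.0, 2.0]])             # 2 x m working matrix

alpha = np.arctan2(B[1, 0], B[0, 0])   # alpha = arctan B21/B11
G = np.array([[ np.cos(alpha), np.sin(alpha)],
              [-np.sin(alpha), np.cos(alpha)]])   # equation (8.1.1.6)

print(G @ B)                           # the (2, 1) element is now zero; the first column becomes [5, 0]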

8.1.1.2. Statistics

A probability density function, which will be abbreviated p.d.f., is a function, Φ(x), such that the probability of finding the random variable x in the interval [a\leq x\leq b] is given by

[p(a\leq x\leq b)=\textstyle\int\limits _a^b\Phi (x)\,{\rm d}x. ]

A p.d.f. has the properties

[\Phi (x)\geq 0,\qquad -\infty \, \lt \, x \, \lt \, +\infty, ]

and

[\textstyle\int\limits _{-\infty }^{+\infty }\Phi (x)\,{\rm d}x=1. ]

A cumulative distribution function, which will be abbreviated c.d.f., is defined by

[\Psi (x)=\textstyle\int\limits _{-\infty }^x\Phi (t)\,{\rm d}t. ]

The properties of Φ(x) imply that [0\leq \Psi (x)\leq 1], and Φ(x) = dΨ(x)/dx. The expected value of a function, f(x), of random variable x is defined by

[\left \langle f(x)\right \rangle =\textstyle\int\limits _{-\infty }^{+\infty }f(x)\Phi (x)\,{\rm d}x. ]

If f(x) = xn, [\left \langle f(x)\right \rangle =\left \langle x^n\right \rangle ] is the nth moment of Φ(x). The first moment, often denoted by μ, is the mean of Φ(x). The second moment about the mean, [\left \langle (x-\left \langle x\right \rangle)^2\right \rangle ], usually denoted by σ2, is the variance of [\Phi (x)]. The positive square root of the variance is the standard deviation.
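
A numerical sketch of these definitions (assuming Python with NumPy; the normal p.d.f. with illustrative parameters mu and sigma is used, and the integrals are approximated by simple Riemann sums):

import numpy as np

mu, sigma = 1.0, 2.0
x = np.linspace(mu - 10.0 * sigma, mu + 10.0 * sigma, 400001)
dx = x[1] - x[0]
phi = np.exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * np.sqrt(2.0 * np.pi))

print(np.sum(phi) * dx)                      # integral of the p.d.f., approximately 1
mean = np.sum(x * phi) * dx                  # first moment, approximately mu
var = np.sum((x - mean)**2 * phi) * dx       # second moment about the mean, approximately sigma**2
print(mean, var)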

For a vector, x, of random variables, [x_1,x_2,\ldots,x_n], the joint probability density function, or joint p.d.f., is a function, ΦJ(x), such that

[\eqalignno{p(a_1\leq x_1\leq b_1;\ a_2\leq x_2\leq b_2;\ \ldots;\ a_n\leq x_n\leq b_n) &=\textstyle\int\limits_{a_1}^{b_1}\textstyle\int\limits_{a_2}^{b_2}\ldots\textstyle\int\limits_{a_n}^{b_n}\Phi_J({\bf x})\,{\rm d}x_1\,{\rm d}x_2\ldots{\rm d}x_n. &(8.1.1.7)}]

The marginal p.d.f. of an element (or a subset of elements), [x_i], is a function, [\Phi_M(x_i)], such that

[\eqalignno{p(a_i\leq x_i\leq b_i)&=\textstyle\int\limits_{a_i}^{b_i}\Phi_M(x_i)\,{\rm d}x_i \cr &=\textstyle\int\limits_{-\infty}^{+\infty}\ldots\textstyle\int\limits_{a_i}^{b_i}\ldots\textstyle\int\limits_{-\infty}^{+\infty}\Phi_J({\bf x})\,{\rm d}x_1\ldots{\rm d}x_i\ldots{\rm d}x_n. &(8.1.1.8)}]

This is a p.d.f. for [x_i] alone, irrespective of the values that may be found for any other element of x. For two random variables, x and y (either or both of which may be vectors), the conditional p.d.f. of x given y = y0 is defined by

[\Phi_C(x|y_0)=c\Phi_J(x,y)_{y=y_0}, ]

where [c=1/\Phi_M(y_0)] is a renormalizing factor. This is a p.d.f. for x when it is known that y = y0. If [\Phi_C(x|y)=\Phi_M(x)] for all y, or, equivalently, if [\Phi_J(x,y)=\Phi_M(x)\Phi_M(y)], the random variables x and y are said to be statistically independent.
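
A grid-based sketch of marginal and conditional p.d.f.s for two statistically independent variables (assuming Python with NumPy; the standard normal p.d.f. is used purely as an example):

import numpy as np

def phi(t):
    # standard normal p.d.f.
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

x = np.linspace(-8.0, 8.0, 801)
y = np.linspace(-8.0, 8.0, 801)
dx, dy = x[1] - x[0], y[1] - y[0]

# For independent variables the joint p.d.f. factorizes: Phi_J(x, y) = Phi_M(x) Phi_M(y).
phi_J = phi(x)[:, None] * phi(y)[None, :]

# Marginal p.d.f. of x: integrate the joint p.d.f. over y, as in equation (8.1.1.8).
phi_M = np.sum(phi_J, axis=1) * dy
print(np.allclose(phi_M, phi(x)))           # True

# Conditional p.d.f. of x given y = 0: the joint p.d.f. renormalized by Phi_M(y0).
j0 = 400                                    # index of y0 = 0
phi_C = phi_J[:, j0] / (np.sum(phi_J[:, j0]) * dx)
print(np.allclose(phi_C, phi(x)))           # True, since x and y are independent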

Moments may be defined for multivariate p.d.f.s in a manner analogous to the one-dimensional case. The mean is a vector defined by

[\mu_i=\left\langle x_i\right\rangle=\textstyle\int x_i\Phi({\bf x})\,{\rm d}{\bf x}, ]

where the volume of integration is the entire domain of x. The variance–covariance matrix is defined by

[\eqalignno{V_{ij}&=\left\langle\left(x_i-\langle x_i\rangle\right)\left(x_j-\langle x_j\rangle\right)\right\rangle \cr &=\textstyle\int\left(x_i-\langle x_i\rangle\right)\left(x_j-\langle x_j\rangle\right)\Phi_J({\bf x})\,{\rm d}{\bf x}. &(8.1.1.9)}]

The diagonal elements of V are the variances of the marginal p.d.f.s of the elements of x, that is, [V_{ii}=\sigma_i^2]. It can be shown that, if [x_i] and [x_j] are statistically independent, [V_{ij}=0] when [i\neq j]. If two vectors of random variables, x and y, are related by a linear transformation, x = By, the means of their joint p.d.f.s are related by μx = Bμy, and their variance–covariance matrices are related by Vx = BVyBT.
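
A sketch of the propagation law Vx = BVyBT (assuming Python with NumPy; the matrix B and the sample size are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(0)

B = np.array([[1.0, 2.0],
              [0.0, 3.0]])
y = rng.standard_normal((2, 100000))     # two random variables, 100000 samples each
x = B @ y                                # linear transformation x = B y

Vy = np.cov(y)                           # approximately the 2 x 2 identity
Vx = np.cov(x)
print(np.allclose(Vx, B @ Vy @ B.T))     # True: V_x = B V_y B^T holds exactly for sample covariances
print(np.round(Vx, 2))                   # close to B B^T = [[5, 6], [6, 9]]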

References

Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., Ostrouchov, S. & Sorenson, D. (1992). LAPACK user's guide, 2nd ed. Philadelphia: SIAM Publications.
Stewart, G. W. (1973). Introduction to matrix computations. New York: Academic Press.







































