The least-squares plane

Marsh, R. E.; Schomaker, V.

doi:10.1107/97809553602060000560

International
Tables for
Crystallography
Volume B
Reciprocal space
Edited by U. Shmueli

pdf | chapter contents | chapter index | related articles

International Tables for Crystallography (2006). Vol. B. ch. 3.2, pp. 353-359 | 1 | 2 |
https://doi.org/10.1107/97809553602060000560

Chapter 3.2. The least-squares plane

R. E. Marsh^a ^* and V. Schomaker^b ^‡

^aThe Beckman Institute–139–74, California Institute of Technology, 1201 East California Blvd, Pasadena, California 91125, USA, and ^bDepartment of Chemistry, University of Washington, Seattle, Washington 98195, USA
Correspondence e-mail: rem@xray.caltech.edu

The calculation of least-squares planes based on uncorrelated, isotropic weights and of proper least-squares planes with Gaussian weights is described.

Keywords: least-squares planes; planes; weights; standard uncertainties; proper least-squares planes; Gaussian weights; general Gaussian planes.

3.2.1. Introduction

| top | pdf |

By way of introduction, we remark that in earlier days of crystal structure analysis, before the advent of high-speed computers and routine three-dimensional analyses, molecular planarity was often assumed so that atom coordinates along the direction of projection could be estimated from two-dimensional data [see, e.g., Robertson (1948 )]. Today, the usual aim in deriving the coefficients of a plane is to investigate the degree of planarity of a group of atoms as found in a full, three-dimensional structure determination. We further note that, for such purposes, a crystallographer will often be served just as well by establishing the plane in an almost arbitrary fashion as by resorting to the most elaborate, nit-picking and pretentious least-squares treatment. The approximate plane and the associated perpendicular distances of the atoms from it will be all he needs as scaffolding for his geometrical and structural imagination; reasonable common sense will take the place of explicit attention to error estimates.

Nevertheless, we think it appropriate to lay out in some detail the derivation of the `best' plane, in a least-squares sense, through a group of atoms and of the standard uncertainties associated with this plane. We see two cases: (1) The weights of the atoms in question are considered to be isotropic and uncorrelated (i.e. the weight matrix for the positions of all the atoms is diagonal, when written in terms of Cartesian axes, and for each atom the three diagonal elements are equal). In such cases the weights may have little or nothing to do with estimates of random error in the atom positions (they may have been assigned merely for convenience or convention), and, therefore, no one should feel that the treatment is proper in respect to the theory of errors. Nevertheless, it may be desired to incorporate the error estimates (variances) of the atom positions into the results of such calculations, whereupon these variances (which may be anisotropic, with correlation between atoms) need to be propagated. In this case the distinction between weights (or their inverses) and variances must be kept very clear. (2) The weights are anisotropic and are presumably derived from a variance–covariance matrix, which may include correlation terms between different atoms; the objective is to achieve a truly proper Gaussian least-squares result.

3.2.2. Least-squares plane based on uncorrelated, isotropic weights

| top | pdf |

This is surely the most common situation; it is not often that one will wish to take the trouble, or be presumptive enough, to assign anisotropic or correlated weights to the various atoms. And one will sometimes, perhaps even often, not be genuinely interested in the hypothesis that the atoms actually are rigorously coplanar; for instance, one might be interested in examining the best plane through such a patently non-planar molecule as cyclohexane. Moreover, the calculation is simple enough, given the availability of computers and programs, as to be a practical realization of the off-the-cuff treatment suggested in our opening paragraph. The problem of deriving the plane's coefficients is intrinsically nonlinear in the way first discussed by Schomaker et al. (1959; SWMB ). Any formulation other than as an eigenvalue–eigenvector problem (SWMB ), as far as we can tell, will sometimes go astray. As to the propagation of errors, numerous treatments have been given, but none that we have seen is altogether satisfactory.

We refer all vectors and matrices to Cartesian axes, because that is the most convenient in calculation. However, a more elegant formulation can be written in terms of general axes [e.g., as in Shmueli (1981 )].

The notation is troublesome. Indices are needed for atom number and Cartesian direction, and the exponent 2 is needed as well, which is difficult if there are superscript indices. The best way seems to be to write all the indices as subscripts and distinguish among them by context – i, j, 1, 2, 3 for directions; k, l, p (and sometimes K, …) for atoms. In any case, atom first then direction if there are two subscripts; direction, if only one index for a vector component, but atom (in this section at least) if for a weight or a vector. And $[\sigma_{d_{1}}]$ , e.g., for the standard uncertainty of the distance of atom 1 from a plane. For simplicity in practice, we use Cartesian coordinates throughout.

The first task is to find the plane, which we write as $[0 = {\bf m}\cdot {\bf r} - d\equiv {\sf m}^{T}{\sf r} - d,]$ where r is here the vector from the origin to any point on the plane (but usually represents the measured position of an atom), m is a unit vector parallel to the normal from the origin to the plane, d is the length of the normal, and $[{\sf m}]$ and $[{\sf r}]$ are the column representations of m and r. The least-squares condition is to find the stationary values of $[S\equiv [w_{k}({\sf m}^{T}{\sf r}_{k} - d)^{2}]]$ subject to $[{\sf m}^{T}{\sf m} = 1]$ , with $[{\sf r}_{k}]$ , $[k = 1,\ldots, n]$ , the vector from the origin to atom k and with weights, $[w_{k}]$ , isotropic and without interatomic correlations for the n atoms of the plane. We also write S as $[S\equiv [w({\sf m}^{T}{\sf r} - d)^{2}]]$ , the subscript for atom number being implicit in the Gaussian summations $[([\ldots])]$ over all atoms, as it is also in the angle-bracket notation for the weighted average over all atoms, for example in $[\langle{\sf r} \rangle]$ – the weighted centroid of the groups of atoms – just below.

First solve for d, the origin-to-plane distance. $[\eqalign{ 0 &= - {1\over 2} {\partial S\over \partial d} = [w({\sf m}^{T}{\sf r} - d)] = 0,\cr d &= [w{\sf m}^{T}{\sf r}]/[w]\equiv {\sf m}^{T}\langle {\sf r} \rangle.}]$ Then $[\sspecialfonts\eqalign{ S&\equiv [w({\sf m}^{T}{\sf r} - d)^{2}] = [w\{{\sf m}^{T}({\sf r} - \langle {\sf r} \rangle)\}^{2}]\cr &\equiv [w({\sf m}^{T}{\bsf s})^{2}]\equiv {\sf m}^{T}[w\hbox{ss}^{T}]{\sf m}\equiv {\sf m}^{T}{\bsf A}{\sf m}.}]$ Here $[\sspecialfonts{\bsf s}_{k}\equiv {\sf r}_{k} - \langle {\sf r} \rangle]$ is the vector from the centroid to atom k. Then solve for m. This is the eigenvalue problem – to diagonalize $[\sspecialfonts{\bsf A}]$ (bear in mind that $[\sspecialfonts{\bsf A}_{ij}]$ is just $[[ws_{i}s_{j}]]$ ) by rotating the coordinate axes, i.e., to find the $[3 \times 3]$ arrays $[\sspecialfonts{\bsf M}]$ and $[\sspecialfonts{\bsf L}]$ , $[\sspecialfonts{\bsf L}]$ diagonal, to satisfy $[\sspecialfonts{\bsf M}^{T}{\bsf AM} = {\bsf L},\qquad {\bsf M}^{T}{\bsf M} = {\bsf I}.]$ $[\sspecialfonts{\bsf A}]$ and $[\sspecialfonts{\bsf M}]$ are symmetric; the columns $[\sspecialfonts\sf m]$ of $[\sspecialfonts{\bsf M}]$ are the direction cosines of, and the diagonal elements of $[\sspecialfonts{\bsf L}]$ are the sums of weighted squares of residuals from, the best, worst and intermediate planes, as discussed by SWMB .

3.2.2.1. Error propagation

| top | pdf |

Waser et al. (1973; WMC ) carefully discussed how the random errors of measurement of the atom positions propagate into the derived quantities in the foregoing determination of a least-squares plane. This section presents an extension of their discussion. To begin, however, we first show how standard first-order perturbation theory conveniently describes the propagation of error into $[\sspecialfonts{\bsf M}]$ and $[\sspecialfonts{\bsf L}]$ when the positions $[{\sf r}_{k}]$ of the atoms are incremented by the amounts $[\delta{\sf r}_{k} \equiv \xi_{k}]$ and the corresponding quantities $[\sspecialfonts{\bsf s}_{k} \equiv {\sf r}_{k} - \langle {\sf r}\rangle]$ (the vectors from the centroid to the atoms) by the amounts $[\sspecialfonts\eta_{k}, ({\bsf s} \rightarrow {\bsf s} + \eta),]$ $[\eta_{k}\equiv\xi_{k} - \langle \xi \rangle]$ . (The need to account for the variation in position of the centroid, i.e. to distinguish between $[\eta_{k}]$ and $[\xi_{k}]$ , was overlooked by WMC .) The consequent increments in $[\sspecialfonts{\bsf A}, {\bsf M}]$ and $[\sspecialfonts{\bsf L}]$ are $[\sspecialfonts\eqalign{ \delta {\bsf A} &= [w \eta {\bsf s}^{T}] + [w {\bsf s} \eta^{T}] \equiv \alpha,\cr \delta {\bsf M} &= {\bsf M} \mu,\cr \delta {\bsf L} &\equiv \lambda.}]$ Here the columns of $[\mu]$ are expressed as linear combinations of the columns of $[\sspecialfonts{\bsf M}]$ . Note also that both perturbations, $[\mu]$ and $[\lambda]$ , which are the adjustments to the orientations and associated eigenvalues of the principal planes, will depend on the reduced coordinates $[\sspecialfonts{\bsf s}]$ and the perturbing influences $[\xi]$ by way of $[\alpha]$ , which in turn depends only on the reduced coordinates and the reduced shifts $[\eta_{k}]$ . In contrast, $[\delta {\sf d} = \delta ({\sf m}^{T} \langle {\sf r} \rangle) = (\delta {\sf m}^{T}) \langle {\sf r} \rangle + {\sf m}^{T} \langle \xi \rangle\hbox{;}]$ the change in the origin-to-plane distance for the plane defined by the column vectors m of $[\sspecialfonts{\bsf M}]$ , depends on the $[\langle {\sf r} \rangle]$ and $[\langle \xi \rangle]$ directly as well as on the $[\sspecialfonts{\bsf s}]$ and $[\eta]$ by way of the $[\delta {\bf m}.]$

The first-order results arising from the standard conditions, $[\sspecialfonts{\bsf M}^{T}{\bsf M} = {\bsf I},{\bsf L}]$ diagonal, and $[\sspecialfonts{\bsf M}^{T}{\bsf AM} = {\bsf L}]$ , are $[\mu^{T} + \mu = 0, \lambda \hbox{ diagonal},]$ and $[\sspecialfonts\mu^{T} {\bsf M}^{T} {\bsf AM} + {\bsf M}^{T} \alpha {\bsf M} + {\bsf M}^{T} {\bsf AM}\mu = \mu^{T} {\bsf L} + {\bsf L}\mu + {\bsf M}^{T} \alpha {\bsf M} = \lambda.]$ Stated in terms of the matrix components $[\lambda_{ij}]$ and $[\mu_{ij}]$ , the first condition is $[\mu_{ij} = - \mu_{ji}]$ , hence $[\mu_{ii} = 0,\ i,j = 1, 2, 3]$ , and the second is $[\lambda_{ij} = 0,\ i \neq j]$ . With these results the third condition then reads $[\sspecialfonts\let\normalbaselines\relax\openup3pt\matrix{ \lambda_{jj} = ({\bsf M}^{T} \alpha {\bsf M})_{jj},\hfill &j = 1, 2, 3\hfill\cr \mu_{ij} = ({\bsf M}^{T} \alpha {\bsf M})_{ij}/(L_{jj} - L_{ii}),\hfill & i \neq j,\quad i,j = 1, 2, 3.\hfill}]$ All this is analogous to the usual first-order perturbation theory, as, for example, in elementary quantum mechanics.

Now rotate to the coordinates defined by WMC , with axes parallel to the original eigenvectors $[\sspecialfonts[{\bsf M} = {\bsf I},\ A_{ij} = L_{ij}\delta_{ij},]$ $[\sspecialfonts({\bsf M}^{T} \alpha {\bsf M})_{ij}= \alpha_{ij}]]$ , restrict attention to the best plane $[(M_{13} \equiv m_{1} = 0, M_{23} \equiv m_{2} = 0, M_{33} \equiv m_{3} = 1)]$ , and define $[\varepsilon^{T}]$ as $[(\delta m_{1}, \delta m_{2}, \delta d_{c})]$ , keeping in mind $[\delta m_{3} = \mu_{33} = 0]$ ; $[d_{c}]$ itself, the original plane-to-centroid distance, of course vanishes. One then finds $[\let\normalbaselines\relax\openup3pt\matrix{ \delta m_{i} \equiv \varepsilon_{i} = \alpha_{i3}/(L_{33} - L_{ii})\hfill&\cr \phantom{\delta m_{i} \equiv \varepsilon_{i}} = [w(s_{i}\eta_{3} + s_{3} \eta_{i})]/([ws_{3}^{2}] - [ws_{i}^{2}]),\hfill & i = 1, 2,\hfill\cr \delta d_{c} \equiv \varepsilon_{3} = [w \xi_{3}]/[w] \equiv \langle \xi_{3} \rangle,\hfill &\xi_{k} \equiv \delta r_{k},\hfill}]$ and also $[\delta d = \varepsilon_{1} \langle r_{1} \rangle + \varepsilon_{2} \langle r_{2} \rangle + \varepsilon_{3}.]$ These results have simple interpretations. The changes in direction of the plane normal (the $[\delta m_{i}]$ ) are rotations, described by $[\varepsilon_{1}]$ and $[\varepsilon_{2}]$ , in response to changes in moments acting against effective torsion force constants. For $[\varepsilon_{2}]$ , for example, the contribution of atom k to the total relevant moment, about direction 1, is $[-w_{k}s_{k3}s_{k2}]$ ( $[w_{k}s_{k3}]$ the `force' and $[s_{k2}]$ the lever arm), and its nominally first-order change has two parts, $[-w_{k}s_{k2}\eta_{3}]$ from the change in force and $[-w_{k}s_{k3}\eta_{2}]$ from the change in lever arm; the resisting torsion constant is $[[ws_{2}^{2}] - [ws_{3}^{2}]]$ , which, reflection will show, is qualitatively reasonable if not quantitatively obvious. The perpendicular displacement of the plane from the original centroid $[\langle r \rangle]$ is $[\varepsilon_{3}]$ , but there are two further contributions to $[\delta d]$ , the change in distance from origin to plane along the plane normal, that arise from the two components of out-of-plane rotation of the plane about its centroid. Note that $[\varepsilon_{3}]$ is not given by $[[w\eta_{3}]/[w] = [w(\xi_{3} - \langle \xi_{3} \rangle)]/[w]]$ , which vanishes identically.

There is a further, somewhat delicate point: If the group of atoms is indeed essentially coplanar, the $[s_{k3}]$ are of the same order of magnitude as the $[\eta_{ki}]$ , unlike the $[s_{ki}]$ , $[i \neq 3]$ , which are in general about as big as the lateral extent of the group. It is then appropriate to drop all terms in $[\eta_{i}]$ or $[\xi_{i},\ i \neq 3]$ , and, in the denominators, the terms in $[s_{k3}^{2}]$ .

The covariances of the derived quantities (by covariances we mean here both variances and covariances) can now be written out rather compactly by extending the implicit designation of atom numbers to double sums, the first of each of two similar factors referring to the first atom index and the second to the second, e.g., $[{\textstyle\sum_{kl}} ww(s_{i}s_{j})\ldots \equiv {\textstyle\sum_{kl}} w_{k}w_{l}(s_{ki}s_{ij})\ldots]$ . Note that the various covariances, i.e. the averages over the presumed population of random errors of replicated measurements, are indicated by overlines, angle brackets having been pre-empted for averages over sets of atoms. $[\displaylines{\hbox{cov}(m_{i}, m_{j}) \equiv \overline{\varepsilon_{i} \varepsilon_{j}}\hfill \cr\quad= {{\textstyle\sum_{kl}} ww(s_{i}s_{j} \overline{\eta_{3} \eta_{3}} + s_{3}s_{3} \overline{\eta_{i} \eta_{j}} + s_{i}s_{3} \overline{\eta_{3} \eta_{j}} + s_{3}s_{j} \overline{\eta_{i} \eta_{3}})\over \{[w(s_{3}^{2} - s_{i}^{2})]\} \{[w(s_{3}^{2} - s_{j}^{2})]\}}, {i, j = 1, 2} \cr \hbox{cov} (m_{i}, d_{c}) \equiv \overline{\varepsilon_{i} \varepsilon_{3}} = {{\textstyle\sum_{kl}} ww(s_{ki} \overline{\eta_{3} \xi_{3}} + s_{k3} \overline{\eta_{i} \xi_{3}})\over \{[w(s_{3}^{2} - s_{i}^{2})]\} [w]},\quad i,j = 1, 2\cr \sigma^{2} (d_{c}) \equiv \overline{\varepsilon_{3}^{2}} = {{\textstyle\sum_{kl}} ww \overline{\xi_{3} \xi_{3}}\over [w]^{2}}\cr \eqalign{\sigma^{2} (d) \equiv \langle (\delta d)^{2}\rangle &= \langle r_{1}\rangle^{2} \overline{\varepsilon_{1}^{2}} + \langle r_{2}\rangle^{2} \overline{\varepsilon_{2}^{2}} + \overline{\varepsilon_{3}^{2}} + 2\langle r_{1} \rangle \langle r_{2} \rangle \overline{\varepsilon_{1} \varepsilon_{2}} \cr &\quad+ 2 \langle r_{1} \rangle \overline{\varepsilon_{1} \varepsilon_{3}} + 2 \langle r_{2} \rangle \overline{\varepsilon_{2} \varepsilon_{3}}.}}]$

Interatomic covariance (e.g., $[\overline{\eta_{k3} \eta_{l3}},\ k \neq l]$ ) thus presents no formal difficulty, although actual computation may be tedious. Nonzero covariance for the $[\eta]$ 's may arise explicitly from interatomic covariance (e.g., $[\overline{\xi_{ki} \xi_{lj}},\ k \neq l]$ ) of the errors in the atomic positions $[{\sf r}_{k}]$ , and it will always arise implicitly because $[\langle\xi\rangle]$ in $[\eta_{k} = \xi_{k} - \langle\xi\rangle]$ includes all the $[\xi_{k}]$ and therefore has nonzero covariance with all of them and with itself, even if there is no interatomic covariance among the $[\xi_{i}]$ 's.

If both types of interatomic covariance (explicit and implicit) are negligible, the $[\varepsilon]$ covariances simplify a great deal, the double summations reducing to single summations. [The formal expression for $[\sigma^{2} (d)]$ does not change, so it will not be repeated.] $[\eqalignno{\hbox{cov} (m_{i}, m_{j}) &\equiv \overline{\varepsilon_{i} \varepsilon_{j}}\cr&= {[w^{2} (s_{i}s_{j} \overline{\eta_{3}^{2}} + s_{3}^{2} \overline{\eta_{i} \eta_{j}} + s_{i}s_{3} \overline{\eta_{3} \eta_{j}} + s_{3} s_{j} \overline{\eta_{i} \eta_{3}}]\over \{[w (s_{3}^{2} - s_{i}^{2})]\} \{[w (s_{3}^{2} - s_{j}^{2})]\}}, i, j = 1, 2\cr \hbox{cov} (m_{i}, d_{c}) & \equiv \overline{\varepsilon_{i} \varepsilon_{3}} = {[w^{2} (s_{i} \overline{\eta_{3} \xi_{3}} + s_{3} \overline{\eta_{i} \xi_{3}})]\over \{[w (s_{3}^{2} - s_{i}^{2})]\} [w]},\quad i, j = 1, 2\cr \sigma^{2} (d_{c}) &\equiv \overline{\varepsilon_{3}^{2}} = {[w^{2} \overline{\xi_{3}^{2}}]\over [w]^{2}}.}]$

When the variances are the same for $[\xi]$ as for $[\eta]$ (i.e. $[\overline{\xi_{i} \xi_{j}} = \overline{\eta_{i} \eta_{j}}]$ , all i, j) and the covariances all vanish $[(\overline{\xi_{i} \xi_{j}} = 0,\ i \neq j)]$ , the $[\overline{\varepsilon_{i} \varepsilon_{j}}]$ simplify further. If the variances are also isotropic $[(\overline{\xi_{i}^{2}} = \overline{\xi_{j}^{2}} = \sigma^{2}]$ , all i, j), there is still further simplification to $[\eqalign{ \sigma^{2} (m_{i}) &\equiv \overline{\varepsilon_{i}^{2}} = {[w^{2} \sigma^{2} (s_{i}^{2} + s_{3}^{2})]\over \{[w (s_{3}^{2} - s_{i}^{2})]\}^{2}},\quad i = 1, 2\cr \sigma^{2} (d_{c}) &\equiv \overline{\varepsilon_{3}^{2}} = [w^{2} \sigma^{2}]/[w]^{2}\cr \hbox{cov} (m_{1}, m_{2}) &\equiv \overline{\varepsilon_{1} \varepsilon_{2}} = {[w^{2} \sigma^{2} s_{1} s_{2}]\over \{[w (s_{3}^{2} - s_{1}^{2})]\}\{[w (s_{3}^{2} - s_{2}^{2})]\}}\cr \hbox{cov} (m_{i}, d_{c}) &\equiv \overline{\varepsilon_{i} \varepsilon_{3}} = {[w^{2} \sigma^{2} s_{i}]\over \{[w (s_{3}^{2} - s_{i}^{2})]\}[w]},\quad i = 1, 2.}]$ If allowance is made for the difference in definition between $[\varepsilon_{3}]$ and $[\delta d]$ , these expressions are equivalent to the ones (equations 7–9) given by WMC , who, however, do not appear to have been aware of the distinction between $[\eta]$ and $[\xi]$ and the possible consequences thereof.

If, finally, $[w^{-1}]$ for each atom is taken equal to its $[\overline{\eta_{j}^{2}} = \sigma^{2}]$ , all j, there is still further simplification. $[\eqalign{ \sigma^{2} (m_{i}) &\equiv \overline{\varepsilon_{i}^{2}} = {[w (s_{3}^{2} + s_{i}^{2})]\over \{[w (s_{3}^{2} - s_{i}^{2})]\}^{2}},\quad i = 1, 2\cr \sigma^{2} (d_{c}) &\equiv \overline{\varepsilon_{3}^{2}} = [w]/[w]^{2} = 1/[w]\cr \hbox{cov} (m_{1}, m_{2}) &\equiv \overline{\varepsilon_{1} \varepsilon_{2}} = {[ws_{1} s_{2}]\over \{[w (s_{3}^{2} - s_{1}^{2})]\} \{[w (s_{3}^{2} - s_{2}^{2})]\}}\cr \hbox{cov} (m_{i}, d_{c}) &\equiv \overline{\varepsilon_{i} \varepsilon_{3}} = {[w s_{i}]\over \{[w (s_{3}^{2} - s_{i}^{2})]\}[w]},\quad i = 1, 2.}]$

For the earlier, more general expressions for the components of $[\overline{\varepsilon \varepsilon^{T}}]$ it is still necessary to find $[\overline{\eta_{ki} \eta_{lj}}]$ and $[\overline{\eta_{ki} \xi_{l3}}]$ in terms of $[\overline{\xi_{ki} \xi_{lj}}]$ , with $[\eta_{ki} \equiv \delta s_{ki} = \delta (r_{ki} - \langle r_{ki}\rangle ) = \xi_{ki} - \langle\xi_{i}\rangle = {\textstyle\sum_{l}} w_{l} (\xi_{ki} - \xi_{li})/[w]]$ . $[\eqalign{ \overline{\eta_{ki} \eta_{pj}} &= {\textstyle\sum\limits_{l, q}} w_{l} w_{q} \overline{(\xi_{ki} - \xi_{li}) (\xi_{pj} - \xi_{qj})}/[w]^{2}\cr &= \overline{(\xi_{ki} - \langle\xi_{i}\rangle) (\xi_{pj} - \langle\xi_{j}\rangle)}\cr \overline{\eta_{ki} \xi_{p3}} &= {\textstyle\sum\limits_{l}} w_{l} \overline{(\xi_{ki} - \xi_{li}) \xi_{p3}}/[w] = \overline{\xi_{ki} \xi_{p3}} - \overline{\langle\xi_{i}\rangle \xi_{p3}}.}]$

In the isotropic, `no-correlation' case, for example, these reduce to $[\eqalign{ \overline{\eta_{ki} \eta_{pi}} &= - w_{k} \overline{\xi_{ki}^{2}}/[w] - w_{p} \overline{\xi_{pi}^{2}}/[w] \cr &\quad + [w^{2} \overline{\xi_{i}^{2}}]/[w]^{2},\quad k \neq p; i = 1, 2\cr \overline{\eta_{ki}^{2}} &= (1 - 2w_{k}/[w]) \overline{\xi_{ki}^{2}} + [w^{2} \overline{\xi_{i}^{2}}]/[w]^{2},\quad i = 1, 2\cr \overline{\eta_{k3} \eta_{p3}} &= - w_{p} \overline{\xi_{p3}^{2}}/[w],}]$ and $[\overline{\eta_{k3}^{2}} = \overline{\xi_{k3}^{2}} - w_{k} \overline{\xi_{k3}^{2}}/[w] = \overline{\xi_{k3}^{2}} (1 - w_{k}/[w]).]$ Here the difference between the correct covariance values and the values obtained on ignoring the variation in $[\langle r\rangle]$ may be important if the number of defining atoms is small, say, 5 or 4 or, in the extreme, 3.

3.2.2.2. The standard uncertainty of the distance from an atom to the plane

| top | pdf |

There are two cases, as has been pointed out, e.g., by Ito (1982 ).

(1) The atom (atom K) was not included in the specification of the plane. $[\eqalign{ d_{K} &= {\sf m}^{T} ({\sf r}_{K} - \langle {\sf r}\rangle) = r_{k3} - \langle r_{3}\rangle\cr \delta d_{K} &= \xi_{K3} + s_{K1} \varepsilon_{1} + s_{K2} \varepsilon_{2} - \varepsilon_{3}\cr \sigma^{2}_{d_{K}} &= \overline{\xi_{K3}^{2}} + s_{K1}^{2} \overline{\varepsilon_{1}^{2}} + s_{K2}^{2} \overline{\varepsilon_{2}^{2}} + \overline{\varepsilon_{3}^{2}}\cr &\quad + 2 s_{K1} s_{K2} \overline{\varepsilon_{1} \varepsilon_{2}} - 2 s_{K1} \overline{\varepsilon_{1} \varepsilon_{3}} - 2 s_{K2} \overline{\varepsilon_{2} \varepsilon_{3}}\cr &\quad + 2 s_{K1} \overline{\xi_{K3} \varepsilon_{1}} + 2 s_{K2} \overline{\xi_{K3} \varepsilon_{2}} - 2 \overline{\xi_{K3} \varepsilon_{3}}.}]$ In the isotropic, `no-correlation' case the last three terms, i.e. the terms in $[\overline{\varepsilon_{i} \xi_{K3}}]$ , are all negligible or zero.

In either case the value for $[\overline{\xi_{K3}^{2}}]$ and the appropriate $[\overline{\varepsilon_{i} \varepsilon_{j}}]$ values from the least-squares-plane calculation need to be inserted.
(2) Atom K was included in the specification of the plane. The expression for $[\sigma_{d_{K}}^{2}]$ remains the same, but the averages in it may be importantly different.

For example, consider a plane defined by only three atoms, one of overwhelmingly great w at (0, 0, 0), one at (1, 0, 0) and one at (0, 1, 0). The centroid is at (0, 0, 0) and we take [K = 2] , i.e. $[\sigma_{d_{2}}]$ is the item of interest. Of course, it is obvious without calculation that the standard uncertainties vanish for the distances of the three atoms from the plane they alone define; the purpose here is only to show, in one case for one of the atoms, that the calculation gives the same result, partly, it will be seen, because the change in orientation of the plane is taken into account. If the only variation in the atom positions is described by $[\overline{\xi_{23}^{2}} = \sigma^{2}]$ , one has $[{s}_{21} = 1, \varepsilon_{3} = \varepsilon_{2} = 0,]$ $[\varepsilon_{1} = - \xi_{23}]$ , and $[\overline{\xi_{K3} \varepsilon_{1}} = \sigma^{2}]$ . The non-vanishing terms in the desired variance are then $[\eqalign{ \sigma_{d_{2}}^{2} &= \overline{\xi_{23}^{2}} + s_{21}^{2} \overline{\varepsilon_{1}^{2}} + 2 s_{21} \overline{\xi_{23} \varepsilon_{1}}\cr &= (1 + 1 - 2) \sigma^{2} = 0.}]$ If, however, the problem concerns the same plane and a fourth atom at position $[(1, 0, r_{43})]$ , not included in the specification of the plane and uncertain only in respect to $[r_{43}]$ (which is arbitrary) with $[\overline{\xi_{43}^{2}} = \sigma^{2}]$ (the same mean-square variation in direction 3 as for atom 2) and $[\overline{\xi_{43} \xi_{23}} = 0]$ , the calculation for $[\sigma_{d_{4}}^{2}]$ runs the same as before, except for the third term: $[\sigma_{d_{4}}^{2} = (1 + 1 - 0) \sigma^{2} = 2\sigma^{2}.]$

Extreme examples of this kind show clearly enough that variation in the direction of the plane normal or in the normal component of the centroid position will sometimes be important, the remarks to the contrary by Shmueli (1981 ) and, for the centroid, the omission by WMC notwithstanding. If only a few atoms are used to define the plane (e.g., three or, as is often the case, a very few more), both the covariance with the centroid position and uncertainty in the direction of the normal are likely to be important. The uncertainty in the normal may still be important, even if a goodly number of atoms are used to define the plane, whenever the test atom lies near or beyond the edge of the lateral domain defined by the other atoms.

3.2.3. The proper least-squares plane , with Gaussian weights

| top | pdf |

If it is desired to weight the points to be fitted by a plane in the sense of Gaussian least squares, then two items different from what we have seen in the crystallographic literature have to be brought into view: (1) the weights may be anisotropic and include interatomic correlations, because the error matrix of the atom coordinates may in general be anisotropic and include interatomic correlations; and (2) it has to be considered that the atoms truly lie on a plane and that their observed positions are to be adjusted to lie precisely on that plane, whatever its precise position may turn out to be and no matter what the direction, in response to the anisotropic weighting, of their approach to the plane.

An important consequence of (1), the non-diagonal character of the weight matrix, even with Cartesian coordinates, is that the problem is no longer an ordinary eigenvalue problem as treated by SWMB (1959 ),¹ not even if there is no interatomic correlation and the anisotropy is the same for each atom. On this last case the contrary remark in SWMB at the beginning of the footnote, p. 601, is incorrect, and the treatments in terms of the eigenvector–eigenvalue problem by Hamilton (1961 , 1964 , pp. 174–177) and Ito (1981a )² evade us. At best the problem is still not a genuine eigenvalue problem if the anisotropies of the atoms are not all alike.

Hypothesis (2), of perfect planarity, may be hard to swallow. It has to be tested, and for any set of atoms the conclusion may be that they probably do not lie on a plane. But if the hypothesis is provisionally adopted (and it has to be decided beforehand which of the following alternatives is to be followed), the adjusted positions are obtained by moving the atoms onto the plane

(a) along paths normal to the plane, or
(b) along the proper paths of `least resistance' – that is, paths with, in general, both normal and lateral components differently directed for each atom so as to minimize the appropriately weighted quadratic form of differences between the observed and adjusted coordinates. The lateral motions (and the anisotropic weights that induce them) may change the relative weights of different atoms in accordance with the orientation of the plane; change the perpendicular distance of origin-to-plane; and change the orientation of the plane in ways that may at first give surprise.

Our first example of this has already been given.³ A second, which actually inspired our first, is to be found in Hamilton (1964 ; example 5-8-1, p. 177), who discusses a rectangle of four points with identical error ellipsoids elongated in the long direction of the rectangle. The unweighted best line bisects the pattern along this direction, but the weighted best line is parallel to the short direction, if the elongation of the ellipsoids is sufficient. A third example (it is severely specialized so that a precise result can be attained without calculation) has three atoms ABC arranged like a $[C_{2\nu}\!-\!mm]$ molecule with bond angle $[90^{\circ}]$ . The central atom, B, has overwhelming isotropic weight; A and C have parallel extremely elongated error ellipsoids, aligned parallel to the A—B bond. The unweighted best line passes through B parallel to $[A\cdots C]$ ; the weighted best line passes through B and through C. Our last example is of a plane defined by a number of atoms of which one lies a considerable distance above the plane and at a distance from the normal through the centroid, but with the downward semi-axis of its extremely elongated prolate error ellipsoid intersecting that normal before it intersects the plane. If this atom is moved at right angles to the plane and further away from it, the centroid normal tips toward the atom, whereas it would tip away if the atom's weight function were isotropic or if the calculation were the usual one and in effect constrained the adjusted position of the atom to move at right angles to the plane.

The lead notion here – that the observed points are to be adjusted individually to fit a curve (surface) of required type exactly, rather than that the curve should simply be constructed to fit the observed points as well as possible in the sense of minimizing the weighted sum of squares of the distances along some preordained direction (perhaps normal to the plane, but perhaps as in ordinary curve fitting parallel to the y axis) – we first learned from the book by Deming (1943 ), Statistical Adjustment of Data, but it is to be found in Whittaker & Robinson (1929 ), Arley & Buch (1950 ), Hamilton (1964 , our most used reference), and doubtless widely throughout the least-squares literature. It has recently been strongly emphasized by Lybanon (1984 ), who gives a number of modern references. It is the only prescription that properly satisfies the least-squares conditions, whereas (a) and other analogous prescriptions are only arbitrary regressions, in (a) a normal regression onto the plane.⁴

We have explored the problem of least-squares adjustment of observed positions subject to anisotropic weights with the help of three Fortran programs, one for the straight line and two for the plane. In the first plane program an approximate plane is derived, coordinates are rotated as in WMC (1973 ), and the parameters of the plane are adjusted and the atoms moved onto it, either normally or in full accord with the least-squares condition, but in either case subject to independent anisotropic weight matrices. The other plane program, described in Appendix 3.2.1 , proceeds somewhat more directly, with the help of the method of Lagrange multipliers. However, neither program has been brought to a satisfactory state for the calculation of the variances and covariances of the derived quantities.

3.2.3.1. Formulation and solution of the general Gaussian plane

| top | pdf |

We conclude with an outline for a complete feasible solution, including interatomic weight-matrix elements. Consider atoms at observed vector positions $[{\bf r}_{k}]$ , $[k = 1,\ldots ,n]$ , designated in the following equations by $[\sspecialfonts{\bsf R}]$ , an n-by-3 array, with $[{\bi R}_{ki} = {\bi r}_{ki}]$ ; the corresponding adjusted positions denoted by the array $[\sspecialfonts{\bsf R}_{a}]$ ; n constraints (each adjusted position $[{\bf r}_{a} - a]$ for `adjusted' – must be on the plane), and [3n + 3] adjustable parameters ( $[\sspecialfonts 3n\; {\bsf R}_{a}]$ components and the 3 components of the vector b of reciprocal intercepts of the plane), so that the problem has [n - 3] degrees of freedom. The 3n-by-3n weight matrix P may be anisotropic for the separate atoms, and may include interatomic elements; it is symmetric in the sense $[{\bi P}_{kilj} = {\bi P}_{ljki}]$ , but $[{\bi P}_{kilj}]$ will not in general be equal to $[{\bi P}_{kjli}]$ . The array $[\boldLambda]$ denotes n Lagrange multipliers , one for each atom and unrelated to the $[\lambda]$ 's of Section 3.2.2 ; m and d still represent the direction cosines of the plane normal and the perpendicular origin-to-plane distance.

For a linear least-squares problem we know (see, e.g., Hamilton, 1964 , p. 143) that the proper weight matrix is the reciprocal of the atomic error matrix $[({\bi P} = {\bi M}^{-1})]$ ;⁵ note that ` M' is unrelated to the ` $[\sspecialfonts{\bsf M}]$ ' of Section 3.2.2. The least-squares sum is now $[\sspecialfonts S = ({\bsf R}-{\bsf R}_{a}){\bi P}({\bsf R}-{\bsf R}_{a}),]$ and the augmented sum for applying the method of Lagrange multipliers is $[\Phi = S/2 - \boldLambda {\sf b} {\bf R}_{a}.]$ Here the summation brackets $[([\ldots])]$ have been replaced by matrix products, but the simplified notation is not matrix notation. It has to be regarded as only a useful mnemonic in which all the indices and their clutter have been suppressed in view of their inconveniently large number for some of the arrays. If the dimensions of each are kept in mind, it is easy to recall the indices, and if the indices are resupplied at any point, it is not difficult to discover what is really meant by any of the expressions or whether evaluations have to be executed in a particular order.

The conditions to be satisfied are $[\sspecialfonts\eqalign{ 0 &= {-\partial \Phi\over \partial{\bsf R}_{a}} = {\bi P}({\bsf R}-{\bsf R}_{a}) + \boldLambda {\sf b},\cr 0 &= {-\partial \Phi\over \partial {\sf b}} = \boldLambda {\bsf R}_{a},\cr {\bf 1} &= {\sf b}{\bsf R}_{a}.}]$ That the partial derivatives of [S/2] should be represented by $[\sspecialfonts{\bi P}({\bsf R} - {\bsf R}_{a})]$ depends upon the above-mentioned symmetry of P. Note that each of the n unit elements of 1 expresses the condition that its $[{\bf r}_{a}]$ should indeed lie on the plane, and that $[\boldLambda{\sf b}]$ is just the same as $[{\sf b}\boldLambda]$ . The perpendicular origin-to-plane distance and the direction cosines of the plane normal are $[d^{2} = 1/{\sf b}^{T}{\sf b}]$ and $[{\sf m} = {\sf b}/\sqrt{{\sf b}^{T}{\sf b}}]$ .

On multiplication by M the first condition solves for $[\sspecialfonts{\bsf R}_{a}]$ , and that expression combined separately with the second condition and with the third gives highly nonlinear equations (there is further mention of this nonlinearity in Appendix 3.2.1 ) that have to be solved for $[{\sf b}]$ and $[\boldLambda]$ : $[\sspecialfonts\eqalign{ {\bsf R}_{a} &= {\bsf R} + {\bi M} \boldLambda {\sf b} = {\bsf R} + {\bi M}{\sf b} \boldLambda\cr 0 &= {\bsf F} \equiv (\boldLambda{\bi M}\boldLambda){\sf b} + \boldLambda{\bsf R}\cr 0 &= {\bsf G} \equiv ({\sf b}{\bi M}{\sf b})\boldLambda - ({\bf 1}-{\sf b}{\bsf R}).}]$

The best way of handling these equations, at least if it is desired to find their solutions both for $[\sspecialfonts{\bsf R}_{a}]$ , $[\boldLambda]$ , and $[\sf b]$ and for the error matrix of these derived quantities in terms of the error matrix for $[\sspecialfonts{\bsf R}]$ , seems to be to find an approximate solution by one means or another, to make sure that it is the desired solution, if (as may be) there is ambiguity, and to shift the origin to a point far enough from the plane and close enough to the centroid normal to avoid the difficulties discussed by SWMB . Then linearize the first and second equations by differentiation and solve the results first iteratively to fit the current residuals $[\sspecialfonts{\bsf F}_{\rm 0}]$ and $[\sspecialfonts{\bsf G}_{\rm 0}]$ and then for nominally infinitesimal increments $[\delta]$ $[\sf b]$ and $[\delta \boldLambda]$ . In effect, one deals with equations $[\sspecialfonts{\bsf QX} = {\bsf Y}]$ , where $[\sspecialfonts{\bsf Q}]$ is the $[(n + 3) \times (n + 3)]$ matrix of coefficients of the following set of equations, $[\sspecialfonts{\bsf X}]$ is the [(n + 3)] -dimensional vector of increments $[\delta]$ $[\sf b]$ and $[\delta \boldLambda]$ , and $[\sspecialfonts{\bsf Y}]$ is the vector either of the first terms or of the second terms on the right-hand side of the equations. $[\sspecialfonts\eqalign{ (\boldLambda{\bi M}\boldLambda)\delta {\sf b} + ({\bi M}\boldLambda {\sf b} + \boldLambda{\bi M}{\sf b} + {\bsf R})\delta \boldLambda &= -{\bsf F}_{0} - \boldLambda \delta{\bsf R}\cr ({\bi M}{\sf b}\boldLambda + {\sf b}{\bi M}\boldLambda + {\bsf R})\delta {\sf b} + ({\sf b}{\bi M}{\sf b})\delta \boldLambda &= -{\bsf G}_{0} - {\sf b}\delta{\bsf R}.}]$

When $[\sspecialfonts{\bsf X}]$ becomes the vector $[\varepsilon]$ of errors in $[\sf b]$ and $[\boldLambda]$ as induced by errors $[\sspecialfonts\eta \equiv \delta {\bsf R}]$ in the measured atom positions, these equations become, in proper matrix notation, $[\sspecialfonts{\bsf Q} \varepsilon = {\bsf Z} \eta]$ , with solution $[\sspecialfonts\varepsilon = {\bsf Q}^{-1} {\bsf Z} \eta]$ , where $[\sspecialfonts{\bsf Z}]$ is the [(n + 3)] -dimensional vector of components, first of $[\sf b]$ then of $[\boldLambda]$ . The covariance matrix $[\overline{\varepsilon \varepsilon^{T}}]$ , from which all the covariances of $[\sf b]$ , $[\sspecialfonts{\bsf R}_{a}]$ , and $[\sspecialfonts{\bsf R}]$ (including for the latter any atoms that were not included for the plane) can be derived, is then given by $[\sspecialfonts\overline{\varepsilon \varepsilon^{T}} = {\bsf Q}^{-1} {\bsf Z} \overline{\eta \eta^{T}} {\bsf Z}^{T} ({\bsf Q}^{-1})^{T}.]$ This is not as simple as the familiar expression for propagation of errors for a least-squares problem solved without the use of Lagrange multipliers, i.e. $[\sspecialfonts\overline{\varepsilon \varepsilon^{T}} = {\bsf B}^{\rm -1}]$ , where $[\sspecialfonts{\bsf B}]$ is the matrix of the usual normal equations, both because $[\sspecialfonts{\bsf B} = {\bsf B}^{T}]$ is no longer valid and because the middle factor $[\sspecialfonts{\bsf Z} \overline{\eta \eta^{T}} {\bsf Z}^{T}]$ is no longer equal to $[\sspecialfonts{\bsf B}^{\rm -1}]$ .

It is easy to verify that $[\boldLambda]$ consists of a set of coefficients for combining the n row vectors of M $[\sf b]$ , in the expression for $[\sspecialfonts{\bsf R}_{a}]$ , into corrections to $[\sspecialfonts{\bsf R}]$ such that each adjusted position lies exactly on the plane: $[\sspecialfonts\eqalign{ {\sf b}{\bsf R}_{a} &= {\sf b}{\bsf R} + {\sf b}{\bi M}{\sf b} ({\sf b}{\bi M}{\sf b})^{-1} ({\bi 1} - {\sf b}{\bsf R})\cr &= {\sf b}{\bsf R} + {\bi 1} - {\sf b}{\bsf R} = {\bi 1}.}]$ At the same time one can see how the lateral shifts occur in response to the anisotropy of M, especially if, now, only the anisotropic case without interatomic correlations is considered. For atom k write $[\sf b]$ in terms of its components along the principal axes of $[\sspecialfonts{\bsf M}_{k}]$ , associated with the eigenvalues $[\alpha, \beta]$ and $[\gamma]$ ; the shifts are then proportional to $[\sspecialfonts{\bsf M}_{\alpha\alpha} b_{\alpha}^{2},\ {\bsf M}_{\beta\beta}b_{\beta}^{2}]$ and $[\sspecialfonts{\bsf M}_{\gamma\gamma}b_{\gamma}^{2}]$ , each along its principal axis, and the sums of the contributions of these shift components to the total displacement along the plane normal or along either of two orthogonal directions in the plane can readily be visualized. In effect $[\sspecialfonts{\bsf M}_{k}]$ is the compliance matrix for these shifts of atom k. Similarly, it can be seen that in the isotropic case with interatomic correlations a pair of equally weighted atoms located, for example, at some distance apart and at about the same distance from the plane, will have different shifts (and different influences on d and $[\sf m]$ ) depending on whether the covariance between the errors in the perpendicular components of their observed positions relative to the plane is small, or, if large, whether it is positive or is negative. If the covariance is large and positive, the adjusted positions will both be pulled toward the plane, strongly resisting, however, the apparent necessity that both reach the plane by moving by different amounts; in consequence, there will be a strong tendency for the plane to tilt toward the more distant atom, and possibly even away from the nearer one. But if the covariance is large and negative, the situation is reversed: the more distant atom can readily move more than the nearer one, while it is very difficult to move them together; the upshot is then that the plane will move to meet the original midpoint of the two atoms while they tilt about that midpoint to accommodate the plane.

It is attractive to solve our problem with the `normal' formulation of the plane, $[{\sf mr} = d]$ , and so directly avoid the problems that arise for $[d\approx 0]$ . The solution in terms of the reciprocal intercepts b has been given first, however, because reducing by two (d and a Lagrange multiplier) the number of parameters to be solved for may more than make up for the nuisance of having to move the origin. The solution in terms of d follows.

The augmented variation function is now $[\sspecialfonts\Phi = ({\bsf R} - {\bsf R}_{a}) {\bi P} ({\bsf R} - {\bsf R}_{a})/{\rm 2} - \boldLambda ({\sf m}{\bsf R}_{a} - d{\bi 1}) + \mu {\sf mm}/{\rm 2},]$ the term in the new Lagrange multiplier, $[\mu]$ , and the term in $[d{\bi 1}]$ having been added to the previous expression. The 1, an n-vector of 1's, is needed to express the presence of n terms in the $[\boldLambda]$ sum. There are then five equations to be satisfied – actually [n + 1 + 3n + 3 + 1 = 4n + 5] ordinary equations – in the $[\sspecialfonts 3n\; {\bsf R}_{a}]$ components, the n $[\boldLambda]$ 's, the 3 $[{\sf m}]$ components, $[\mu]$ , and [d - 4n + 5] unknowns in all, as required. These equations are as follows: $[\sspecialfonts\eqalign{ {\sf m}{\bf R}_{a} &= d{\bi 1}\cr {\sf mm} &= 1\cr 0 &= -\left({\partial \Phi\over \partial {\bsf R}_{a}}\right) = {\bi P}({\bsf R} - {\bsf R}_{a}) + \boldLambda{\sf m}\cr 0 &= {\partial \Phi\over \partial {\sf m}} = -\boldLambda{\bsf R}_{a} + \mu {\sf m}\cr 0 &= {\partial \Phi\over \partial d} = \boldLambda{\bi 1}.}]$ As before, multiply the third equation by M and solve for $[\sspecialfonts{\bsf R}_{a}]$ . Then substitute the result into the first and fourth equations to obtain $[\sspecialfonts\eqalign{ {\sf m}{\bsf R} + {\sf m}{\bi M}{\sf m}\boldLambda &= d{\bi 1},\cr {\sf mm} &= 1,\cr \boldLambda{\bsf R} + \boldLambda{\bi M}\boldLambda{\sf m} &= \mu {\sf m},\cr \boldLambda{\bi 1} &= 0}]$ as the [n + 5] mostly nonlinear equations to be solved for $[\sf m]$ , $[\boldLambda]$ , d and $[\mu]$ by linearizing (differentiation), solving for increments, and iterating, in the pattern described more fully above. An approximate solution for $[\sf m]$ and d has first to be obtained somehow, perhaps by the method of SWMB (with isotropic uncorrelated weights), checked for suitability, and extended to a full complement of first approximations by $[\sspecialfonts\eqalign{ \boldLambda &= ({\sf m}{\bi M}{\sf m})^{-1} (d{\bi 1} - {\sf m}{\bsf R})\cr \mu &= {\sf m}\boldLambda{\bsf R} + {\sf m}\boldLambda{\bi M}\boldLambda{\sf m},}]$ which readily follow from the previous equations. As in the `intercepts' solution the linearized expression for the increments in $[\mu, \boldLambda, d]$ and $[{\sf m}]$ can be used together with the equation for $[\sspecialfonts{\bsf R}_{a}]$ to obtain all the covariances needed in the treatment described in Section 3.2.2.

3.2.3.2. Concluding remarks

| top | pdf |

Proper tests of statistical significance of this or that aspect of a least-squares plane can be made if the plane has been based on a proper weight matrix as discussed in Section 3.2.3 ; if it can be agreed that the random errors of observation are normally distributed; and if an agreeable test (null) hypothesis can be formulated. For example, one may ask for the probability that a degree of fit of the observed positions to the least-squares plane at least as poor as the fit that was found might occur if the atoms in truth lie precisely on a plane. The $[\chi^{2}]$ test answers this question: a table of probabilities displayed as a function of $[\chi^{2}]$ and $[\nu]$ provides the answer. Here $[\chi^{2}]$ is just our minimized $[S = {\sf b}\boldLambda{\bi MPM}\!\boldLambda{\sf b} = {\sf b}\boldLambda{\bi M}\!\boldLambda {\sf b},]$ and $[\eqalign{ \nu &= n_{\rm observations} - n_{\rm adjusted\; parameters} - n_{\rm constraints}\cr &= 3n - (n + 3) - n = n - 3,}]$ is the number of degrees of freedom for the problem of the plane (erroneously cited in at least one widely used crystallographic system of programs as [3n - 3] ). There will not usually be any reason to believe that the atoms are exactly coplanar in any case; nevertheless, this test may well give a satisfying indication of whether or not the atoms are, in the investigator's judgment, essentially coplanar. It must be emphasized that $[\chi^{2}]$ as calculated in Section 3.2.3 will include proper allowance for uncertainties in the d and orientation of the plane with greater reliability than the estimates of Section 3.2.2 , which are based on nominally arbitrary weights. Both, however, will allow for the large variations in d and tilt that can arise in either case if n is small. Some of the earlier, less complete discussions of this problem have been mentioned in Section 3.2.2.

Among the problems not considered here are ones of fitting more than one plane to a set of observed positions, e.g. of two planes fitted to three sets of atoms associated, respectively, with the first plane, the second plane, and both planes, and of the angle between the two planes. For the atoms common to both planes there will be a fundamental point of difference between existing programs (in which, in effect, the positions of the atoms in common are inconsistently adjusted to one position on the first plane and, in general, a different position on the second) and what we would advocate as the proper procedure of requiring the adjusted positions of such atoms to lie on the line of intersection of the two planes. As to the dihedral angle there is a difficulty, noted by WMC (1973 , p. 2705), that the usual formulation of $[\sigma^{2} (\theta_{0})]$ in terms of the cosine of the dihedral angle reduces to 0/0 at $[\theta_{0} = 0]$ . However, this variance is obviously well defined if the plane normals and their covariances are well defined. The essential difficulty lies with the ambiguity in the direction of the line of intersection of the planes in the limit of zero dihedral angle. For the torsion angle about a line defined by two atoms, there should be no such difficulty. It seems likely that for the two-plane problem proposed above, the issue that decides whether the dihedral angle will behave like the standard dihedral angle or, instead, like the torsion angle, will be found to be whether or not two or more atoms are common to both planes.

All that we have tried to bring out about the covariances of derived quantities involving the plane requires that the covariances of the experimental atom positions (reduced in our formulations to Cartesian coordinates) be included. However, such covariances of derived quantities are often not available in practice, and are usually left unused even if they are. The need to use the covariances, not just the variances, has been obvious from the beginning. It has been emphasized in another context by Schomaker & Marsh (1983 ) and much more strongly and generally by Waser (1973 ), whose pleading seems to have been generally ignored, by now, for about thirty years.

Appendix A3.2.1

Consider n atoms at observed vector positions $[\sf r]$ (expressed in Cartesians), n constraints (each adjusted position $[{\sf r}_{a}-a]$ for `adjusted' – must be on the plane) and [3n + 3] adjustable parameters ( $[3n\;{\sf r}_{a}]$ components and the 3 components of the vector $[\sf a]$ of reciprocal intercepts of the plane), so that the problem has [n - 3] degrees of freedom. The weight matrices $[\sspecialfonts{\bsf P}]$ may be differently anisotropic for each atom, but there are no interatomic correlations. As before, square brackets, ` $[[\ldots]]$ ', represent the Gaussian sum over all atoms, usually suppressing the atom indices. We also write $[\lambda]$ , not the $[\lambda]$ of Section 3.2.2 , for the Lagrange multipliers (one for each atom); $[\sf m]$ for the direction cosines of the plane normal; and d for the perpendicular origin-to-plane distance.

As before, $[\sspecialfonts{\bsf P}_{k}]$ is the reciprocal of the atomic error matrix : $[\sspecialfonts{\bsf P}_{k} = {\bsf M}_{k}^{-1}]$ (correspondingly, $[{\bi P} \equiv {\bi M}^{-1})]$ but ` $[\sspecialfonts{\bsf M}]$ ' is no longer the ` $[\sspecialfonts{\bsf M}]$ ' of Section 3.2.2. The appropriate least-squares sum is $[\sspecialfonts S = [({\sf r} - \hbox{\sf r}_{a})^{T} {\bsf P}({\sf r} - {\sf r}_{a})]]$ and the augmented sum for applying the method of Lagrange multipliers is $[\Phi = S/2 - [\lambda {\sf a}^{T} {\sf r}_{a}].]$ $[\Phi]$ is to be minimized with respect to variations of the adjusted atom positions $[{\sf r}_{ka}]$ and plane reciprocal intercepts $[b_{i}]$ , leading to the equations $[\sspecialfonts\eqalign{ 0 &= {-\partial \phi\over \partial {\sf r}_{a}^{T}} = {\bsf P}({\sf r} - {\sf r}_{a}) + \lambda {\sf b} \hbox{ and} \cr 0 &= {-\partial \Phi\over \partial {\sf b}^{T}} = [\lambda {\sf r}_{a}],}]$ subject to the plane conditions $[{\sf b}^{T}{\sf r}_{a} = 1]$ , each atom, with $[d^{2} = 1/({\sf b}^{T}{\sf b})]$ , $[{\sf m} = {\sf b}/\sqrt{{\sf b}^{T}{\sf b}}]$ . These equations are nonlinear.

A convenient solution runs as follows: first multiply the first equation by $[\sspecialfonts{\bsf M}]$ and solve for the adjusted atom positions in terms of the Lagrange multipliers $[\lambda]$ and the reciprocal intercepts $[\sf b]$ of the plane; then multiply that result by $[{\sf b}^{T}]$ applying the plane conditions, and solve for the $[\lambda]$ 's $[\sspecialfonts\eqalign{ {\sf r}_{a} &= {\sf r} + \lambda {\bsf M}{\sf b},\quad {\bsf M} \equiv {\bsf P}^{-1}\cr 1 &= {\sf b}^{T} {\sf r} + \lambda {\sf b}^{T} {\bsf M}{\sf b},\cr \lambda &= (1 - {\sf b}^{T}{\sf r})/({\sf b}^{T} {\bsf M}{\sf b}).}]$ Next insert these values for the $[\lambda]$ 's and $[{\sf r}_{a}]$ 's into the second equation: $[\sspecialfonts 0 = {\partial \Phi\over \partial {\sf b}^{T}} = [\lambda {\sf r}_{a}] = \left[{1 - {\sf b}^{T}{\sf r}\over {\sf b}^{T} {\bsf M}{\sf b}} \left({\sf r} + {1 - {\sf b}^{T}{\sf r}\over {\sf b}^{T}{\bsf M}{\sf b}} {\bsf M}{\sf b}\right)\right].]$

This last equation, $[F({\sf b}) = 0]$ , is to be solved for $[{\sf b}]$ . It is highly nonlinear: $[F({\sf b}) = O({\sf b}^{3}) / O({\sf b}^{4})]$ . One can proceed to a first approximation by writing $[0 = [(1 - {\sf b}^{T}{\sf r}) {\sf r}]]$ ; i.e., $[\sspecialfonts[{\bsf rr}]\cdot {\sf b} = [{\bsf r}]]$ , in dyadic notation. $[\sspecialfonts[{\bsf M} = {\bsf I}]$ , all atoms; $[1 - {\sf b}^{T} {\sf r} = 0]$ in the multiplier of $[\sspecialfonts{\bsf M}{\sf b}/({\sf b}^{T}{\bsf M}{\sf b}).]]$ A linear equation in $[\sf b]$ , this approximation usually works well.⁶ We have also used the iterative Frazer, Duncan & Collar eigenvalue solution as described by SWMB (1959 ), which works even when the plane passes exactly through the origin. To continue the solution of the nonlinear equations, by linearizing and iterating, write $[F({\sf b}) = 0]$ in the form $[\sspecialfonts\eqalign{ 0 &= [\lambda {\sf r} + \lambda^{2} {\bsf M}{\sf b}] \cr &\approx \left[{\sf r}{\partial \lambda\over \partial {\sf b}} + 2{\bsf M}{\sf b} \lambda {\partial \lambda\over \partial {\sf b}} + \lambda^{2}{\bsf M}\right] \delta {\sf b} + [\lambda ({\sf r} + \lambda {\bsf M}{\sf b})]_{0},}]$ solve for $[\delta{\sf b}]$ , reset $[\sf b]$ to $[{\sf b} + \delta {\sf b}]$ , etc., until the desired degree of convergence of $[|\delta {\sf b}|/|{\sf b}|]$ toward zero has been attained. By $[\partial \lambda / \partial {\sf b}]$ is meant the partial derivative of the above expression for $[\lambda]$ with respect to $[\sf b]$ , as detailed in the next paragraph.

In the Fortran program DDLELSP (double precision Deming Lagrange, with error estimates, least-squares plane, written to explore this solution) the preceding equation is recast as $[\sspecialfonts\eqalign{ {\bsf B} \delta {\sf b} &\equiv \left[\lambda^{2} {\bsf M} + ({\sf r} + 2\lambda {\bsf M}{\sf b}) {\partial \lambda\over \partial {\sf b}}\right] \delta {\sf b}\cr &\equiv {\bsf Y} = - [\lambda ({\sf r} + \lambda {\bsf M}{\sf b})]_{0},}]$ with $[\sspecialfonts\eqalign{ {\partial \lambda\over \partial {\sf b}} &= {\sf r}^{T} / {\sf b}^{T} {\bsf M}{\sf b} - [2(1 - {\sf b}^{T} {\sf r}) / ({\sf b}^{T} {\bsf M}{\sf b})^{2}]\cr &= -({\sf r}^{T} + 2\lambda {\sf b}^{T} {\bsf M}) / {\sf b}^{T} {\bsf M}{\sf b}.}]$

The usual goodness of fit, GOF2 in DDLELSP, evaluates to $[\sspecialfonts\eqalign{ \hbox{GOF2} &= \left({S_{\min}\over n - 3}\right)^{1/2} = \left({1\over n - 3} [\lambda^{2}{\sf b}^{T} {\bsf MPM}{\sf b}]\right)^{1/2}\cr &= \left({1\over n - 3} [\lambda^{2}{\sf b}^{T}{\bsf M}{\sf b}]\right)^{1/2}\cr &= \left({1\over n - 3} \left[{(1 - {\sf b}^{T} {\sf r})^{2}\over {\sf b}^{T}{\bsf M}{\sf b}}\right]\right)^{1/2}.}]$ This is only an approximation, because the residuals $[1-{\sf b}^{T}{\sf r}]$ are not the differences between the observations and appropriate linear functions of the parameters, nor are their variances (the $[\sspecialfonts{\sf b}^{T} {\bsf M}{\sf b}]$ 's) independent of the parameters (or, in turn, the errors in the observations).

We ask also about the perpendicular distances, e, of atoms to plane and the mean-square deviation $[\overline{(\delta e)^{2}}]$ to be expected in e. $[\eqalign{ e &= (1 - {\sf b}^{T} {\sf r})/\sqrt{{\sf b}^{T} {\sf b}} = d(1 - {\sf b}^{T} {\sf r})\cr \delta e &= -d ({\sf b}^{T} \eta + {\sf r}^{T} \varepsilon) + O (\eta^{2}).}]$ Here $[\eta]$ and $[\varepsilon]$ are the errors in $[\sf r]$ and $[\sf b]$ . Neglecting ` $[O(\eta^{2})]$ ' then leads to $[\overline{(\delta e)^{2}} = d^{2} ({\sf b}^{T} \overline{\eta \eta^{T}} {\sf b} + 2{\sf b}^{T} \overline{\eta \varepsilon^{T}} {\sf r} + {\sf r}^{T} \overline{\varepsilon \varepsilon^{T}} {\sf r}).]$ We have $[\sspecialfonts\overline{\eta \eta^{T}} = {\bsf M} = {\bsf P}^{-1}]$ , but $[\overline{\eta \varepsilon^{T}}]$ and $[\overline{\varepsilon \varepsilon^{T}}]$ perhaps still need to be evaluated.

References

Arley, N. & Buch, K. R. (1950). Introduction to the theory of probability and statistics. New York: John Wiley. London: Chapman & Hall.Google Scholar

Clews, C. J. B. & Cochran, W. (1949). The structures of pyrimidines and purines. III. An X-ray investigation of hydrogen bonding in aminopyrimidines. Acta Cryst. 2, 46–57.Google Scholar

Deming, W. E. (1943). Statistical adjustment of data. New York: John Wiley. [First edition 1938.]Google Scholar

Hamilton, W. C. (1961). On the least-squares plane through a set of points. Acta Cryst. 14, 185–189.Google Scholar

Hamilton, W. C. (1964). Statistics in physical science. New York: Ronald Press.Google Scholar

Ito, T. (1981a). Least-squares refinement of the best-plane parameters. Acta Cryst. A37, 621–624.Google Scholar

Ito, T. (1981b). On the least-squares plane through a group of atoms. Sci. Pap. Inst. Phys. Chem. Res. Saitama, 75, 55–58.Google Scholar

Ito, T. (1982). On the estimated standard deviation of the atom-to-plane distance. Acta Cryst. A38, 869–870.Google Scholar

Kalantar, A. H. (1987). Slopes of straight lines when neither axis is error-free. J. Chem. Educ. 64, 28–29.Google Scholar

Lybanon, M. (1984). A better least-squares method when both variables have uncertainties. Am. J. Phys. 52, 22–26.Google Scholar

Robertson, J. M. (1948). Bond-length variations in aromatic systems. Acta Cryst. 1, 101–109.Google Scholar

Schomaker, V. & Marsh, R. E. (1983). On evaluating the standard deviation of U_eq. Acta Cryst. A39, 819–820.Google Scholar

Schomaker, V., Waser, J., Marsh, R. E. & Bergman, G. (1959). To fit a plane or a line to a set of points by least squares. Acta Cryst. 12, 600–604.Google Scholar

Shmueli, U. (1981). On the statistics of atomic deviations from the `best' molecular plane. Acta Cryst. A37, 249–251.Google Scholar

Waser, J. (1973). Dyadics and variances and covariances of molecular parameters, including those of best planes. Acta Cryst. A29, 621–631.Google Scholar

Waser, J., Marsh, R. E. & Cordes, A. W. (1973). Variances and covariances for best-plane parameters including dihedral angles. Acta Cryst. B29, 2703–2708.Google Scholar

Whittaker, E. T. & Robinson, G. (1929). The calculus of observations. London: Blackie.Google Scholar

International Tables for Crystallography (2006). Vol. B. ch. 3.2, pp. 353-359
https://doi.org/10.1107/97809553602060000560

Chapter 3.2. The least-squares plane

3.2.1. Introduction

3.2.2. Least-squares plane based on uncorrelated, isotropic weights

3.2.2.1. Error propagation

3.2.2.2. The standard uncertainty of the distance from an atom to the plane

3.2.3. The proper least-squares plane, with Gaussian weights

3.2.3.1. Formulation and solution of the general Gaussian plane

3.2.3.2. Concluding remarks

Appendix A3.2.1

References

3.2.3. The proper least-squares plane , with Gaussian weights