The proper least-squares plane, with Gaussian weights

Marsh, R. E.; Schomaker, V.

doi:10.1107/97809553602060000560

International
Tables for
Crystallography
Volume B
Reciprocal space
Edited by U. Shmueli

pdf | chapter contents | chapter index | related articles

International Tables for Crystallography (2006). Vol. B. ch. 3.2, pp. 355-358 | 1 | 2 |

Section 3.2.3. The proper least-squares plane , with Gaussian weights

R. E. Marsh^a ^* and V. Schomaker^b ^‡

^aThe Beckman Institute–139–74, California Institute of Technology, 1201 East California Blvd, Pasadena, California 91125, USA, and ^bDepartment of Chemistry, University of Washington, Seattle, Washington 98195, USA
Correspondence e-mail: rem@xray.caltech.edu

3.2.3. The proper least-squares plane , with Gaussian weights

| top | pdf |

If it is desired to weight the points to be fitted by a plane in the sense of Gaussian least squares, then two items different from what we have seen in the crystallographic literature have to be brought into view: (1) the weights may be anisotropic and include interatomic correlations, because the error matrix of the atom coordinates may in general be anisotropic and include interatomic correlations; and (2) it has to be considered that the atoms truly lie on a plane and that their observed positions are to be adjusted to lie precisely on that plane, whatever its precise position may turn out to be and no matter what the direction, in response to the anisotropic weighting, of their approach to the plane.

An important consequence of (1), the non-diagonal character of the weight matrix, even with Cartesian coordinates, is that the problem is no longer an ordinary eigenvalue problem as treated by SWMB (1959),¹ not even if there is no interatomic correlation and the anisotropy is the same for each atom. On this last case the contrary remark in SWMB at the beginning of the footnote, p. 601, is incorrect, and the treatments in terms of the eigenvector–eigenvalue problem by Hamilton (1961, 1964, pp. 174–177) and Ito (1981a)² evade us. At best the problem is still not a genuine eigenvalue problem if the anisotropies of the atoms are not all alike.

Hypothesis (2), of perfect planarity, may be hard to swallow. It has to be tested, and for any set of atoms the conclusion may be that they probably do not lie on a plane. But if the hypothesis is provisionally adopted (and it has to be decided beforehand which of the following alternatives is to be followed), the adjusted positions are obtained by moving the atoms onto the plane

(a) along paths normal to the plane, or
(b) along the proper paths of `least resistance' – that is, paths with, in general, both normal and lateral components differently directed for each atom so as to minimize the appropriately weighted quadratic form of differences between the observed and adjusted coordinates. The lateral motions (and the anisotropic weights that induce them) may change the relative weights of different atoms in accordance with the orientation of the plane; change the perpendicular distance of origin-to-plane; and change the orientation of the plane in ways that may at first give surprise.

Our first example of this has already been given.³ A second, which actually inspired our first, is to be found in Hamilton (1964; example 5-8-1, p. 177), who discusses a rectangle of four points with identical error ellipsoids elongated in the long direction of the rectangle. The unweighted best line bisects the pattern along this direction, but the weighted best line is parallel to the short direction, if the elongation of the ellipsoids is sufficient. A third example (it is severely specialized so that a precise result can be attained without calculation) has three atoms ABC arranged like a $[C_{2\nu}\!-\!mm]$ molecule with bond angle $[90^{\circ}]$ . The central atom, B, has overwhelming isotropic weight; A and C have parallel extremely elongated error ellipsoids, aligned parallel to the A—B bond. The unweighted best line passes through B parallel to $[A\cdots C]$ ; the weighted best line passes through B and through C. Our last example is of a plane defined by a number of atoms of which one lies a considerable distance above the plane and at a distance from the normal through the centroid, but with the downward semi-axis of its extremely elongated prolate error ellipsoid intersecting that normal before it intersects the plane. If this atom is moved at right angles to the plane and further away from it, the centroid normal tips toward the atom, whereas it would tip away if the atom's weight function were isotropic or if the calculation were the usual one and in effect constrained the adjusted position of the atom to move at right angles to the plane.

The lead notion here – that the observed points are to be adjusted individually to fit a curve (surface) of required type exactly, rather than that the curve should simply be constructed to fit the observed points as well as possible in the sense of minimizing the weighted sum of squares of the distances along some preordained direction (perhaps normal to the plane, but perhaps as in ordinary curve fitting parallel to the y axis) – we first learned from the book by Deming (1943), Statistical Adjustment of Data, but it is to be found in Whittaker & Robinson (1929), Arley & Buch (1950), Hamilton (1964, our most used reference), and doubtless widely throughout the least-squares literature. It has recently been strongly emphasized by Lybanon (1984), who gives a number of modern references. It is the only prescription that properly satisfies the least-squares conditions, whereas (a) and other analogous prescriptions are only arbitrary regressions, in (a) a normal regression onto the plane.⁴

We have explored the problem of least-squares adjustment of observed positions subject to anisotropic weights with the help of three Fortran programs, one for the straight line and two for the plane. In the first plane program an approximate plane is derived, coordinates are rotated as in WMC (1973), and the parameters of the plane are adjusted and the atoms moved onto it, either normally or in full accord with the least-squares condition, but in either case subject to independent anisotropic weight matrices. The other plane program, described in Appendix 3.2.1, proceeds somewhat more directly, with the help of the method of Lagrange multipliers. However, neither program has been brought to a satisfactory state for the calculation of the variances and covariances of the derived quantities.

3.2.3.1. Formulation and solution of the general Gaussian plane

| top | pdf |

We conclude with an outline for a complete feasible solution, including interatomic weight-matrix elements. Consider atoms at observed vector positions $[{\bf r}_{k}]$ , $[k = 1,\ldots ,n]$ , designated in the following equations by $[\sspecialfonts{\bsf R}]$ , an n-by-3 array, with $[{\bi R}_{ki} = {\bi r}_{ki}]$ ; the corresponding adjusted positions denoted by the array $[\sspecialfonts{\bsf R}_{a}]$ ; n constraints (each adjusted position $[{\bf r}_{a} - a]$ for `adjusted' – must be on the plane), and [3n + 3] adjustable parameters ( $[\sspecialfonts 3n\; {\bsf R}_{a}]$ components and the 3 components of the vector b of reciprocal intercepts of the plane), so that the problem has [n - 3] degrees of freedom. The 3n-by-3n weight matrix P may be anisotropic for the separate atoms, and may include interatomic elements; it is symmetric in the sense $[{\bi P}_{kilj} = {\bi P}_{ljki}]$ , but $[{\bi P}_{kilj}]$ will not in general be equal to $[{\bi P}_{kjli}]$ . The array $[\boldLambda]$ denotes n Lagrange multipliers , one for each atom and unrelated to the $[\lambda]$ 's of Section 3.2.2; m and d still represent the direction cosines of the plane normal and the perpendicular origin-to-plane distance.

For a linear least-squares problem we know (see, e.g., Hamilton, 1964, p. 143) that the proper weight matrix is the reciprocal of the atomic error matrix $[({\bi P} = {\bi M}^{-1})]$ ;⁵ note that ` M' is unrelated to the ` $[\sspecialfonts{\bsf M}]$ ' of Section 3.2.2. The least-squares sum is now $[\sspecialfonts S = ({\bsf R}-{\bsf R}_{a}){\bi P}({\bsf R}-{\bsf R}_{a}),]$ and the augmented sum for applying the method of Lagrange multipliers is $[\Phi = S/2 - \boldLambda {\sf b} {\bf R}_{a}.]$ Here the summation brackets $[([\ldots])]$ have been replaced by matrix products, but the simplified notation is not matrix notation. It has to be regarded as only a useful mnemonic in which all the indices and their clutter have been suppressed in view of their inconveniently large number for some of the arrays. If the dimensions of each are kept in mind, it is easy to recall the indices, and if the indices are resupplied at any point, it is not difficult to discover what is really meant by any of the expressions or whether evaluations have to be executed in a particular order.

The conditions to be satisfied are $[\sspecialfonts\eqalign{ 0 &= {-\partial \Phi\over \partial{\bsf R}_{a}} = {\bi P}({\bsf R}-{\bsf R}_{a}) + \boldLambda {\sf b},\cr 0 &= {-\partial \Phi\over \partial {\sf b}} = \boldLambda {\bsf R}_{a},\cr {\bf 1} &= {\sf b}{\bsf R}_{a}.}]$ That the partial derivatives of [S/2] should be represented by $[\sspecialfonts{\bi P}({\bsf R} - {\bsf R}_{a})]$ depends upon the above-mentioned symmetry of P. Note that each of the n unit elements of 1 expresses the condition that its $[{\bf r}_{a}]$ should indeed lie on the plane, and that $[\boldLambda{\sf b}]$ is just the same as $[{\sf b}\boldLambda]$ . The perpendicular origin-to-plane distance and the direction cosines of the plane normal are $[d^{2} = 1/{\sf b}^{T}{\sf b}]$ and $[{\sf m} = {\sf b}/\sqrt{{\sf b}^{T}{\sf b}}]$ .

On multiplication by M the first condition solves for $[\sspecialfonts{\bsf R}_{a}]$ , and that expression combined separately with the second condition and with the third gives highly nonlinear equations (there is further mention of this nonlinearity in Appendix 3.2.1) that have to be solved for $[{\sf b}]$ and $[\boldLambda]$ : $[\sspecialfonts\eqalign{ {\bsf R}_{a} &= {\bsf R} + {\bi M} \boldLambda {\sf b} = {\bsf R} + {\bi M}{\sf b} \boldLambda\cr 0 &= {\bsf F} \equiv (\boldLambda{\bi M}\boldLambda){\sf b} + \boldLambda{\bsf R}\cr 0 &= {\bsf G} \equiv ({\sf b}{\bi M}{\sf b})\boldLambda - ({\bf 1}-{\sf b}{\bsf R}).}]$

The best way of handling these equations, at least if it is desired to find their solutions both for $[\sspecialfonts{\bsf R}_{a}]$ , $[\boldLambda]$ , and $[\sf b]$ and for the error matrix of these derived quantities in terms of the error matrix for $[\sspecialfonts{\bsf R}]$ , seems to be to find an approximate solution by one means or another, to make sure that it is the desired solution, if (as may be) there is ambiguity, and to shift the origin to a point far enough from the plane and close enough to the centroid normal to avoid the difficulties discussed by SWMB. Then linearize the first and second equations by differentiation and solve the results first iteratively to fit the current residuals $[\sspecialfonts{\bsf F}_{\rm 0}]$ and $[\sspecialfonts{\bsf G}_{\rm 0}]$ and then for nominally infinitesimal increments $[\delta]$ $[\sf b]$ and $[\delta \boldLambda]$ . In effect, one deals with equations $[\sspecialfonts{\bsf QX} = {\bsf Y}]$ , where $[\sspecialfonts{\bsf Q}]$ is the $[(n + 3) \times (n + 3)]$ matrix of coefficients of the following set of equations, $[\sspecialfonts{\bsf X}]$ is the [(n + 3)] -dimensional vector of increments $[\delta]$ $[\sf b]$ and $[\delta \boldLambda]$ , and $[\sspecialfonts{\bsf Y}]$ is the vector either of the first terms or of the second terms on the right-hand side of the equations. $[\sspecialfonts\eqalign{ (\boldLambda{\bi M}\boldLambda)\delta {\sf b} + ({\bi M}\boldLambda {\sf b} + \boldLambda{\bi M}{\sf b} + {\bsf R})\delta \boldLambda &= -{\bsf F}_{0} - \boldLambda \delta{\bsf R}\cr ({\bi M}{\sf b}\boldLambda + {\sf b}{\bi M}\boldLambda + {\bsf R})\delta {\sf b} + ({\sf b}{\bi M}{\sf b})\delta \boldLambda &= -{\bsf G}_{0} - {\sf b}\delta{\bsf R}.}]$

When $[\sspecialfonts{\bsf X}]$ becomes the vector $[\varepsilon]$ of errors in $[\sf b]$ and $[\boldLambda]$ as induced by errors $[\sspecialfonts\eta \equiv \delta {\bsf R}]$ in the measured atom positions, these equations become, in proper matrix notation, $[\sspecialfonts{\bsf Q} \varepsilon = {\bsf Z} \eta]$ , with solution $[\sspecialfonts\varepsilon = {\bsf Q}^{-1} {\bsf Z} \eta]$ , where $[\sspecialfonts{\bsf Z}]$ is the [(n + 3)] -dimensional vector of components, first of $[\sf b]$ then of $[\boldLambda]$ . The covariance matrix $[\overline{\varepsilon \varepsilon^{T}}]$ , from which all the covariances of $[\sf b]$ , $[\sspecialfonts{\bsf R}_{a}]$ , and $[\sspecialfonts{\bsf R}]$ (including for the latter any atoms that were not included for the plane) can be derived, is then given by $[\sspecialfonts\overline{\varepsilon \varepsilon^{T}} = {\bsf Q}^{-1} {\bsf Z} \overline{\eta \eta^{T}} {\bsf Z}^{T} ({\bsf Q}^{-1})^{T}.]$ This is not as simple as the familiar expression for propagation of errors for a least-squares problem solved without the use of Lagrange multipliers, i.e. $[\sspecialfonts\overline{\varepsilon \varepsilon^{T}} = {\bsf B}^{\rm -1}]$ , where $[\sspecialfonts{\bsf B}]$ is the matrix of the usual normal equations, both because $[\sspecialfonts{\bsf B} = {\bsf B}^{T}]$ is no longer valid and because the middle factor $[\sspecialfonts{\bsf Z} \overline{\eta \eta^{T}} {\bsf Z}^{T}]$ is no longer equal to $[\sspecialfonts{\bsf B}^{\rm -1}]$ .

It is easy to verify that $[\boldLambda]$ consists of a set of coefficients for combining the n row vectors of M $[\sf b]$ , in the expression for $[\sspecialfonts{\bsf R}_{a}]$ , into corrections to $[\sspecialfonts{\bsf R}]$ such that each adjusted position lies exactly on the plane: $[\sspecialfonts\eqalign{ {\sf b}{\bsf R}_{a} &= {\sf b}{\bsf R} + {\sf b}{\bi M}{\sf b} ({\sf b}{\bi M}{\sf b})^{-1} ({\bi 1} - {\sf b}{\bsf R})\cr &= {\sf b}{\bsf R} + {\bi 1} - {\sf b}{\bsf R} = {\bi 1}.}]$ At the same time one can see how the lateral shifts occur in response to the anisotropy of M, especially if, now, only the anisotropic case without interatomic correlations is considered. For atom k write $[\sf b]$ in terms of its components along the principal axes of $[\sspecialfonts{\bsf M}_{k}]$ , associated with the eigenvalues $[\alpha, \beta]$ and $[\gamma]$ ; the shifts are then proportional to $[\sspecialfonts{\bsf M}_{\alpha\alpha} b_{\alpha}^{2},\ {\bsf M}_{\beta\beta}b_{\beta}^{2}]$ and $[\sspecialfonts{\bsf M}_{\gamma\gamma}b_{\gamma}^{2}]$ , each along its principal axis, and the sums of the contributions of these shift components to the total displacement along the plane normal or along either of two orthogonal directions in the plane can readily be visualized. In effect $[\sspecialfonts{\bsf M}_{k}]$ is the compliance matrix for these shifts of atom k. Similarly, it can be seen that in the isotropic case with interatomic correlations a pair of equally weighted atoms located, for example, at some distance apart and at about the same distance from the plane, will have different shifts (and different influences on d and $[\sf m]$ ) depending on whether the covariance between the errors in the perpendicular components of their observed positions relative to the plane is small, or, if large, whether it is positive or is negative. If the covariance is large and positive, the adjusted positions will both be pulled toward the plane, strongly resisting, however, the apparent necessity that both reach the plane by moving by different amounts; in consequence, there will be a strong tendency for the plane to tilt toward the more distant atom, and possibly even away from the nearer one. But if the covariance is large and negative, the situation is reversed: the more distant atom can readily move more than the nearer one, while it is very difficult to move them together; the upshot is then that the plane will move to meet the original midpoint of the two atoms while they tilt about that midpoint to accommodate the plane.

It is attractive to solve our problem with the `normal' formulation of the plane, $[{\sf mr} = d]$ , and so directly avoid the problems that arise for $[d\approx 0]$ . The solution in terms of the reciprocal intercepts b has been given first, however, because reducing by two (d and a Lagrange multiplier) the number of parameters to be solved for may more than make up for the nuisance of having to move the origin. The solution in terms of d follows.

The augmented variation function is now $[\sspecialfonts\Phi = ({\bsf R} - {\bsf R}_{a}) {\bi P} ({\bsf R} - {\bsf R}_{a})/{\rm 2} - \boldLambda ({\sf m}{\bsf R}_{a} - d{\bi 1}) + \mu {\sf mm}/{\rm 2},]$ the term in the new Lagrange multiplier, $[\mu]$ , and the term in $[d{\bi 1}]$ having been added to the previous expression. The 1, an n-vector of 1's, is needed to express the presence of n terms in the $[\boldLambda]$ sum. There are then five equations to be satisfied – actually [n + 1 + 3n + 3 + 1 = 4n + 5] ordinary equations – in the $[\sspecialfonts 3n\; {\bsf R}_{a}]$ components, the n $[\boldLambda]$ 's, the 3 $[{\sf m}]$ components, $[\mu]$ , and [d - 4n + 5] unknowns in all, as required. These equations are as follows: $[\sspecialfonts\eqalign{ {\sf m}{\bf R}_{a} &= d{\bi 1}\cr {\sf mm} &= 1\cr 0 &= -\left({\partial \Phi\over \partial {\bsf R}_{a}}\right) = {\bi P}({\bsf R} - {\bsf R}_{a}) + \boldLambda{\sf m}\cr 0 &= {\partial \Phi\over \partial {\sf m}} = -\boldLambda{\bsf R}_{a} + \mu {\sf m}\cr 0 &= {\partial \Phi\over \partial d} = \boldLambda{\bi 1}.}]$ As before, multiply the third equation by M and solve for $[\sspecialfonts{\bsf R}_{a}]$ . Then substitute the result into the first and fourth equations to obtain $[\sspecialfonts\eqalign{ {\sf m}{\bsf R} + {\sf m}{\bi M}{\sf m}\boldLambda &= d{\bi 1},\cr {\sf mm} &= 1,\cr \boldLambda{\bsf R} + \boldLambda{\bi M}\boldLambda{\sf m} &= \mu {\sf m},\cr \boldLambda{\bi 1} &= 0}]$ as the [n + 5] mostly nonlinear equations to be solved for $[\sf m]$ , $[\boldLambda]$ , d and $[\mu]$ by linearizing (differentiation), solving for increments, and iterating, in the pattern described more fully above. An approximate solution for $[\sf m]$ and d has first to be obtained somehow, perhaps by the method of SWMB (with isotropic uncorrelated weights), checked for suitability, and extended to a full complement of first approximations by $[\sspecialfonts\eqalign{ \boldLambda &= ({\sf m}{\bi M}{\sf m})^{-1} (d{\bi 1} - {\sf m}{\bsf R})\cr \mu &= {\sf m}\boldLambda{\bsf R} + {\sf m}\boldLambda{\bi M}\boldLambda{\sf m},}]$ which readily follow from the previous equations. As in the `intercepts' solution the linearized expression for the increments in $[\mu, \boldLambda, d]$ and $[{\sf m}]$ can be used together with the equation for $[\sspecialfonts{\bsf R}_{a}]$ to obtain all the covariances needed in the treatment described in Section 3.2.2.

3.2.3.2. Concluding remarks

| top | pdf |

Proper tests of statistical significance of this or that aspect of a least-squares plane can be made if the plane has been based on a proper weight matrix as discussed in Section 3.2.3; if it can be agreed that the random errors of observation are normally distributed; and if an agreeable test (null) hypothesis can be formulated. For example, one may ask for the probability that a degree of fit of the observed positions to the least-squares plane at least as poor as the fit that was found might occur if the atoms in truth lie precisely on a plane. The $[\chi^{2}]$ test answers this question: a table of probabilities displayed as a function of $[\chi^{2}]$ and $[\nu]$ provides the answer. Here $[\chi^{2}]$ is just our minimized $[S = {\sf b}\boldLambda{\bi MPM}\!\boldLambda{\sf b} = {\sf b}\boldLambda{\bi M}\!\boldLambda {\sf b},]$ and $[\eqalign{ \nu &= n_{\rm observations} - n_{\rm adjusted\; parameters} - n_{\rm constraints}\cr &= 3n - (n + 3) - n = n - 3,}]$ is the number of degrees of freedom for the problem of the plane (erroneously cited in at least one widely used crystallographic system of programs as [3n - 3] ). There will not usually be any reason to believe that the atoms are exactly coplanar in any case; nevertheless, this test may well give a satisfying indication of whether or not the atoms are, in the investigator's judgment, essentially coplanar. It must be emphasized that $[\chi^{2}]$ as calculated in Section 3.2.3 will include proper allowance for uncertainties in the d and orientation of the plane with greater reliability than the estimates of Section 3.2.2, which are based on nominally arbitrary weights. Both, however, will allow for the large variations in d and tilt that can arise in either case if n is small. Some of the earlier, less complete discussions of this problem have been mentioned in Section 3.2.2.

Among the problems not considered here are ones of fitting more than one plane to a set of observed positions, e.g. of two planes fitted to three sets of atoms associated, respectively, with the first plane, the second plane, and both planes, and of the angle between the two planes. For the atoms common to both planes there will be a fundamental point of difference between existing programs (in which, in effect, the positions of the atoms in common are inconsistently adjusted to one position on the first plane and, in general, a different position on the second) and what we would advocate as the proper procedure of requiring the adjusted positions of such atoms to lie on the line of intersection of the two planes. As to the dihedral angle there is a difficulty, noted by WMC (1973, p. 2705), that the usual formulation of $[\sigma^{2} (\theta_{0})]$ in terms of the cosine of the dihedral angle reduces to 0/0 at $[\theta_{0} = 0]$ . However, this variance is obviously well defined if the plane normals and their covariances are well defined. The essential difficulty lies with the ambiguity in the direction of the line of intersection of the planes in the limit of zero dihedral angle. For the torsion angle about a line defined by two atoms, there should be no such difficulty. It seems likely that for the two-plane problem proposed above, the issue that decides whether the dihedral angle will behave like the standard dihedral angle or, instead, like the torsion angle, will be found to be whether or not two or more atoms are common to both planes.

All that we have tried to bring out about the covariances of derived quantities involving the plane requires that the covariances of the experimental atom positions (reduced in our formulations to Cartesian coordinates) be included. However, such covariances of derived quantities are often not available in practice, and are usually left unused even if they are. The need to use the covariances, not just the variances, has been obvious from the beginning. It has been emphasized in another context by Schomaker & Marsh (1983) and much more strongly and generally by Waser (1973), whose pleading seems to have been generally ignored, by now, for about thirty years.

References

Arley, N. & Buch, K. R. (1950). Introduction to the theory of probability and statistics. New York: John Wiley. London: Chapman & Hall.Google Scholar

Deming, W. E. (1943). Statistical adjustment of data. New York: John Wiley. [First edition 1938.]Google Scholar

Hamilton, W. C. (1961). On the least-squares plane through a set of points. Acta Cryst. 14, 185–189.Google Scholar

Hamilton, W. C. (1964). Statistics in physical science. New York: Ronald Press.Google Scholar

Ito, T. (1981a). Least-squares refinement of the best-plane parameters. Acta Cryst. A37, 621–624.Google Scholar

Lybanon, M. (1984). A better least-squares method when both variables have uncertainties. Am. J. Phys. 52, 22–26.Google Scholar

Schomaker, V. & Marsh, R. E. (1983). On evaluating the standard deviation of U_eq. Acta Cryst. A39, 819–820.Google Scholar

Schomaker, V., Waser, J., Marsh, R. E. & Bergman, G. (1959). To fit a plane or a line to a set of points by least squares. Acta Cryst. 12, 600–604.Google Scholar

Waser, J. (1973). Dyadics and variances and covariances of molecular parameters, including those of best planes. Acta Cryst. A29, 621–631.Google Scholar

Waser, J., Marsh, R. E. & Cordes, A. W. (1973). Variances and covariances for best-plane parameters including dihedral angles. Acta Cryst. B29, 2703–2708.Google Scholar

Whittaker, E. T. & Robinson, G. (1929). The calculus of observations. London: Blackie.Google Scholar

International Tables for Crystallography (2006). Vol. B. ch. 3.2, pp. 355-358

Section 3.2.3. The proper least-squares plane, with Gaussian weights

3.2.3. The proper least-squares plane, with Gaussian weights

3.2.3.1. Formulation and solution of the general Gaussian plane

3.2.3.2. Concluding remarks

References

Section 3.2.3. The proper least-squares plane , with Gaussian weights

3.2.3. The proper least-squares plane , with Gaussian weights