International
Tables for Crystallography Volume C Mathematical, physical and chemical tables Edited by E. Prince © International Union of Crystallography 2006 |
International Tables for Crystallography (2006). Vol. C. ch. 8.3, pp. 694-701
https://doi.org/10.1107/97809553602060000611 Chapter 8.3. Constraints and restraints in refinementa NIST Center for Neutron Research, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA,bGeophysical Laboratory, Carnegie Institution of Washington, 5251 Broad Branch Road NW, Washington, DC 20015-1305, USA, and cLaboratory for the Structure of Matter, Code 6030, Naval Research Laboratory, Washington, DC 20375-5000, USA In crystallographic refinement a great deal is known about a crystal before any diffraction data are collected. Chapter 8.3 discusses ways to make use of this prior knowledge by applying constraints and restraints to the model. The most obvious constraints, which may be applied either by using Lagrange undetermined multipliers or by reducing the number of variable parameters in the model, are those imposed by space-group symmetry on unit-cell constants, atom positions, and atom-displacement parameters. They may also include site occupancies, molecular shapes, noncrystallographic symmetry, and rigid-body thermal motion. Restraints may be applied when bond distances, bond angles, and relations among motion parameters are not known precisely in advance, but must lie within narrow ranges. The chapter includes a table of typical values of bond distances and angles found in the amino-acid residues in polypeptides and proteins. Keywords: constrained models; constraints in refinement; Lagrange undetermined multipliers; restraints in refinement; stereochemical constraints. |
In Chapter 8.1
, the method of least squares is discussed as a technique for fitting a theoretical model that contains adjustable parameters to a set of observations. The discussion is very general and contains very little mention of what sorts of quantities the observations are or what the model represents. In crystallography, the model is a crystal, which is constructed from identical unit cells that contain atoms, and which diffracts X-rays, neutrons or electrons in a manner that is characteristic of the arrangement of those atoms. The sample may be either a single crystal or a polycrystalline powder, and the observations are diffracted intensities, which may be fitted directly, as in the Rietveld method for powders (see Chapter 8.6
; also Rietveld, 1969
), or converted to derived quantities such as integrated intensities, squared moduli of structure amplitudes, or the structure amplitudes themselves. The model generally contains a scale factor and may contain parameters describing other experimental effects, such as extinction. Each atom in the unit cell requires three parameters to describe its mean position and various parameters to describe random deviations from that position owing to thermal motion or disorder. Models that treat each atom independently, however, do not allow for the fact that a great deal more is known about a crystal initially than simply its chemical composition. Atoms have fairly definite sizes and tend to occupy sites whose surroundings conform to a rather limited set of common configurations. In this chapter, we discuss ways of using this additional information. First, we shall discuss the use of constraints to reduce the number of parameters that must be varied and account for relationships among parameters that are dictated by the laws of chemistry and physics. Then we shall discuss the use of restraints, which effectively add to the number of observations that must be fitted by the model.
The techniques of least squares are applicable for refining almost any model, but the question of the suitability of the model remains. The addition of parameters may reduce the residual disagreement, but lead to solutions that have no physical or chemical validity. Addition of constraints is one method of constricting the solutions.
The classical technique for application of constraints is the use of Lagrange undetermined multipliers, in which the set of p parameters, , is augmented by
additional unknowns, λk, one for each constraint relationship desired. The problem may be stated in the form: find the minimum of
subject to the condition
This may be shown (Gill, Murray & Wright, 1981
) to be equivalent to the problem: find a point at which the gradient of
vanishes. Solving for the stationary point leads to a set of simultaneous equations of the form
and
Thus, the number of equations, and the number of unknowns, is increased from p to 2p − q. In cases where the number of constraint relations is small, and where it may be difficult to solve the relations for some of the parameters in terms of the rest, this method yields the desired results without too much additional computation (Ralph & Finger, 1982
). With the large numbers of parameters, and large numbers of constraints, that arise in many crystallographic problems, however, the use of Lagrange multipliers is computationally inefficient and cumbersome.
In most cases encountered in crystallography, constraints may be applied directly, thus reducing rather than increasing the size of the normal-equations matrix. For each constraint introduced, one of the parameters becomes dependent on the remaining set, and the rank of the remaining system is reduced by one. For p parameters and p − q constraints, the problem reduces to q parameters. If the Gauss–Newton algorithm is used (Section 8.1.4
), the normal-equations matrix is ATWA, where
and W is a weight matrix. A constrained model,
, maybe constructed using relations of the form
Applying the chain rule for differentiation, the normal-equations matrix for the constrained model is BTWB, where
This may be written in matrix form B = AC, where
defines a
constraint matrix. The application of constraints involves (a) determination of the model to be used, (b) calculation of the elements of C, and (c) computation of the modified normal-equations matrix.
The construction of matrix C by a procedure known as the variable reduction method may be presented formally as follows: Designate by Z the matrix whose elements are and partition Z in the form Z = (U, V), where V is composed of (p − q) columns of Z chosen to be linearly independent, so that V is nonsingular. [V is shown as the last (p − q) columns only for convenience. Any linearly independent set may be chosen.] The rows of Z form a basis for a (p − q)-dimensional subspace of the p-dimensional parameter space, and we wish to construct a basis for z, a q-dimensional subspace that is orthogonal to it, so that all shifts within that subspace starting at a point where the constraints are satisfied, a feasible point, leave the values of the constraint relations unchanged. This basis is used for the columns of C, which is given by
In this formulation, the columns of V correspond to dependent parameters that are functions of the independent parameters corresponding to the columns of U.
Most existing programs provide for the calculation of the structure factor, F, and its partial derivatives with respect to a conventional set of parameters, including occupancy, position, isotropic or anisotropic atomic displacement factors, and possibly higher cumulants of an atomic density function (Prince, 1994). The constrained calculation is usually performed by evaluating selected elements,
. Because the constraint matrix is often extremely sparse, calculation of a limited sum involving only the nonzero elements is usually computationally superior to a full matrix multiplication. After adjustment of the z's, equations (8.3.1.5)
are used to update the parameters. Using this procedure, it is not necessary to express the structure factor, or its derivatives, in terms of the refined parameters. This is particularly important when the constrained model involves arbitrary molecular shapes or rigid-body thermal motions.
The need for constraints arises most frequently when the crystal structure contains atoms in special positions. Here, certain parameters will be constant or linearly related to others. If a parameter is constrained to be a constant, the corresponding row of C will contain zeros, and that column will be ignored. When parameters are linearly dependent on others, which may occur in trigonal, hexagonal, tetragonal and cubic space groups, the modification indicated in (8.3.1.6) cannot be avoided. The constraint relationships among position parameters are trivial. Levy (1956
) described an algorithm for determining the constraints that pertain to second and higher cumulants in the structure-factor formula. Table 8.3.1.1
is a summary of relations that are found for anisotropic atomic displacement factors, with a listing of the space groups in which they occur. Johnson (1970
) provides a table listing the number of unique coefficients for each possible site symmetry for tensors of various ranks, which is useful information for verification of constraint relationships.
|
Another important use of constraints applies to the occupancies of certain sites in the crystal where, for example, a molecule is disordered in two or more possible orientations or (very common in minerals) several elements are distributed among several sites. In both cases, refinement of all of the fractional occupancies tends to be extremely ill conditioned, because of high correlations between occupancies and atomic displacement parameters. The overall chemistry, however, may be known from electron microprobe (Finger, 1969) or other analytic techniques to much better precision than it is possible to determine it using diffraction data alone. The constraining equations for the occupancies of n species in m sites have the form
where
is the multiplicity of the ith site,
is the fractional occupancy of the jth species in the ith site, and p is the total number of atoms of species j per unit cell. For a given crystal structure and composition, the bs and ps are known, and, furthermore, it is possible to write an additional constraint for the total occupancy of each site,
If necessary, vacancies may be included as one of the n species present. In theory, (8.3.1.9)
and (8.3.1.10)
could be solved for (n − 1) × (m − 1) unknown parameters,
, with m + n − 1 constraint relations, but, in practice, at most one occupancy factor per site can be refined. When constraints are applied, the correlations between occupancies and displacement factors are greatly reduced.
In the analysis of a crystal structure, it may be desirable to test various constraints on the shape or symmetry of a molecule. For example, the molecule of a particular compound may have orthorhombic symmetry in the liquid or vapour phase, but crystallize with a monoclinic or triclinic space group. Without constraints, it is impossible to determine whether the crystallization has caused changes in the molecular conformation. Residual errors in the observations will invariably lead to deviations from the original molecular geometry, but these may or may not be meaningful.
With molecular-shape constraints, it is possible to constrain the geometry to any desired conformation. The first step is to describe the molecule in a special, orthonormal coordinate system that has a well defined relationship between the coordinate axes and the symmetry elements. If this system is properly chosen, the description of the molecule is easy. The next step is to describe the transformation between this orthonormal system and the crystallographic axes. A standard, orthonormal coordinate system (Prince, 1994) can be constructed with its x axis parallel to a and its z axis parallel to c*. If the special system is translated with respect to the standard system so that they share a common origin, Eulerian angles, ω, χ, and φ, may be used to define a matrix that rotates the special coordinates into the standard system. Angle ω is defined as the clockwise rotation through which the special system must be rotated about the z axis of the standard system to bring the z axis of the special system into the x, z plane of the standard system. Similarly, angle χ is the clockwise angle through which the resulting, special system must be rotated about the y axis of the standard system to bring the z axes into coincidence, and, finally, angle φ is the clockwise angle through which the special system must be rotated about the common z axes to bring the other axes into coincidence. The overall transformation is given by
The overall transformation of a vector, x′, from the special coordinate system to the crystallographic system is given by where t is the origin offset between the two systems and D is the upper triangular Cholesky factor (Subsection 8.1.1.1
) of the metric tensor, G, which is defined by
Equations (8.3.1.12)
are the constraint relationships, and the refinable parameters include the adjustable parameters in the special system, the origin offset, and the three rotation angles. This set of parameters, although it is written in a very different manner, is a linear transformation of a subset of the conventional crystallographic parameters, so that statistical tests based on the F ratio or Hamilton's R ratio (Section 8.4.2
; Hamilton, 1964
) may be used to assess significance. Shape constraints differ from those owing to space group or chemical conditions in that the constraint equations (8.3.1.12)
are not linear functions of the independent parameters. Thus, the elements of C are not constants and must therefore be evaluated in each iteration of the refinement algorithm.
Another area in which application of constraints is important arises whenever some portion of the structure undergoes thermal motion as a rigid entity. One means of determining rigid-motion parameters is to refine the conventional, anisotropic atomic displacement factors of all atoms individually and to fit a librational model to the resulting thermal factors (Schomaker & Trueblood, 1968). A problem arises with this approach because the presence of libration implies curvilinear motion in the crystallographic system, and thus the probability density function for an atom that does not lie on the axis of libration cannot be described by a Gaussian function in a rectilinear coordinate system. For neutron diffraction, where H atoms have major scattering power, the effect may be large enough to affect convergence (Prince, Dickens & Rush, 1974
). Anharmonic (third-cumulant) terms could be used, but the number of parameters increases rapidly, because there are as many as ten, independent, third cumulant tensor elements per atom.
Thermal motions of rigid bodies are represented by a symmetric, translation tensor, T, a symmetric, libration tensor, L, and a nonsymmetric, screw correlation tensor, S (Cruickshank, 1961; Schomaker & Trueblood, 1968
). Any sequence of rotations of a rigid body about a fixed point is equivalent to a single, finite rotation about some axis passing through the fixed point. This rotation can be represented (Prince, 1994
) by an axial vector,
, where |
| is the magnitude of the rotation, and the direction cosines of the axis with respect to some system of orthogonal axes are given by
. An exact expression for the displacement, u, of a point in the rigid body, located by a vector r from the centre of mass, owing to a rotation
about an axis passing through the centre of mass is
For small rotations, the trigonometric functions can be replaced by power-series expansions, and, because of the extremely rapid convergence of these series, (8.3.1.14)
is approximated extremely well, even for values of |
| as large as 0.5 rad, by
By expansion of the vector products, this can be written
where the coefficients,
,
,
, and
are multiples of components of r. For example,
, so that A(r)11 = 0, A(r)12 = r3, and A(r)13 = −r2. These coefficients have been tabulated by Sygusch (1976
), and expressed in Fortran source code by Prince (1994
).
If the centre of mass of the rigid body also moves, the total displacement of the point at r is v = u + t, where t is the displacement of the centre of mass from its equilibrium position. A discussion of the effects of rigid-body motion on diffraction intensities involves quantities like ,
, and so forth, the ensemble averages of these quantities over many unit cells, which may be assumed to be equal to the time averages for one unit cell over a long time. The rigid-body-motion tensors are defined by
=
,
=
, and
=
. The distributions of
and
can usually be assumed to be approximately Gaussian, so that fourth moments can be expressed in terms of second moments. Thus,
=
,
=
, and so forth. If the elements of t and
are measured with respect to their mean positions,
=
= 0. Third moments, quantities like
, do not necessarily vanish, except when the rigid body is centrosymmetric, but their effects virtually always are small, and can be neglected.
A particle that is part of a librating, rigid body undergoes a curvilinear motion that results in its having a mean position that is displaced from its equilibrium position. This causes an apparent shortening of interatomic distances, which must be corrected for if accurate values of bond lengths are to be derived. The displacement, d, from the equilibrium position to the mean position is (Prince & Finger, 1973)
Anisotropic atomic displacement factors, ,
, or
, are related by simple, linear transformations that are functions of the unit-cell constants to the quantity
. If the rigid body has a centre of symmetry, so that the elements of S vanish, this is given by
Expressions including elements of S have been given by Sygusch (1976) and, in Fortran source code, by Prince (1994
). Expressions for anisotropic atomic displacement factors in terms of T, L, and S that included only terms linear in the tensor elements were given by Schomaker & Trueblood (1968
), who pointed out that the diagonal elements of S never appeared individually, but only as the differences of pairs, so that the expressions were invariant under the addition of a constant to all three elements. This `trace of S singularity' was resolved by applying the additional constraint S11 + S22 + S33 = 0. As was pointed out by Sygusch (1976
), the inclusion of terms that are quadratic in the tensor elements removes this indeterminacy, but the effects of the additional terms are so small that the problem remains extremely ill conditioned. In practice, therefore, these elements should still be treated as underdetermined.
Prince (1994) lists the symmetry restrictions for each type of tensor for various point groups. Although the description of thermal motion is essentially harmonic within the rigid-body system, the structure-factor formulation must include what appear to be anharmonic terms. Prince also presents computer routines that contain the relations between the elements of T, L, and S and the second- and third-cumulant tensor elements. As in the case of shape constraints, the equations are nonlinear, and the elements of C must be re-evaluated in each iteration.
The precision with which an approximately correct model can be refined to describe the atomic structure of a crystal depends on the ability of the model to represent the atomic distributions and on the quality of the observational data being fitted with the model. In addition, although the structure can in principle be determined by a well chosen data set only a little larger than the number of parameters to be determined (Section 8.4.4
), in practice, with a nonlinear model as complex as that for a macromolecular crystal, it is necessary for the parameters defining the model to be very much over-determined by the observations. For well ordered crystals of small- and intermediate-sized molecules, it is usually possible to measure a hundred or more independent Bragg reflections for each atom in the asymmetric unit. When the model contains three position parameters and six atomic displacement parameters for each atom, the over-determinacy ratio is still greater than ten to one. In such instances, each model parameter can usually be quite well determined, and will provide an accurate representation of the average structure in the crystal, except in regions where ellipsoids are not adequate descriptions of the atomic distributions. This contrasts sharply with studies of biological macromolecules, in which positional disorder and thermal motion in large regions, if not the entire molecule, often limit the number of independent reflections in the data set to fewer than the number of parameters necessary to define the distributions of individual atoms. This problem may be overcome either by reducing the number of parameters describing the model or by increasing the number of independent observations. Both approaches utilize knowledge of stereochemistry.
A great deal of geometrical information with which an accurate model must be consistent is available at the onset of a refinement. The connectivity of the atoms is generally known, either from the approximately correct Fourier maps of the electron density obtained from a trial structure determination or from sequencing studies of the molecules. Quite tight bounds are placed on local geometry by the accumulating body of information concerning bond lengths, bond angles, group planarity, and conformational preferences in torsion angles. Additional knowledge concerns van der Waals contact potential functions and hydrogen-bonding properties, and displacement factors must also be correlated in a manner consistent with the known geometry. In Section 8.3.1, we discuss the use of constraints to introduce this stereochemical knowledge. In this section, we discuss a technique that introduces the stereochemical conditions as additional observational equations (Waser, 1963
). This method differs from the other in that information is introduced in the form of distributions about mean values rather than as rigidly fixed geometries. The parameters are restrained to fall within energetically permissible bounds.
As described in Section 8.1.2
, given a set of observations,
, that can be described by model functions, Mi(x), where x is the vector of model parameters, we seek to find x for which the sum
is minimum. For restrained refinement, S is composed of several classes of observational equations, including, in addition to the ones for structure factors, equations for interatomic distances, planar groups and displacement factors.
Structure factors yield terms in the sum of the form The distances between bonded atoms and between next-nearest-neighbour atoms may be used to require bonded distances and angles to fall within acceptable ranges. This gives terms of the form
where σd is the standard deviation of an empirically determined distribution of values for distances of that type. Groups of atoms may be restrained to be near a common plane by terms of the form (Schomaker, Waser, Marsh & Bergman, 1959
)
where
and
are parameters of the plane, σp is again an empirically determined standard deviation, and · indicates the scalar product.
If a molecule undergoes thermal oscillation, the displacement parameters of individual atoms that are stereochemically related must be correlated. These parameters may be required to be consistent with the known stereochemistry by assuming a model that gives a distribution function for the interatomic distances in terms of the individual atom parameters and then restraining the variance of that distribution function to a suitably small value. The variation with time of the distances between covalently bonded atoms can be no greater than a few hundredths of an ångström. Therefore, the thermal displacements of bonded atoms should be very similar along the bond direction, but they may be more dissimilar perpendicular to the bond. If we make the assumption that the atom with a broader distribution in a given direction is `riding' on the atom with the narrower distribution, the variance of the interatomic distance parallel to a vector v making an angle with the direction of bond j is (Konnert & Hendrickson, 1980
)
where
is the normal distance for that type of bond,
=
, and
and
are the mean square displacements parallel to v of atom a and atom b, respectively. The restraint terms then have the form
. For isotropic displacement factors, these terms take the particularly simple form
, but with the disadvantage that, when isotropic displacement parameters are used, the displacements cannot be suitably restrained along the bonds and perpendicular to the bonds simultaneously.
Several additional types of restraint term have proved useful in restraining the coordinates for the mean positions of atoms in macromolecules. Among these are terms representing nonbonded contacts, torsion angles, handedness around chiral centres, and noncrystallographic symmetry (Hendrickson & Konnert, 1980; Jack & Levitt, 1978
; Hendrickson, 1985
). Contacts between nonbonded atoms are important for determining the conformations of folded chain molecules. They may be described by a potential function that is strongly repulsive when the interatomic distance is less than some minimum value, but only weakly attractive, so that it can be neglected in practice, when the distance is greater than that value. This leads to terms of the form
which are included only when
. Macromolecules usually gain flexibility by relatively unrestricted rotation about single bonds. There are, nevertheless, significant restrictions on these torsion angles, which may, therefore, be restrained by terms of the form
where
and
are dihedral angles between planar groups at opposite ends of the bond.
Interatomic distances are independent of the handedness of an enantiomorphous group. If rc is the position vector of a central atom and ,
, and
are the positions of three atoms bonded to it, such that the four atoms are not coplanar, the chiral volume is defined by
where × indicates the vector product. The chiral volume may be either positive or negative, depending on the handedness of the group. It may be restrained by including terms of the form
Table 8.3.2.1 gives ideal coordinates, in an orthonormal coordinate system measured in Å, of various groups that are commonly found in proteins. The ideal conformations of pairs of amino acid residues, from which the ideal values to be used in restraint terms of various types may be determined, are constructed by combining the coordinates of the individual groups. For example, consider a dipeptide composed of glycine and alanine joined by a trans peptide link, giving the molecule
The origin is placed at each of the Cα positions in turn, and interatomic distances to nearest and next-nearest neighbours are computed. Planar groups and possible nonbonded contacts are identified, and torsion angles and chiral volumes for chiral centres are computed. Table 8.3.2.2
is a summary of the restraint information for this simple molecule. In order to incorporate this information in the refinement, these ideal values are combined with suitable weights. Table 8.3.2.3
gives values of the standard deviations of the various types of constraint relation that have been found (Hendrickson, 1985
) to give good results in practice.
|
|
|
Even for a small protein, the normal-equations matrix may contain several million elements. When stereochemical restraint relations are used, however, the matrix elements are not equally important, and many may be neglected. Convergence and stability properties can be preserved when only those elements that are different from zero for the stereochemical restraint information are retained. The number of these elements increases linearly with the number of atoms, and is typically less than 1% of the total in the matrix, so that sparse-matrix methods (Section 8.1.5
) can be used. The method of conjugate gradients (Hestenes & Stiefel, 1952
; Konnert, 1976
; Rae, 1978
) is particularly suitable for the efficient use of restrained-parameter least squares.
References





















