Molecular model building

Millane, R. P.

doi:10.1107/97809553602060000567

International Tables for Crystallography
Volume B: Reciprocal space
Edited by U. Shmueli


International Tables for Crystallography (2006). Vol. B. ch. 4.5, pp. 476-477

Section 4.5.2.6.4. Molecular model building

R. P. Millane


Most structures determined by X-ray fibre diffraction analysis have been determined by molecular model building (Campbell Smith & Arnott, 1978; Arnott, 1980; Millane, 1988). Most applications of molecular model building have been to polycrystalline systems, although there have been a number of applications to noncrystalline systems (Park et al., 1987; Millane et al., 1988). The approach is to use spacings and symmetry information derived directly from the diffraction pattern, coupled with the primary structure and stereochemical information on the molecule under study, to construct models of every possible kind of molecular or crystal structure. Each model is then refined (optimized) against the diffraction data and the stereochemical restraints to produce the best model of its kind. The optimized models can be compared using various figures of merit, and in favourable cases one model will be sufficiently superior to the remainder for it to represent unequivocally the correct structure. The principle of this approach is that stereochemical constraints leave the molecular and crystal structure with few enough degrees of freedom that the parameter space contains only a small number of local minima, which can be identified and individually examined to find the global minimum. The X-ray phases are therefore not determined explicitly.

There are three steps involved in structure determination by molecular model building: (1) construction of all possible molecular and crystal structure models, (2) refinement of each model against the X-ray data and stereochemical restraints, and (3) adjudication among the refined models. The overall procedure for determining polymer structures using molecular model building is summarized by the flow chart in Fig. 4.5.2.2, and is described below.

Figure 4.5.2.2. Flow chart of the molecular-model-building approach to structure determination (Arnott, 1980).

The helix symmetry of the molecule, or one of a few helix symmetries, can be determined as described in Section 4.5.2.6.2. Different kinds of molecular model may correspond to one of a few different helix symmetries, usually corresponding to different values of v. For example, helix symmetries $u_{v}$ and $u_{u-v}$, which correspond to left- and right-handed helices, cannot be distinguished on the basis of the overall intensity distribution alone. Other examples of different kinds of molecular model may include single, double or multiple helices, parallel or antiparallel double helices, different juxtapositions of chains within multiple helices and different conformational domains within the molecule. For polycrystalline systems, in addition to different kinds of molecular structures, there are often different kinds of possible packing arrangements within the unit cell. There may be a number of possible packings which correspond to different arrangements within the crystallographic asymmetric unit, and there may be more than one space group that needs to be considered.
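The ambiguity between the two helix hands can be checked numerically. The sketch below assumes the usual helix selection rule l = um + vn (m any integer), which is not stated in this section, and the function names are hypothetical. It lists the Bessel orders n allowed on layer line l and confirms that the two symmetries permit the same values of |n| on every layer line, which is why the overall intensity distribution does not separate them.

```python
def allowed_orders(u, v, l, n_max=20):
    """Bessel orders n (|n| <= n_max) with l = u*m + v*n for some integer m.

    Assumes the standard helix selection rule; u, v, l are integers, u > 0.
    """
    return {n for n in range(-n_max, n_max + 1) if (l - v * n) % u == 0}


def abs_orders(u, v, l, n_max=20):
    """Sorted magnitudes |n| of the allowed Bessel orders on layer line l."""
    return sorted({abs(n) for n in allowed_orders(u, v, l, n_max)})


# Illustrative symmetry only: 10 units in 3 turns versus 10 units in 7 turns.
u, v = 10, 3
same_on_every_layer_line = all(
    abs_orders(u, v, l) == abs_orders(u, u - v, l) for l in range(u)
)
```

Replacing v by u - v maps each allowed order n to -n, so only the sign of n changes from one hand to the other.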

Despite the apparently large number of potential starting models implied by the above discussion, in practice the number of feasible models is usually quite small, and many of these are often eliminated at an early stage. Definition and refinement of helical polymers [steps (1) and (2) above] are carried out using computer programs, the most popular and versatile being the linked-atom least-squares (LALS) system (Campbell Smith & Arnott, 1978; Millane et al., 1985), originally developed by Arnott and co-workers in the early 1960s (Arnott & Wonacott, 1966). This system has been used to determine the structures of a wide variety of polynucleotides, polysaccharides, polyesters and polypeptides (Arnott, 1980; Arnott & Mitra, 1984; Chandrasekaran & Arnott, 1989; Millane, 1990c). Other refinement systems exist (Zugenmaier & Sarko, 1980; Iannelli, 1994), but the principles are essentially the same and the following discussion is in terms of the LALS system. The atomic coordinates are defined, using a linked-atom description, in terms of bond lengths, bond angles and conformation (torsion) angles (Campbell Smith & Arnott, 1978). Stereochemical constraints are imposed, and the number of parameters reduced, by fixing the bond lengths, often (but not always) the bond angles, and possibly some of the conformation angles. The molecular conformation is then defined by the remaining parameters. For polycrystalline systems, there are usually additional variable parameters that define the packing of the molecule(s) in the unit cell. A further source of stereochemical data is the requirement that a model exhibit no over-short nonbonded interatomic distances. This requirement is incorporated through a quadratic nonbonded potential that is matched to a Buckingham potential (Campbell Smith & Arnott, 1978). A variety of other restraints can also be incorporated.
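The linked-atom description can be illustrated by placing one atom from the three preceding atoms and a bond length, bond angle and torsion angle. This is a minimal sketch of the general internal-to-Cartesian construction, not the LALS code itself; the function names and the numerical values are hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    length = math.sqrt(dot(a, a))
    return (a[0] / length, a[1] / length, a[2] / length)


def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """Position of atom D bonded to C, given |CD|, angle B-C-D, torsion A-B-C-D."""
    theta = math.radians(angle_deg)
    phi = math.radians(torsion_deg)
    bc = unit(sub(c, b))               # direction of the B->C bond
    n = unit(cross(sub(b, a), bc))     # normal to the A-B-C plane
    m = cross(n, bc)                   # completes the local right-handed frame
    local = (-bond * math.cos(theta),
             bond * math.sin(theta) * math.cos(phi),
             bond * math.sin(theta) * math.sin(phi))
    return tuple(c[i] + local[0] * bc[i] + local[1] * m[i] + local[2] * n[i]
                 for i in range(3))


def torsion(a, b, c, d):
    """Torsion angle A-B-C-D in degrees (0 = cis, 180 = trans)."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.degrees(math.atan2(dot(unit(b2), cross(n1, n2)), dot(n1, n2)))


# Hypothetical chain: three seed atoms, then a fourth defined internally.
atoms = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
atoms.append(place_atom(*atoms, bond=1.5, angle_deg=110.0, torsion_deg=60.0))
```

With bond lengths and bond angles held fixed, only the torsion arguments remain as refinable parameters, which is how the description reduces the number of degrees of freedom.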

In the LALS system, the quantity Ω given by

$$\Omega = \sum_{m} \omega_{m} \Delta F_{m}^{2} + \sum_{m} k_{m} \Delta d_{m}^{2} + \sum_{m} \lambda_{m} G_{m} = X + C + L \qquad (4.5.2.62)$$

is minimized by varying a set of chosen parameters consisting of conformation angles, possibly bond angles, and packing parameters. The term X involves the differences $\Delta F_{m}$ between the model and experimental X-ray amplitudes, Bragg and/or continuous. The term C involves the discrepancies $\Delta d_{m}$ in a set of restraints that ensure that over-short nonbonded interatomic distances are driven beyond acceptable minimum values, that conformations are within desired domains, that hydrogen-bond and coordination geometries are close to the expected configurations, and that a variety of other relationships are satisfied (Campbell Smith & Arnott, 1978). The $\omega_{m}$ and $k_{m}$ are weights that are inversely proportional to the estimated variances of the data. The term L involves constraints, which are relationships that are to be satisfied exactly $(G_{m} = 0)$, and the $\lambda_{m}$ are Lagrange multipliers. Constraints are used, for example, to ensure connectivity from one helix pitch to the next and to ensure that chemical ring systems are closed. The cost function Ω is minimized using full-matrix nonlinear least squares and singular value decomposition (Campbell Smith & Arnott, 1978).
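The structure of equation (4.5.2.62) can be made concrete by evaluating its three terms for a trial model. The sketch below only evaluates Ω and performs no minimization. The function names and numbers are hypothetical, and the one-sided form of the contact penalty (zero once a distance exceeds its minimum) is an assumption about how over-short contacts are treated.

```python
def x_term(f_obs, f_calc, weights):
    """X: weighted sum of squared amplitude differences."""
    return sum(w * (fo - fc) ** 2 for fo, fc, w in zip(f_obs, f_calc, weights))


def contact_term(distances, d_min, k):
    """C: quadratic penalty on nonbonded distances shorter than their minimum.

    Assumption: contacts at or beyond d_min contribute nothing.
    """
    return sum(ki * (dm - d) ** 2
               for d, dm, ki in zip(distances, d_min, k) if d < dm)


def constraint_term(g_values, multipliers):
    """L: sum of lambda_m * G_m; vanishes when every constraint G_m = 0 holds."""
    return sum(lam * g for lam, g in zip(multipliers, g_values))


def omega(f_obs, f_calc, weights, distances, d_min, k, g_values, multipliers):
    """Return (X, C, L, X + C + L) for one trial model."""
    x = x_term(f_obs, f_calc, weights)
    c = contact_term(distances, d_min, k)
    l = constraint_term(g_values, multipliers)
    return x, c, l, x + c + l


# Illustrative values only: two amplitudes, two contacts, one satisfied constraint.
terms = omega(f_obs=[10.0, 20.0], f_calc=[9.0, 22.0], weights=[1.0, 0.5],
              distances=[2.0, 3.5], d_min=[3.0, 3.0], k=[2.0, 2.0],
              g_values=[0.0], multipliers=[5.0])
```

Calling the same function with all amplitude weights set to zero leaves only the stereochemical terms C and L in the total.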

Structure determination usually involves first using equation (4.5.2.62) with the terms C and L only, to establish the stereochemical viability of each kind of possible molecular model and packing arrangement. It is worth emphasizing that it is usually advantageous if the specimen is polycrystalline, even though the continuous diffraction contains, in principle, more information than the Bragg reflections (since the latter are sampled). This is because the molecule in a noncrystalline specimen must be refined in steric isolation, whereas for a polycrystalline specimen it is refined while packed in the crystal lattice. The extra information provided by the intermolecular contacts can often help to eliminate incorrect models. This can be particularly significant if the molecule has flexible sidechains. The initial models that survive the steric optimization are then optimized also against the X-ray data, by further refinement with X included in equation (4.5.2.62). The ratios $(\Omega_{P}/\Omega_{Q})^{1/2}$ and $(X_{P}/X_{Q})^{1/2}$ can be used in Hamilton's test (Hamilton, 1965) to evaluate the differences between models P and Q. On the basis of these statistical tests, one can decide if one model is superior to the others at an acceptable confidence level. In the final stages of refinement, bond angles may be varied in a 'stiffly elastic' fashion from their mean values if there are sufficient data to justify the increase in the number of degrees of freedom.
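The comparison of two refined models can be sketched as follows. The code assumes the commonly quoted form of Hamilton's significance point, R = [b F / (n - m) + 1]^{1/2}, where b is the dimension of the hypothesis, n - m the number of degrees of freedom and F the tabulated critical value of the F distribution. The function names, the residuals and the value of F are illustrative assumptions; F must be taken from tables or a statistics library for the chosen significance level.

```python
import math


def residual_ratio(omega_p, omega_q):
    """Ratio (Omega_P / Omega_Q)^(1/2) for models P and Q."""
    return math.sqrt(omega_p / omega_q)


def hamilton_threshold(b, dof, f_crit):
    """Significance point of the ratio; f_crit is the F critical value for (b, dof)."""
    return math.sqrt(b / dof * f_crit + 1.0)


def p_is_rejected(omega_p, omega_q, b, dof, f_crit):
    """True if model P can be rejected in favour of Q at the level implied by f_crit."""
    return residual_ratio(omega_p, omega_q) > hamilton_threshold(b, dof, f_crit)


# Illustrative numbers only. f_crit = 3.49 is assumed to be the 5% point of
# F(2, 20); check it against tables before relying on it.
ratio = residual_ratio(121.0, 100.0)
threshold = hamilton_threshold(b=2, dof=20, f_crit=3.49)
rejected = p_is_rejected(121.0, 100.0, b=2, dof=20, f_crit=3.49)
```

In this example the ratio lies below the threshold, so the two models would not be distinguished at that level. The same functions apply to the ratio of the X terms.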

If sufficient X-ray data are available, it is sometimes possible to locate additional ordered molecules such as counterions or solvent molecules by difference Fourier synthesis as described in Section 4.5.2.6.5. Their positions can then be co-refined with the polymer structure while hydrogen bonds and coordination geometries are optimized. The resulting structure can then be used to compute improved phases to search for additional molecules. Since the signal-to-noise ratio in fibre difference syntheses is usually low, difference maps must be interpreted with caution. The assignment of counterions or solvent molecules to peaks in the difference synthesis must be supported by plausible interactions with the rest of the structure and, following refinement of the structure, by elimination of the peak in the difference map and by a significant improvement in the agreement between the calculated and measured X-ray amplitudes.

References

Arnott, S. (1980). Twenty years hard labor as a fibre diffractionist. In Fibre diffraction methods, ACS Symposium Series, Vol. 141, edited by A. D. French & K. H. Gardner, pp. 1–30. Washington DC: American Chemical Society.

Arnott, S. & Mitra, A. K. (1984). X-ray diffraction analyses of glycosaminoglycans. In Molecular biophysics of the extracellular matrix, edited by S. Arnott, D. A. Rees & E. R. Morris, pp. 41–67. Clifton: Humana Press.

Arnott, S. & Wonacott, A. J. (1966). The refinement of the crystal and molecular structures of polymers using X-ray data and stereochemical constraints. Polymer, 7, 157–166.

Campbell Smith, P. J. & Arnott, S. (1978). LALS: a linked-atom least-squares reciprocal-space refinement system incorporating stereochemical constraints to supplement sparse diffraction data. Acta Cryst. A34, 3–11.

Chandrasekaran, R. & Arnott, S. (1989). The structures of DNA and RNA helices in oriented fibres. In Landolt–Börnstein numerical data and functional relationships in science and technology, Vol. VII/1b, edited by W. Saenger, pp. 31–170. Berlin, Heidelberg: Springer-Verlag.

Hamilton, W. C. (1965). Significance tests on the crystallographic R factor. Acta Cryst. 18, 502–510.

Iannelli, P. (1994). FWR: a computer program for refining the molecular structure in the crystalline phase of polymers based on the analysis of the whole X-ray fibre diffraction patterns. J. Appl. Cryst. 27, 1055–1060.

Millane, R. P. (1988). X-ray fibre diffraction. In Crystallographic computing 4. Techniques and new technologies, edited by N. W. Isaacs & M. R. Taylor, pp. 169–186. Oxford University Press.

Millane, R. P. (1990c). Polysaccharide structures: X-ray fibre diffraction studies. In Computer modeling of carbohydrate molecules. ACS Symposium Series No. 430, edited by A. D. French & J. W. Brady, pp. 315–331. Washington DC: American Chemical Society.

Millane, R. P., Byler, M. A. & Arnott, S. (1985). Implementing constrained least squares refinement of helical polymers on a vector pipeline machine. In Supercomputer applications, edited by R. W. Numrich, pp. 137–143. New York: Plenum.

Millane, R. P., Chandrasekaran, R., Arnott, S. & Dea, I. C. M. (1988). The molecular structure of kappa-carrageenan and comparison with iota-carrageenan. Carbohydr. Res. 182, 1–17.

Park, H., Arnott, S., Chandrasekaran, R., Millane, R. P. & Campagnari, F. (1987). Structure of the α-form of poly(dA)·poly(dT) and related polynucleotide duplexes. J. Mol. Biol. 197, 513–523.

Zugenmaier, P. & Sarko, A. (1980). The variable virtual bond. In Fibre diffraction methods, ACS Symposium Series Vol. 141, edited by A. D. French & K. H. Gardner, pp. 225–237. Washington DC: American Chemical Society.
