Other multisolution methods applied to small molecules

Giacovazzo, C.

doi:10.1107/97809553602060000555

International Tables for Crystallography, Volume B: Reciprocal space. Edited by U. Shmueli


International Tables for Crystallography (2006). Vol. B. ch. 2.2, pp. 228-230

Section 2.2.8. Other multisolution methods applied to small molecules

C. Giacovazzo^a ^*

^aDipartimento Geomineralogico, Campus Universitario, I-70125 Bari, Italy
Correspondence e-mail: c.giacovazzo@area.ba.cnr.it


In very complex structures a large initial set of known phases seems to be a basic requirement for a structure to be determined. This aim can be achieved, for example, by introducing a large number of permutable phases into the initial set. However, the introduction of every new symbol implies a fourfold increase in computing time, which, even on fast computers, quickly becomes prohibitive. On the other hand, a relatively large starting set is not in itself enough to ensure a successful structure determination. This is the case, for example, when the triplet invariants used in the initial steps differ significantly from zero. New strategies have therefore been devised to solve more complex structures.

(1) Magic-integer methods

In the classical procedure described in Section 2.2.7, the unknown phases in the starting set are assigned all combinations of the values $\pm \pi / 4, \pm 3 \pi / 4$. For n unknown phases in the starting set, $4^{n}$ sets of phases arise by quadrant permutation, a number that increases very rapidly with n. According to White & Woolfson (1975), the n phases can instead be represented by a sequence of n integers $m_{i}$ through the equations

$$\varphi_{i} = m_{i}x \pmod{2 \pi}, \quad i = 1, \ldots, n. \tag{2.2.8.1}$$

The set of equations can be regarded as the parametric equation of a straight line in n-dimensional phase space. The nature and size of the errors connected with magic-integer representations have been investigated by Main (1977), who also gave a recipe for deriving magic-integer sequences which minimize the r.m.s. errors in the represented phases (see Table 2.2.8.1). To assign a phase value, the variable x in equation (2.2.8.1) is given a series of values at equal intervals in the range $0 < x < 2 \pi$. The enantiomorph is defined by exploring only the appropriate half of the n-dimensional space.

Table 2.2.8.1
Magic-integer sequences for small numbers of phases (n) together with the number of sets produced and the root-mean-square error in the phases

n	Sequence	No. of sets	R.m.s. error (°)
1	1	4	26
2	2 3	12	29
3	3 4 5	20	37
4	5 7 8 9	32	42
5	8 11 13 14 15	50	45
6	13 18 21 23 24 25	80	47
7	21 29 34 37 39 40 41	128	48
8	34 47 55 60 63 65 66 67	206	49
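The saving implied by equation (2.2.8.1) can be illustrated with a minimal sketch. It takes the first few sequences and set counts from Table 2.2.8.1 and generates the trial phase sets by stepping x through equal intervals. The function name `magic_phase_sets`, the use of NumPy, and the choice of interval midpoints as sample points are assumptions made for illustration; the text only states that x is sampled at equal intervals.

```python
import numpy as np

# Sequences and set counts copied from Table 2.2.8.1: n -> (magic integers, no. of sets)
MAGIC = {
    1: ((1,), 4),
    2: ((2, 3), 12),
    3: ((3, 4, 5), 20),
    4: ((5, 7, 8, 9), 32),
}

def magic_phase_sets(integers, n_sets):
    """Return an (n_sets, n) array of phases phi_i = m_i * x (mod 2*pi)."""
    m = np.asarray(integers, dtype=float)
    # Assumption: x is sampled at the midpoints of n_sets equal intervals of (0, 2*pi).
    x = 2.0 * np.pi * (np.arange(n_sets) + 0.5) / n_sets
    return np.mod(np.outer(x, m), 2.0 * np.pi)

for n, (integers, n_sets) in MAGIC.items():
    sets = magic_phase_sets(integers, n_sets)
    print(f"n={n}: {sets.shape[0]} magic-integer sets vs {4**n} quadrant permutations")
```

Each row of the returned array is one trial set of n phases lying on the straight line in phase space described above; the price of the smaller number of sets is the r.m.s. phase error listed in the table.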

A different way of using the magic-integer method (Declercq et al., 1975) is the primary–secondary (P–S) method, which may be described schematically in the following way:

(a) Origin- and enantiomorph-fixing phases are chosen and some one-phase s.s.'s are estimated.
(b) Nine phases [this is only an example: very long magic-integer sequences may be used to represent primary phases (Hull et al., 1981; Debaerdemaeker & Woolfson, 1983)] are represented by the approximate relationships $\begin{cases}\varphi_{i_{1}} = 3 x\\ \varphi_{i_{2}} = 4 x\\ \varphi_{i_{3}} = 5 x\end{cases}\qquad \begin{cases}\varphi_{j_{1}} = 3 y\\ \varphi_{j_{2}} = 4 y\\ \varphi_{j_{3}} = 5 y\end{cases}\qquad \begin{cases}\varphi_{p_{1}} = 3 z\\ \varphi_{p_{2}} = 4 z\\ \varphi_{p_{3}} = 5 z.\end{cases}$ The phases in (a) and (b) constitute the primary set.
(c) The phases in the secondary set are those defined through $\Sigma_{2}$ relationships involving pairs of phases from the primary set: they, too, can be expressed in magic-integer form.
(d) All the triplets that link together the phases in the combined primary and secondary set are now found, other than triplets used to obtain secondary reflections from the primary ones. The general algebraic form of these triplets will be $m_{1}x + m_{2}y + m_{3}z + b \equiv 0 \pmod{1}$, where b is a phase constant which arises from symmetry translation. It may be expected that the 'best' values of the unknowns x, y, z correspond to a maximum of the function $\psi (x, y, z) = \sum |E_{1} E_{2} E_{3}| \cos 2 \pi (m_{1}x + m_{2}y + m_{3}z + b)$, with $0 \leq x, y, z < 1$. It should be noticed that ψ is a Fourier summation which can easily be evaluated. In fact, ψ is essentially a figure of merit for a large number of phases evaluated in terms of a small number of magic-integer variables and gives a measure of the internal consistency of the $\Sigma_{2}$ relationships. The ψ map generally presents several peaks and therefore can provide several solutions for the variables.
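Step (d) can be sketched numerically as follows. The example evaluates ψ by direct summation on a coarse grid and reports the highest grid point. The triplet coefficients, weights and the "true" point `r0` are invented test data, and the helper names `psi_map` and `strongest_peak` are hypothetical; a production program would use a fast Fourier transform and search for several peaks rather than one.

```python
import numpy as np

def psi_map(triplets, n_grid=16):
    """Evaluate psi(x, y, z) on an n_grid^3 grid covering 0 <= x, y, z < 1.

    Each triplet is (m1, m2, m3, b, weight), with weight standing for |E1 E2 E3|.
    """
    g = np.arange(n_grid) / n_grid
    x, y, z = np.meshgrid(g, g, g, indexing="ij")
    psi = np.zeros_like(x)
    for m1, m2, m3, b, w in triplets:
        psi += w * np.cos(2.0 * np.pi * (m1 * x + m2 * y + m3 * z + b))
    return g, psi

def strongest_peak(g, psi):
    """Return the grid coordinates and height of the largest value of psi."""
    i, j, k = np.unravel_index(np.argmax(psi), psi.shape)
    return (g[i], g[j], g[k]), psi[i, j, k]

# Hypothetical test data: triplets made exactly consistent with a chosen point r0.
r0 = np.array([3, 10, 5]) / 16
ms = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (3, -4, 0), (4, 0, -5), (0, 5, -3), (3, 4, 5)]
ws = [2.1, 1.8, 1.7, 1.5, 1.4, 1.2, 1.0]
triplets = [(*m, (-np.dot(m, r0)) % 1.0, w) for m, w in zip(ms, ws)]

g, psi = psi_map(triplets)
peak, height = strongest_peak(g, psi)
print("peak at", peak, "height", height)
```

With error-free triplets the peak height equals the sum of the weights; with real data the map shows several lower peaks, each of which is a candidate solution for the variables.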

(2) The random-start method

These are procedures which try to solve crystal structures by starting from random initial phases (Baggio et al., 1978; Yao, 1981). They may be described as follows:

(a) A number of reflections (say NUM ∼ 100 or larger) at the bottom of the CONVERGE map are selected. These, and the relationships which link them, form the system for which trial phases will be found.
(b) A pseudo-random number generator is used to generate M sets of NUM random phases. Each of the M sets is refined and extended by the tangent formula or similar methods.
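The two steps above can be sketched as a small program. It is a toy under stated assumptions: triplets are given as index tuples `(a, b, c, kappa)` meaning $\varphi_a \approx \varphi_b + \varphi_c$ with weight kappa, symmetry is ignored, all phases are updated simultaneously in each cycle, and the trial sets are ranked by a simple triplet-consistency score that is not taken from the text. The function names and the one-atom test structure are hypothetical.

```python
import numpy as np

def tangent_cycle(phases, triplets):
    """One simultaneous tangent-formula update of every phase."""
    num = np.zeros_like(phases)
    den = np.zeros_like(phases)
    for a, b, c, kappa in triplets:
        estimates = ((a, phases[b] + phases[c]),
                     (b, phases[a] - phases[c]),
                     (c, phases[a] - phases[b]))
        for target, est in estimates:
            num[target] += kappa * np.sin(est)
            den[target] += kappa * np.cos(est)
    return np.arctan2(num, den)

def consistency(phases, triplets):
    """Weighted mean of cos(phi_a - phi_b - phi_c); 1 means every triplet is satisfied."""
    num = sum(k * np.cos(phases[a] - phases[b] - phases[c]) for a, b, c, k in triplets)
    return num / sum(k for *_, k in triplets)

def random_start(num_refl, triplets, n_sets=20, n_cycles=30, seed=0):
    """Refine n_sets random phase sets and return them sorted by consistency."""
    rng = np.random.default_rng(seed)
    trials = []
    for _ in range(n_sets):
        phases = rng.uniform(-np.pi, np.pi, num_refl)
        for _ in range(n_cycles):
            phases = tangent_cycle(phases, triplets)
        trials.append((consistency(phases, triplets), phases))
    trials.sort(key=lambda t: t[0], reverse=True)
    return trials

# Toy data: one-dimensional reflections h = 1..NUM of a single point atom,
# for which phi_h = phi_k + phi_(h-k) holds exactly.
NUM = 12
triplets = [(h - 1, k - 1, h - k - 1, 1.0)
            for h in range(2, NUM + 1) for k in range(1, h // 2 + 1)]
true = np.angle(np.exp(1j * 0.7 * np.arange(1, NUM + 1)))

trials = random_start(NUM, triplets, n_sets=5, n_cycles=10, seed=1)
print("best consistency:", trials[0][0])
```

In practice NUM would be of the order of 100 or more, as stated in step (a), and the refined sets would be passed on to phase extension.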

(3) Accurate calculation of s.i.'s and s.s.'s with 1, 2, 3, 4, …, n phases

Having a large set of good phase relationships allows one to overcome difficulties in the early stages and in the refinement process of the phasing procedure. Accurate estimates of s.i.'s and s.s.'s may be achieved by the application of techniques such as the representation method or the neighbourhood principle (Hauptman, 1975; Giacovazzo, 1977a, 1980b). So far, second-representation formulae are available for triplets and one-phase seminvariants; in particular, reliably estimated negative triplets can be recognized, which is of great help in the phasing process (Cascarano, Giacovazzo, Camalli et al., 1984). Estimation of higher-order s.s.'s with upper representations or upper neighbourhoods is rather difficult, both because the procedures are time consuming and because the efficiency of the present joint probability distribution techniques deteriorates with complexity. However, further progress can be expected in the field.

(4) Modified tangent formulae and least-squares determination and refinement of phases

The problem of deriving the individual phase angles from triplet relationships is greatly overdetermined: the number of triplets greatly exceeds the number of phases, so that any $\varphi_{\mathbf{h}}$ may be determined by a least-squares approach (Hauptman et al., 1969). The function to be minimized may be

$$M = \frac{\sum_{\mathbf{k}} w_{\mathbf{k}} [\cos (\varphi_{\mathbf{h}} - \varphi_{\mathbf{k}} - \varphi_{\mathbf{h}-\mathbf{k}}) - C_{\mathbf{k}}]^{2}}{\sum_{\mathbf{k}} w_{\mathbf{k}}},$$

where $C_{\mathbf{k}}$ is the estimate of the cosine obtained by probabilistic or other methods.

Effective least-squares procedures based on linear equations (Debaerdemaeker & Woolfson, 1983; Woolfson, 1977) can also be used. A triplet relationship is usually represented by

$$(\varphi_{p} \pm \varphi_{q} \pm \varphi_{r} + b) \approx 0 \pmod{2 \pi}, \tag{2.2.8.2}$$

where b is a constant arising from translational symmetry. If (2.2.8.2) is expressed in cycles and suitably weighted, then it may be written as $w (\varphi_{p} \pm \varphi_{q} \pm \varphi_{r} + b) = wn$, where n is some integer. If the integers were known then the equations would appear (in matrix notation) as

$$\mathbf{A}\boldsymbol{\Phi} = \mathbf{C}, \tag{2.2.8.3}$$

giving the least-squares solution

$$\boldsymbol{\Phi} = (\mathbf{A}^{T}\mathbf{A})^{-1} \mathbf{A}^{T}\mathbf{C}. \tag{2.2.8.4}$$

When approximate phases are available, the nearest integers may be found and equations (2.2.8.3) and (2.2.8.4) constitute the basis for further refinement.
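The refinement step built on equations (2.2.8.3) and (2.2.8.4) can be sketched as below. Phases are in cycles; each triplet is a tuple `(p, q, r, sq, sr, b, w)` standing for $w(\varphi_p + s_q\varphi_q + s_r\varphi_r + b) = wn$ with signs $s_q, s_r = \pm 1$. The extra rows that pin known phases (needed here because triplets alone leave the origin undetermined), the function name `refine_phases` and the toy data are assumptions for illustration. `numpy.linalg.lstsq` is used in place of forming $(\mathbf{A}^{T}\mathbf{A})^{-1}$ explicitly; for a full-rank system it gives the same solution.

```python
import numpy as np

def refine_phases(approx, triplets, fixed):
    """Least-squares phases (in cycles) from weighted linear triplet equations.

    approx   : approximate phases, used only to pick the nearest integers n
    triplets : iterable of (p, q, r, sq, sr, b, w)
    fixed    : dict {index: phase} of known phases (hypothetical origin-fixing rows)
    """
    n_phase = len(approx)
    rows, rhs = [], []
    for p, q, r, sq, sr, b, w in triplets:
        row = np.zeros(n_phase)
        row[p] += 1.0
        row[q] += sq
        row[r] += sr
        n = np.round(row @ approx + b)   # nearest integer from the approximate phases
        rows.append(w * row)
        rhs.append(w * (n - b))
    for idx, value in fixed.items():
        row = np.zeros(n_phase)
        row[idx] = 1.0
        rows.append(row)
        rhs.append(value)
    A = np.array(rows)
    C = np.array(rhs)
    phi, *_ = np.linalg.lstsq(A, C, rcond=None)
    return phi

# Toy data: reflections h = 1..6 of a one-atom, one-dimensional structure.
true = (0.137 * np.arange(1, 7)) % 1.0
triplets = [(h - 1, k - 1, h - k - 1, -1.0, -1.0, 0.0, 1.0)
            for h in range(2, 7) for k in range(1, h // 2 + 1)]
approx = true + 0.02 * np.array([1, -1, 1, -1, 1, -1])

refined = refine_phases(approx, triplets, fixed={0: true[0]})
print(np.abs(refined - true).max())
```

Because the toy triplets are exact, the refined phases reproduce the true ones once the integers have been identified correctly; with real data the residuals of the linear equations indicate how well the trial set satisfies the relationships.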

Modified tangent procedures are also used, such as (Sint & Schenk, 1975; Busetta, 1976)

$$\tan \varphi_{\mathbf{h}} \simeq \frac{\sum_{j} G_{\mathbf{h}, \mathbf{k}_{j}} \sin (\varphi_{\mathbf{k}_{j}} + \varphi_{\mathbf{h}-\mathbf{k}_{j}} - \Delta_{j})}{\sum_{j} G_{\mathbf{h}, \mathbf{k}_{j}} \cos (\varphi_{\mathbf{k}_{j}} + \varphi_{\mathbf{h}-\mathbf{k}_{j}} - \Delta_{j})},$$

where $\Delta_{j}$ is an estimate for the triplet phase sum $(\varphi_{\mathbf{h}} - \varphi_{\mathbf{k}_{j}} - \varphi_{\mathbf{h}-\mathbf{k}_{j}})$.

(5) Techniques based on the positivity of Karle–Hauptman determinants

(The main formulae have been briefly described in Section 2.2.5.7.) The maximum determinant rule has been applied to solve small structures (de Rango, 1969; Vermin & de Graaff, 1978) via determinants of small order. It has, however, been found that the use of such determinants is not sufficiently powerful to justify the larger amount of computing time that the technique requires compared with the tangent formula (Taylor et al., 1978).

(6) Tangent techniques using simultaneously triplets, quartets,…

The availability of a large number of phase relationships, in particular during the first stages of a direct procedure, makes the phasing process easier. However, quartets are sums of two triplets with a common reflection. If the phase of this reflection (and/or of the other cross terms) is known then the quartet probability formulae described in Section 2.2.5.5 cannot hold. Similar considerations may be made for quintet relationships. Thus triplet, quartet and quintet formulae described in the preceding paragraphs, if used without modifications, will certainly introduce systematic errors in the tangent refinement process.

A method which takes into account correlation between triplets and quartets has been described (Giacovazzo, 1980c) [see also Freer & Gilmore (1980) for a first application], according to which

$$\tan \varphi_{\mathbf{h}} \simeq \frac{\sum_{\mathbf{k}} G \sin (\varphi_{\mathbf{k}} + \varphi_{\mathbf{h}-\mathbf{k}}) - \sum_{\mathbf{k}, \mathbf{l}} G' \sin (\varphi_{\mathbf{k}} + \varphi_{\mathbf{l}} + \varphi_{\mathbf{h}-\mathbf{k}-\mathbf{l}})}{\sum_{\mathbf{k}} G \cos (\varphi_{\mathbf{k}} + \varphi_{\mathbf{h}-\mathbf{k}}) - \sum_{\mathbf{k}, \mathbf{l}} G' \cos (\varphi_{\mathbf{k}} + \varphi_{\mathbf{l}} + \varphi_{\mathbf{h}-\mathbf{k}-\mathbf{l}})},$$

where G′ takes into account both the magnitudes of the cross terms of the quartet and the fact that their phases may be known.

(7) Integration of Patterson techniques and direct methods (Egert & Sheldrick, 1985) [see also Egert (1983, and references therein)]

A fragment of known geometry is oriented in the unit cell by a real-space Patterson rotation search (see Chapter 2.3) and its position is found by application of a translation function (see Section 2.2.5.4 and Chapter 2.3) or by maximizing the weighted sum of the cosines of a small number of strong translation-sensitive triple phase invariants, starting from random positions. Suitable figures of merit (FOMs) rank the most reliable solutions.

(8) Maximum entropy methods

A common starting point for all direct methods is a stochastic process according to which crystal structures are thought of as being generated by randomly placing atoms in the asymmetric unit of the unit cell according to some a priori distribution. A non-uniform prior distribution of atoms $p(\mathbf{r})$ gives rise to a source of random atomic positions with entropy (Jaynes, 1957)

$$H(p) = - \int_{V} p(\mathbf{r}) \log p(\mathbf{r}) \, \mathrm{d}\mathbf{r}.$$

The maximum value $H_{\max} = \log V$ is reached for a uniform prior $p(\mathbf{r}) = 1/V$.
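The statement that the uniform prior attains $H_{\max} = \log V$ is easy to check numerically. The sketch below discretizes a one-dimensional "cell" of length V (a simplifying assumption; the text refers to a volume) and compares the entropy of the uniform distribution with that of an arbitrarily chosen non-uniform one. The function name `entropy` and the test distributions are illustrative only.

```python
import numpy as np

def entropy(p, dr):
    """Riemann-sum estimate of H(p) = -integral of p log p, taking 0 log 0 as 0."""
    p = np.asarray(p, dtype=float)
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask])) * dr

V = 10.0                      # hypothetical cell length
n = 1000                      # number of grid points
dr = V / n
r = (np.arange(n) + 0.5) * dr

uniform = np.full(n, 1.0 / V)
peaked = (1.0 + 0.8 * np.cos(2.0 * np.pi * r / V)) / V   # still integrates to 1

print("log V      =", np.log(V))
print("H(uniform) =", entropy(uniform, dr))
print("H(peaked)  =", entropy(peaked, dr))
```

The difference between the two entropies is the quantity $H(p) - H_{\max}$, which is never positive and measures how strongly the non-uniform distribution restricts the atomic positions.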

The strength of the restrictions introduced by $p(\mathbf{r})$ is not measured by $H(p)$ but by $H(p) - H_{\max}$, given by

$$H(p) - H_{\max} = - \int_{V} p(\mathbf{r}) \log [p(\mathbf{r})/m(\mathbf{r})] \, \mathrm{d}\mathbf{r},$$

where $m(\mathbf{r}) = 1/V$. Accordingly, if a prior prejudice $m(\mathbf{r})$ exists, which maximizes H, the revised relative entropy is

$$S(p) = - \int_{V} p(\mathbf{r}) \log [p(\mathbf{r})/m(\mathbf{r})] \, \mathrm{d}\mathbf{r}.$$

The maximization problem was solved by Jaynes (1957). If $G_{j}(p)$ are linear constraint functionals defined by given constraint functions $C_{j}(\mathbf{r})$ and constraint values $c_{j}$, i.e.

$$G_{j}(p) = \int_{V} p(\mathbf{r})C_{j}(\mathbf{r}) \, \mathrm{d}\mathbf{r} = c_{j},$$

the most unbiased probability density $p(\mathbf{r})$ under prior prejudice $m(\mathbf{r})$ is obtained by maximizing the entropy of $p(\mathbf{r})$ relative to $m(\mathbf{r})$. A standard variational technique suggests that the constrained maximization is equivalent to the unconstrained maximization of the functional

$$S(p) + \sum_{j} \lambda_{j}G_{j}(p),$$

where the $\lambda_{j}$'s are Lagrange multipliers whose values can be determined from the constraints.
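For reference, the stationarity condition of this functional can be written out explicitly. The following is the standard textbook form of the result rather than a formula quoted from this section; it assumes that the normalization of $p(\mathbf{r})$ is treated as one of the constraints (with $C_{0} = 1$, $c_{0} = 1$), so that its multiplier can be absorbed into the normalizing factor Z.

```latex
\begin{aligned}
\frac{\delta}{\delta p(\mathbf{r})}\Big[S(p) + \sum_{j} \lambda_{j} G_{j}(p)\Big]
  &= -\log\frac{p(\mathbf{r})}{m(\mathbf{r})} - 1 + \sum_{j} \lambda_{j} C_{j}(\mathbf{r}) = 0 \\
\Longrightarrow\quad
p(\mathbf{r}) &= \frac{m(\mathbf{r})}{Z(\boldsymbol{\lambda})}
  \exp\Big[\sum_{j \geq 1} \lambda_{j} C_{j}(\mathbf{r})\Big], \qquad
Z(\boldsymbol{\lambda}) = \int_{V} m(\mathbf{r})
  \exp\Big[\sum_{j \geq 1} \lambda_{j} C_{j}(\mathbf{r})\Big] \, \mathrm{d}\mathbf{r}, \\
\frac{\partial \log Z}{\partial \lambda_{j}} &= c_{j} \qquad (j \geq 1).
\end{aligned}
```

The last line is the set of equations from which the multipliers are determined: differentiating $\log Z$ with respect to $\lambda_{j}$ reproduces the constraint integral $\int_{V} p(\mathbf{r}) C_{j}(\mathbf{r}) \, \mathrm{d}\mathbf{r}$.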

Such a technique has been applied to the problem of finding good electron-density maps in different ways by various authors (Wilkins et al., 1983; Bricogne, 1984; Navaza, 1985; Navaza et al., 1983).

Maximum entropy methods are closely connected with traditional direct methods; in particular, it has been shown that:

(a) the maximum determinant rule (see Section 2.2.5.7) is closely connected with the maximum entropy principle (Britten & Collins, 1982; Piro, 1983; Narayan & Nityananda, 1982; Bricogne, 1984);
(b) the construction of conditional probability distributions of structure factors amounts precisely to a reciprocal-space evaluation of the entropy functional (Bricogne, 1984).

Maximum entropy methods are under strong development: important contributions can be expected in the near future even if a multipurpose robust program has not yet been written.

References

Baggio, R., Woolfson, M. M., Declercq, J.-P. & Germain, G. (1978). On the application of phase relationships to complex structures. XVI. A random approach to structure determination. Acta Cryst. A34, 883–892.

Bricogne, G. (1984). Maximum entropy and the foundation of direct methods. Acta Cryst. A40, 410–415.

Britten, P. L. & Collins, D. M. (1982). Information theory as a basis for the maximum determinant. Acta Cryst. A38, 129–132.

Busetta, B. (1976). An example of the use of quartet and triplet structure invariants when enantiomorph discrimination is difficult. Acta Cryst. A32, 139–143.

Cascarano, G., Giacovazzo, C., Camalli, M., Spagna, R., Burla, M. C., Nunzi, A. & Polidori, G. (1984). The method of representations of structure seminvariants. The strengthening of triplet relationships. Acta Cryst. A40, 278–283.

Debaerdemaeker, T. & Woolfson, M. M. (1983). On the application of phase relationships to complex structures. XXII. Techniques for random refinement. Acta Cryst. A39, 193–196.

Declercq, J.-P., Germain, G. & Woolfson, M. M. (1975). On the application of phase relationships to complex structures. VIII. Extension of the magic-integer approach. Acta Cryst. A31, 367–372.

Egert, E. (1983). Patterson search – an alternative to direct methods. Acta Cryst. A39, 936–940.

Egert, E. & Sheldrick, G. M. (1985). Search for a fragment of known geometry by integrated Patterson and direct methods. Acta Cryst. A41, 262–268.

Freer, A. A. & Gilmore, C. J. (1980). The use of higher invariants in MULTAN. Acta Cryst. A36, 470–475.

Giacovazzo, C. (1977a). A general approach to phase relationships: the method of representations. Acta Cryst. A33, 933–944.

Giacovazzo, C. (1980b). The method of representations of structure seminvariants. II. New theoretical and practical aspects. Acta Cryst. A36, 362–372.

Giacovazzo, C. (1980c). Triplet and quartet relations: their use in direct procedures. Acta Cryst. A36, 74–82.

Hauptman, H. (1975). A new method in the probabilistic theory of the structure invariants. Acta Cryst. A31, 680–687.

Hauptman, H., Fisher, J., Hancock, H. & Norton, D. A. (1969). Phase determination for the estriol structure. Acta Cryst. B25, 811–814.

Hull, S. E., Viterbo, D., Woolfson, M. M. & Shao-Hui, Z. (1981). On the application of phase relationships to complex structures. XIX. Magic-integer representation of a large set of phases: the MAGEX procedure. Acta Cryst. A37, 566–572.

Jaynes, E. T. (1957). Information theory and statistical mechanics. Phys. Rev. 106, 620–630.

Main, P. (1977). On the application of phase relationships to complex structures. XI. A theory of magic integers. Acta Cryst. A33, 750–757.

Narayan, R. & Nityananda, R. (1982). The maximum determinant method and the maximum entropy method. Acta Cryst. A38, 122–128.

Navaza, J. (1985). On the maximum-entropy estimate of the electron density function. Acta Cryst. A41, 232–244.

Navaza, J., Castellano, E. E. & Tsoucaris, G. (1983). Constrained density modifications by variational techniques. Acta Cryst. A39, 622–631.

Piro, O. E. (1983). Information theory and the phase problem in crystallography. Acta Cryst. A39, 61–68.

Rango, C. de (1969). Thesis. Paris.

Sint, L. & Schenk, H. (1975). Phase extension and refinement in non-centrosymmetric structures containing large molecules. Acta Cryst. A31, S22.

Taylor, D. J., Woolfson, M. M. & Main, P. (1978). On the application of phase relationships to complex structures. XV. Magic determinants. Acta Cryst. A34, 870–883.

Vermin, W. J. & de Graaff, R. A. G. (1978). The use of Karle–Hauptman determinants in small-structure determinations. Acta Cryst. A34, 892–894.

White, P. & Woolfson, M. M. (1975). The application of phase relationships to complex structures. VII. Magic integers. Acta Cryst. A31, 53–56.

Wilkins, S. W., Varghese, J. N. & Lehmann, M. S. (1983). Statistical geometry. I. A self-consistent approach to the crystallographic inversion problem based on information theory. Acta Cryst. A39, 47–60.

Woolfson, M. M. (1977). On the application of phase relationships to complex structures. X. MAGLIN – a successor to MULTAN. Acta Cryst. A33, 219–225.

Yao, J.-X. (1981). On the application of phase relationships to complex structures. XVIII. RANTAN – random MULTAN. Acta Cryst. A37, 642–664.
