International Tables for Crystallography, Volume B: Reciprocal space. Edited by U. Shmueli. © International Union of Crystallography 2006.
International Tables for Crystallography (2006). Vol. B, ch. 2.2, pp. 210-234.
https://doi.org/10.1107/97809553602060000555
Chapter 2.2. Direct methods
Dipartimento Geomineralogico, Campus Universitario, I-70125 Bari, Italy
Direct methods are essentially reciprocal-space techniques, developed historically to solve small-molecule crystal structures. Their success in this area (in practice, they solve the phase problem) is based on numerous theoretical achievements which concern origin specification (Section 2.2.3), …
Keywords: direct methods; ab initio phase determination; structure seminvariants; origin specification; phase relationships; phase determination; normalized structure factors; quasi-normalized structure factors; triplet relationships; quartet phase relationships; quintet phase relationships; determinantal formulae; Sayre's equation; tangent formula; magic-integer methods; macromolecular crystallography; isomorphous replacement; multiple isomorphous replacement with anomalous scattering; multiwavelength anomalous diffraction; single isomorphous replacement with anomalous scattering; MIRAS; MAD; SIRAS.
Direct methods are today the most widely used tool for solving small crystal structures. They work well both for equal-atom molecules and when a few heavy atoms exist in the structure. In recent years the theoretical background of direct methods has been improved to take into account a large variety of prior information (the form of the molecule, its orientation, a partial structure, the presence of pseudosymmetry or of a superstructure, the availability of isomorphous data or of data affected by anomalous-dispersion effects, …). Owing to this progress and to the increasing availability of powerful computers, a number of effective, highly automated packages for the practical solution of the phase problem are today available to the scientific community.
The ab initio crystal structure solution of macromolecules does not seem to lie beyond the reach of direct methods. Much effort will certainly be devoted to this task in the near future; a report of the first achievements is given in Section 2.2.10.
This chapter describes both the traditional direct methods tools and the most recent and revolutionary techniques suitable for macromolecules.
The theoretical background and tables useful for origin specification are given in Section 2.2.3; in Section 2.2.4 the procedures for normalizing structure factors are summarized. Phase-determining formulae (inequalities, probabilistic formulae for triplet, quartet and quintet invariants and for one- and two-phase structure seminvariants or s.s.'s, determinantal formulae) are given in Section 2.2.5. In Section 2.2.6 the connection between direct methods and related techniques in real space is discussed. Practical procedures for solving crystal structures are described in Sections 2.2.7 and 2.2.8, and references to the most extensively used packages are given in Section 2.2.9. The techniques suitable for the ab initio crystal structure solution of macromolecules are described in Section 2.2.10.2. The integration of direct methods with isomorphous-replacement and anomalous-dispersion techniques is briefly described in Sections 2.2.10.3 and 2.2.10.4.
The reader will find full coverage of the most important aspects of direct methods in the recent books by Giacovazzo (1998) and Woolfson & Fan (1995).
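The normalized structure factors E (see also Chapter 2.1) are calculated according to (Hauptman & Karle, 1953)
$$|E_{\mathbf h}|^2 = \frac{|F_{\mathbf h}|^2}{\langle|F_{\mathbf h}|^2\rangle}, \tag{2.2.4.1}$$
where $|F_{\mathbf h}|^2$ is the squared observed structure-factor magnitude on the absolute scale and $\langle|F_{\mathbf h}|^2\rangle$ is the expected value of $|F_{\mathbf h}|^2$.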
$\langle|F_{\mathbf h}|^2\rangle$ depends on the available a priori information. Often, but not always, this may be considered as a combination of several typical situations. We mention:
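When probability theory is not used, the quasi-normalized structure factors $\mathcal E_{\mathbf h}$ and the unitary structure factors $U_{\mathbf h}$ are often used. $\mathcal E_{\mathbf h}$ and $U_{\mathbf h}$ are defined according to
$$|\mathcal E_{\mathbf h}|^2 = \frac{|F_{\mathbf h}|^2}{\sum_{j=1}^N f_j^2}, \qquad U_{\mathbf h} = \frac{F_{\mathbf h}}{\sum_{j=1}^N f_j}.$$
Since $\sum_{j=1}^N f_j$ is the largest possible value for $|F_{\mathbf h}|$, $U_{\mathbf h}$ represents the fraction of $F_{\mathbf h}$ with respect to its largest possible value. Therefore $0 \le |U_{\mathbf h}| \le 1$. If atoms are equal, then $|\mathcal E_{\mathbf h}| = N^{1/2}|U_{\mathbf h}|$.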
Normalized structure factors (n.s.f.'s) cannot be calculated by applying (2.2.4.1) to observed s.f.'s because: (a) the observed magnitudes $|F_{\mathbf h}|$ (already corrected for Lp factor, absorption, …) are on a relative scale; (b) $\langle|F_{\mathbf h}|^2\rangle$ cannot be calculated without having estimated the vibrational motion of the atoms.
Both the absolute scale and the vibrational motion are usually estimated by the well known Wilson plot (Wilson, 1942). The observed data are divided into ranges of $s^2 = \sin^2\theta/\lambda^2$, and averages of intensity $\langle|F_{\mathbf h}|^2\rangle$ are taken in each shell. Reflection multiplicities and other effects of space-group symmetry on intensities must be taken into account when such averages are calculated. The shells are symmetrically overlapped in order to reduce statistical fluctuations and are restricted so that the number of reflections in each shell is reasonably large. For each shell
$$\langle|F_{\mathbf h}|^2\rangle = K\langle|F^0_{\mathbf h}|^2\rangle\exp(-2Bs^2) \tag{2.2.4.3}$$
should be obtained. Here K is the scale factor needed to place X-ray intensities on the absolute scale, B is the overall thermal parameter and $\langle|F^0_{\mathbf h}|^2\rangle$ is the expected value of $|F^0_{\mathbf h}|^2$, the squared structure-factor magnitude computed on the assumption that all the atoms are at rest. $\langle|F^0_{\mathbf h}|^2\rangle$ depends upon the structural information that is available (see Section 2.2.4.1 for some examples).
Equation (2.2.4.3) may be rewritten as
$$\ln\frac{\langle|F_{\mathbf h}|^2\rangle}{\langle|F^0_{\mathbf h}|^2\rangle} = \ln K - 2Bs^2.$$
Plotted against $s^2$, this should be a straight line. Its slope (−2B) and its intercept on the logarithmic axis (ln K) can be obtained by a linear least-squares procedure.
Very often molecular geometries produce perceptible departures from linearity in the logarithmic Wilson plot. However, the more extensive the available a priori information on the structure is, the closer, on the average, are the Wilson-plot curves to their least-squares straight lines.
Accurate estimates of B and K require good strategies (Rogers & Wilson, 1953) for:
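Once K and B have been estimated, $|E_{\mathbf h}|$ values can be obtained from experimental data by
$$|E_{\mathbf h}|^2 = \frac{|F_{\mathbf h}|^2}{K\langle|F^0_{\mathbf h}|^2\rangle\exp(-2Bs^2)},$$
where $\langle|F^0_{\mathbf h}|^2\rangle$ is the expected value of $|F^0_{\mathbf h}|^2$ for the reflection h on the basis of the available a priori information.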
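Once K and B are known, normalization reduces to one line per reflection. The sketch below takes the random-atom expectation $\langle|F^0_{\mathbf h}|^2\rangle = \varepsilon_{\mathbf h}\sum_j f_j^2$ as a stand-in for the a priori information, where $\varepsilon_{\mathbf h}$ is the reflection's statistical weight. That model, the explicit `eps` argument and the helper name are all illustrative assumptions.

```python
import math


def normalized_modulus(f2_obs, s2, K, B, sum_f2, eps=1.0):
    """|E_h| from an observed |F_h|^2 on a relative scale.

    Assumes <|F0_h|^2> = eps * sum_f2 (random-atom model), so that
    |E_h|^2 = |F_h|^2 / (K * eps * sum_f2 * exp(-2 B s^2)).
    """
    expected = K * eps * sum_f2 * math.exp(-2.0 * B * s2)
    return math.sqrt(f2_obs / expected)


# A reflection four times more intense than expected has |E| = 2.
K, B, s2, sum_f2 = 3.0, 2.5, 0.1, 250.0
f2 = 4.0 * K * sum_f2 * math.exp(-2.0 * B * s2)
e = normalized_modulus(f2, s2, K, B, sum_f2)
```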
Under some fairly general assumptions (see Chapter 2.1), the probability distribution functions for the variable |E| for cs. and ncs. structures are (see Fig. 2.2.4.1)
$$P_{\bar1}(|E|) = \left(\frac{2}{\pi}\right)^{1/2}\exp\left(-\frac{|E|^2}{2}\right) \tag{2.2.4.4}$$
and
$$P_{1}(|E|) = 2|E|\exp(-|E|^2), \tag{2.2.4.5}$$
respectively. The corresponding cumulative functions are (see Fig. 2.2.4.2)
$$N_{\bar1}(|E|) = \operatorname{erf}\left(\frac{|E|}{\sqrt2}\right), \qquad N_{1}(|E|) = 1 - \exp(-|E|^2).$$
Some moments of the distributions (2.2.4.4) and (2.2.4.5) are listed in Table 2.2.4.1. In the absence of other indications for a given crystal structure, a cs. or an ncs. space group will be preferred according to whether the statistical tests yield values closer to column 2 or to column 3 of Table 2.2.4.1.
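The comparison with Table 2.2.4.1 can be illustrated by integrating (2.2.4.4) and (2.2.4.5) numerically and comparing a sample statistic with the two theoretical values. This is only a sketch. Using $\langle|E^2-1|\rangle$ as the single test statistic and Simpson's rule on [0, 10] are choices made here, not prescriptions. The synthetic |E| values are drawn from Gaussian models of centric and acentric reflections. The integration reproduces, for example, $\langle|E|\rangle = (2/\pi)^{1/2} \approx 0.798$ for cs. and $\pi^{1/2}/2 \approx 0.886$ for ncs. structures.

```python
import math
import random


def p_centric(e):
    """Distribution (2.2.4.4) of |E| for a cs. structure."""
    return math.sqrt(2.0 / math.pi) * math.exp(-e * e / 2.0)


def p_acentric(e):
    """Distribution (2.2.4.5) of |E| for an ncs. structure."""
    return 2.0 * e * math.exp(-e * e)


def moment(pdf, g, upper=10.0, n=4000):
    """<g(|E|)> by Simpson's rule on [0, upper]; n must be even."""
    h = upper / n
    total = g(0.0) * pdf(0.0) + g(upper) * pdf(upper)
    for i in range(1, n):
        x = i * h
        total += (4.0 if i % 2 else 2.0) * g(x) * pdf(x)
    return total * h / 3.0


def abs_e2_minus_1(e):
    return abs(e * e - 1.0)


def classify(e_values):
    """'cs' or 'ncs': whichever theoretical <|E^2 - 1|> is closer to the sample mean."""
    sample = sum(abs_e2_minus_1(e) for e in e_values) / len(e_values)
    cs = moment(p_centric, abs_e2_minus_1)
    ncs = moment(p_acentric, abs_e2_minus_1)
    return "cs" if abs(sample - cs) < abs(sample - ncs) else "ncs"


# Synthetic |E| values: real Gaussian E (centric) and complex Gaussian E (acentric).
random.seed(7)
centric = [abs(random.gauss(0.0, 1.0)) for _ in range(20000)]
acentric = [abs(complex(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0))) / math.sqrt(2.0)
            for _ in range(20000)]
```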
For further details about the distribution of intensities see Chapter 2.1.
From the earliest periods of X-ray structure analysis, several authors (Ott, 1927; Banerjee, 1933; Avrami, 1938) have tried to determine atomic positions directly from diffraction intensities. Significant developments are the derivation of inequalities and the introduction of probabilistic techniques via the use of joint probability distribution methods (Hauptman & Karle, 1953).
An extensive system of inequalities exists for the coefficients of a Fourier series which represents a positive function. This can restrict the allowed values for the phases of the s.f.'s in terms of measured structure-factor magnitudes. Harker & Kasper (1948) derived two types of inequalities:
Type 1. A modulus is bounded by a combination of structure factors [equation (2.2.5.1)], where m is the order of the point group and
.
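Applied to low-order space groups, (2.2.5.1) gives inequalities such as, for $P\bar1$,
$$U_{\mathbf h}^2 \le \tfrac12(1 + U_{2\mathbf h}).$$
The meaning of each inequality is easily understood. In $P\bar1$, for example, $U_{2\mathbf h}$ must be positive if $|U_{\mathbf h}|$ is large enough: the inequality requires $U_{2\mathbf h} \ge 2U_{\mathbf h}^2 - 1$, so a negative sign is excluded whenever $|U_{2\mathbf h}| > 1 - 2U_{\mathbf h}^2$, and in particular whenever $U_{\mathbf h}^2 > \tfrac12$.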
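The $P\bar1$ inequality above translates directly into a sign test. The sketch assumes that $|U_{\mathbf h}|$ and $|U_{2\mathbf h}|$ are already on the unitary scale. The function name is hypothetical.

```python
def hk_sign_2h(u_h, u_2h_modulus):
    """Sign of U_2h allowed by the P-1 inequality U_h^2 <= (1 + U_2h) / 2.

    Returns +1 when only a positive U_2h is compatible, None when both signs
    are, and raises ValueError when neither is (inconsistent moduli).
    """
    bound = 2.0 * u_h * u_h - 1.0            # the inequality requires U_2h >= bound
    plus_ok = u_2h_modulus >= bound
    minus_ok = -u_2h_modulus >= bound
    if plus_ok and minus_ok:
        return None
    if plus_ok:
        return +1
    raise ValueError("no sign of U_2h satisfies the inequality")


strong = hk_sign_2h(0.8, 0.5)   # large |U_h|: the negative sign of U_2h is excluded
weak = hk_sign_2h(0.3, 0.2)     # small |U_h|: no sign indication
```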
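Type 2. The modulus of the sum or of the difference of two structure factors is bounded by a combination of structure factors [equation (2.2.5.2)], where ℜ stands for `real part of'. Equation (2.2.5.2) applied to P1 gives
$$|U_{\mathbf h} \pm U_{\mathbf h'}|^2 \le 2(1 \pm \Re\,U_{\mathbf h-\mathbf h'}).$$
A variant of (2.2.5.2) valid for cs. space groups is
$$(U_{\mathbf h} \pm U_{\mathbf h'})^2 \le (1 \pm U_{\mathbf h+\mathbf h'})(1 \pm U_{\mathbf h-\mathbf h'}).$$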
After Harker & Kasper's contributions, several other inequalities were discovered (Gillis, 1948; Goedkoop, 1950; Okaya & Nitta, 1952; de Wolff & Bouman, 1954; Bouman, 1956; Oda et al., 1961). The most general are the Karle–Hauptman inequalities (Karle & Hauptman, 1950):
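$$D_n = \begin{vmatrix} U_{\mathbf 0} & U_{-\mathbf h_1} & \cdots & U_{-\mathbf h_n}\\ U_{\mathbf h_1} & U_{\mathbf 0} & \cdots & U_{\mathbf h_1-\mathbf h_n}\\ \vdots & \vdots & \ddots & \vdots\\ U_{\mathbf h_n} & U_{\mathbf h_n-\mathbf h_1} & \cdots & U_{\mathbf 0}\end{vmatrix} \ge 0, \qquad U_{\mathbf 0} = 1. \tag{2.2.5.3}$$
The determinant can be of any order, but the leading column (or row) must consist of U's with different indices, although symmetry-related U's may occur within the column. For n = 2, $\mathbf h_1 = \mathbf h$ and $\mathbf h_2 = 2\mathbf h$, equation (2.2.5.3) reduces to
$$\begin{vmatrix} 1 & U_{-\mathbf h} & U_{-2\mathbf h}\\ U_{\mathbf h} & 1 & U_{-\mathbf h}\\ U_{2\mathbf h} & U_{\mathbf h} & 1\end{vmatrix} \ge 0,$$
which, for cs. structures, becomes $(1 - U_{2\mathbf h})(1 + U_{2\mathbf h} - 2U_{\mathbf h}^2) \ge 0$ and hence gives the Harker & Kasper inequality
$$U_{\mathbf h}^2 \le \tfrac12(1 + U_{2\mathbf h}).$$
For n = 2, $\mathbf h_1 = \mathbf h$ and $\mathbf h_2 = \mathbf k$, equation (2.2.5.3) becomes
$$\begin{vmatrix} 1 & U_{-\mathbf h} & U_{-\mathbf k}\\ U_{\mathbf h} & 1 & U_{\mathbf h-\mathbf k}\\ U_{\mathbf k} & U_{\mathbf k-\mathbf h} & 1\end{vmatrix} = 1 - |U_{\mathbf h}|^2 - |U_{\mathbf k}|^2 - |U_{\mathbf h-\mathbf k}|^2 + 2|U_{\mathbf h}U_{\mathbf k}U_{\mathbf h-\mathbf k}|\cos\Phi_{\mathbf{hk}} \ge 0,$$
from which
$$\cos\Phi_{\mathbf{hk}} \ge \frac{|U_{\mathbf h}|^2 + |U_{\mathbf k}|^2 + |U_{\mathbf h-\mathbf k}|^2 - 1}{2|U_{\mathbf h}U_{\mathbf k}U_{\mathbf h-\mathbf k}|}, \tag{2.2.5.4}$$
where $\Phi_{\mathbf{hk}} = \varphi_{-\mathbf h} + \varphi_{\mathbf k} + \varphi_{\mathbf h-\mathbf k}$. If the moduli $|U_{\mathbf h}|$, $|U_{\mathbf k}|$, $|U_{\mathbf h-\mathbf k}|$ are large enough, (2.2.5.4) is not satisfied for all values of $\Phi_{\mathbf{hk}}$. In cs. structures, finding that one of the two values of $\cos\Phi_{\mathbf{hk}}$ (+1 or −1) violates (2.2.5.4) unambiguously identifies the sign of the product $U_{-\mathbf h}U_{\mathbf k}U_{\mathbf h-\mathbf k}$.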
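The sign test implied by (2.2.5.4) can be checked by evaluating the 3 × 3 determinant directly for both trial signs. The sketch assumes a cs. structure (so $U_{-\mathbf h} = U_{\mathbf h}$) and takes unitary moduli as input. It places the trial sign on $U_{\mathbf h-\mathbf k}$, since only the sign of the product matters. The function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def kh_allowed_signs(u_h, u_k, u_hk):
    """Signs of U_-h U_k U_(h-k) compatible with D >= 0 for a cs. structure.

    u_h, u_k, u_hk are the moduli |U_h|, |U_k|, |U_(h-k)|. Because U_-x = U_x
    in a cs. structure and D depends only on the sign of the product, the
    trial sign is simply attached to U_(h-k).
    """
    allowed = []
    for sign in (+1, -1):
        a, b, c = u_h, u_k, sign * u_hk
        # Rows and columns indexed by 0, h, k: entry (i, j) is U_(h_i - h_j).
        m = [[1.0, a, b],
             [a, 1.0, c],
             [b, c, 1.0]]
        if det3(m) >= 0.0:
            allowed.append(sign)
    return allowed
```

For three moduli of 0.6 only the positive product survives. For three moduli of 0.2 both signs remain possible and the determinant gives no information.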
It was observed (Gillis, 1948) that `there was a number of cases in which both signs satisfied the inequality, one of them by a comfortable margin and the other by only a relatively small margin. In almost all such cases it was the former sign which was the correct one. That suggests that the method may have some power in reserve in the sense that there are still fundamentally stronger inequalities to be discovered'. Today we identify this power in reserve in the use of probability theory.
For any space group (see Section 2.2.3) there are linear combinations of phases whose cosines are, in principle, fixed by the |E| magnitudes alone (s.i.'s) or by the |E| values and the trigonometric form of the structure factor (s.s.'s). This result greatly stimulated the calculation of conditional distribution functions
$$P(\Phi \mid \{R\}), \tag{2.2.5.5}$$
where $R_{\mathbf h} = |E_{\mathbf h}|$, Φ is an s.i. or an s.s. and {R} is a suitable set of diffraction magnitudes. The method was first proposed by Hauptman & Karle (1953) and was developed further by several authors (Bertaut, 1955a,b, 1960; Klug, 1958; Naya et al., 1964, 1965; Giacovazzo, 1980a). From a probabilistic point of view the crystallographic problem is clear. The joint distribution $P(E_1, E_2, \ldots, E_n)$, from which the conditional distributions (2.2.5.5) can be derived, involves a number of normalized structure factors, each of which is a linear sum of random variables (the atomic contributions to the structure factors). So, for the probabilistic interpretation of the phase problem, the atomic positions and the reciprocal vectors may be considered as random variables. A further problem is that of identifying, for a given Φ, a suitable set of magnitudes {R} on which Φ primarily depends. The nested neighbourhood principle (Hauptman, 1975) first fixed the idea of defining a sequence of sets of reflections, each contained in the succeeding one, with the property that any s.i. or s.s. may be estimated via the magnitudes constituting the various neighbourhoods. A subsequent, more general theory, the representation method (Giacovazzo, 1977a, 1980b), arranges for any Φ the set of intensities in a sequence of subsets in order of their expected effectiveness (in the statistical sense) for the estimation of Φ.
In the following sections the main formulae estimating low-order invariants and seminvariants or relating phases to other phases and diffraction magnitudes are given.
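The basic formula for the estimation of the triplet phase $\Phi_{\mathbf{hk}} = \varphi_{-\mathbf h} + \varphi_{\mathbf k} + \varphi_{\mathbf h-\mathbf k}$, given the parameter $G_{\mathbf{hk}} = 2\sigma_3\sigma_2^{-3/2}|E_{\mathbf h}E_{\mathbf k}E_{\mathbf h-\mathbf k}|$, is Cochran's (1955) formula
$$P(\Phi_{\mathbf{hk}}) = [2\pi I_0(G_{\mathbf{hk}})]^{-1}\exp(G_{\mathbf{hk}}\cos\Phi_{\mathbf{hk}}), \tag{2.2.5.6}$$
where $\sigma_n = \sum_{j=1}^N Z_j^n$, $Z_j$ is the atomic number of the jth atom and $I_n$ is the modified Bessel function of order n. In Fig. 2.2.5.1 the distribution $P(\Phi_{\mathbf{hk}})$ is shown for different values of G.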
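Cochran's distribution is easy to explore numerically. The sketch below computes $I_n$ from its power series (avoiding a SciPy dependency), evaluates $G_{\mathbf{hk}}$ in the equal-atom case and checks that $\langle\cos\Phi\rangle = I_1(G)/I_0(G)$. The triplet moduli and the atom count are hypothetical inputs.

```python
import math


def bessel_i(n, x, terms=100):
    """Modified Bessel function of the first kind I_n(x), integer n >= 0 (power series)."""
    half = x / 2.0
    term = 1.0
    for j in range(1, n + 1):          # leading term (x/2)^n / n!
        term *= half / j
    total = term
    for k in range(terms):             # ratio of successive series terms
        term *= half * half / ((k + 1) * (k + 1 + n))
        total += term
    return total


def cochran_g(e_h, e_k, e_hk, n_atoms):
    """G_hk = 2 |E_h E_k E_(h-k)| / sqrt(N), valid for N equal atoms."""
    return 2.0 * abs(e_h * e_k * e_hk) / math.sqrt(n_atoms)


def cochran_pdf(phi, g):
    """Cochran distribution (2.2.5.6) of the triplet phase."""
    return math.exp(g * math.cos(phi)) / (2.0 * math.pi * bessel_i(0, g))


def expected_cos(g, steps=2000):
    """<cos Phi> by the (periodic) trapezoidal rule over one period."""
    h = 2.0 * math.pi / steps
    return sum(math.cos(-math.pi + i * h) * cochran_pdf(-math.pi + i * h, g)
               for i in range(steps)) * h


g = cochran_g(2.0, 2.0, 2.0, 64)        # hypothetical triplet, 64 equal atoms: G = 2
```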
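The conditional probability distribution for $\varphi_{\mathbf h}$, given a set of $(\varphi_{\mathbf k_j} + \varphi_{\mathbf h-\mathbf k_j})$ and $|E_{\mathbf k_j}E_{\mathbf h-\mathbf k_j}|$, is given (Karle & Hauptman, 1956; Karle & Karle, 1966) by
$$P(\varphi_{\mathbf h}) = [2\pi I_0(\alpha_{\mathbf h})]^{-1}\exp[\alpha_{\mathbf h}\cos(\varphi_{\mathbf h} - \beta_{\mathbf h})], \tag{2.2.5.7}$$
where
$$\alpha_{\mathbf h}^2 = \Big[\sum_j G_{\mathbf{hk}_j}\cos(\varphi_{\mathbf k_j}+\varphi_{\mathbf h-\mathbf k_j})\Big]^2 + \Big[\sum_j G_{\mathbf{hk}_j}\sin(\varphi_{\mathbf k_j}+\varphi_{\mathbf h-\mathbf k_j})\Big]^2, \tag{2.2.5.8}$$
$$\tan\beta_{\mathbf h} = \frac{\sum_j G_{\mathbf{hk}_j}\sin(\varphi_{\mathbf k_j}+\varphi_{\mathbf h-\mathbf k_j})}{\sum_j G_{\mathbf{hk}_j}\cos(\varphi_{\mathbf k_j}+\varphi_{\mathbf h-\mathbf k_j})}, \tag{2.2.5.9}$$
and $\beta_{\mathbf h}$ is the most probable value for $\varphi_{\mathbf h}$. The variance of $\varphi_{\mathbf h}$ may be obtained from (2.2.5.7) and is given by
$$V_{\mathbf h} = \frac{\pi^2}{3} + [I_0(\alpha_{\mathbf h})]^{-1}\sum_{n=1}^{\infty}\frac{I_{2n}(\alpha_{\mathbf h})}{n^2} - 4[I_0(\alpha_{\mathbf h})]^{-1}\sum_{n=0}^{\infty}\frac{I_{2n+1}(\alpha_{\mathbf h})}{(2n+1)^2}, \tag{2.2.5.10}$$
which is plotted in Fig. 2.2.5.2.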
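The series (2.2.5.10) can be cross-checked against direct integration of $(\varphi_{\mathbf h}-\beta_{\mathbf h})^2$ over the distribution (2.2.5.7). This is a sketch. The series truncation (`nmax`) and the Simpson grid are arbitrary choices made here. The limit $\alpha_{\mathbf h} = 0$ recovers $\pi^2/3$, the variance of a uniformly distributed phase.

```python
import math


def bessel_i(n, x, terms=100):
    """Modified Bessel function I_n(x), integer n >= 0, from its power series."""
    half = x / 2.0
    term = 1.0
    for j in range(1, n + 1):
        term *= half / j
    total = term
    for k in range(terms):
        term *= half * half / ((k + 1) * (k + 1 + n))
        total += term
    return total


def phase_variance(alpha, nmax=60):
    """Variance of phi_h about beta_h from the series (2.2.5.10), truncated at nmax."""
    i0 = bessel_i(0, alpha)
    even = sum(bessel_i(2 * n, alpha) / n ** 2 for n in range(1, nmax + 1))
    odd = sum(bessel_i(2 * n + 1, alpha) / (2 * n + 1) ** 2 for n in range(0, nmax + 1))
    return math.pi ** 2 / 3.0 + even / i0 - 4.0 * odd / i0


def phase_variance_numeric(alpha, steps=4000):
    """Same quantity by Simpson integration of x^2 P(x) over [-pi, pi]."""
    h = 2.0 * math.pi / steps
    norm = 2.0 * math.pi * bessel_i(0, alpha)
    total = 0.0
    for i in range(steps + 1):
        x = -math.pi + i * h
        w = 1.0 if i in (0, steps) else (4.0 if i % 2 else 2.0)
        total += w * x * x * math.exp(alpha * math.cos(x)) / norm
    return total * h / 3.0
```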
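Equation (2.2.5.9) is the so-called tangent formula. According to (2.2.5.10), the larger $\alpha_{\mathbf h}$ is, the more reliable is the relation $\varphi_{\mathbf h} \simeq \beta_{\mathbf h}$. For an equal-atom structure $\sigma_3\sigma_2^{-3/2} = N^{-1/2}$, so that $G_{\mathbf{hk}} = 2|E_{\mathbf h}E_{\mathbf k}E_{\mathbf h-\mathbf k}|/N^{1/2}$.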
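A single application of the tangent formula amounts to summing the weighted contributions as vectors. The sketch uses `atan2`, so that the quadrant of $\beta_{\mathbf h}$ is taken from the signs of numerator and denominator, and it returns $\alpha_{\mathbf h}$ from (2.2.5.8) as the reliability measure. The input format (pairs of $G_{\mathbf{hk}}$ and $\varphi_{\mathbf k}+\varphi_{\mathbf h-\mathbf k}$) is an assumption of this sketch.

```python
import math


def tangent_formula(contributions):
    """Estimate phi_h from pairs (G_hk, phi_k + phi_(h-k)).

    Returns (beta_h, alpha_h): beta_h from (2.2.5.9), with atan2 keeping the
    correct quadrant, and alpha_h from (2.2.5.8).
    """
    t = sum(g * math.sin(p) for g, p in contributions)
    b = sum(g * math.cos(p) for g, p in contributions)
    return math.atan2(t, b), math.hypot(t, b)


beta, alpha = tangent_formula([(1.0, 0.2), (1.0, 0.4)])
```

Two equally weighted contributions at 0.2 and 0.4 rad give $\beta_{\mathbf h} = 0.3$ rad with $\alpha_{\mathbf h} = 2\cos 0.1$. Contributions that disagree reduce $\alpha_{\mathbf h}$ and hence, by (2.2.5.10), the reliability of the estimate.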
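The basic conditional formula for sign determination of $E_{\mathbf h}$ in cs. crystals is Cochran & Woolfson's (1955) formula
$$P^+(E_{\mathbf h}) = \frac12 + \frac12\tanh\Big(\sigma_3\sigma_2^{-3/2}|E_{\mathbf h}|\sum_{\mathbf k}E_{\mathbf k}E_{\mathbf h-\mathbf k}\Big), \tag{2.2.5.11}$$
where $P^+(E_{\mathbf h})$ is the probability that $E_{\mathbf h}$ is positive and k ranges over the set of known values $E_{\mathbf k}E_{\mathbf h-\mathbf k}$. The larger the absolute value of the argument of tanh, the more reliable is the phase indication.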
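Formula (2.2.5.11) evaluates in a single line. The sketch below assumes that the signed products $E_{\mathbf k}E_{\mathbf h-\mathbf k}$ of the already-signed contributors are supplied as a list. The equal-atom numbers in the example (100 atoms with Z = 6) are hypothetical.

```python
import math


def cw_probability_positive(e_h, products, sigma2, sigma3):
    """Cochran-Woolfson estimate (2.2.5.11) of the probability that E_h > 0.

    products -- signed values E_k * E_(h-k) for contributors with known signs
    """
    arg = sigma3 / sigma2 ** 1.5 * abs(e_h) * sum(products)
    return 0.5 + 0.5 * math.tanh(arg)


# 100 equal atoms with Z = 6: sigma2 = 3600, sigma3 = 21600, sigma3 / sigma2^1.5 = 0.1.
p = cw_probability_positive(2.0, [2.0, 1.5, 1.5], sigma2=3600.0, sigma3=21600.0)
p_neg = cw_probability_positive(2.0, [-2.0, -1.5, -1.5], sigma2=3600.0, sigma3=21600.0)
```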
An auxiliary formula exploiting all the |E|'s in reciprocal space in order to estimate a single Φ is the $B_{3,0}$ formula (Hauptman & Karle, 1958; Karle & Hauptman, 1958), given by
where C is a constant which differs for cs. and ncs. crystals, $\langle|E|^p\rangle$ is the average value of $|E|^p$ and p is normally chosen to be some small number. Several modifications of (2.2.5.12) have been proposed (Hauptman, 1964, 1970; Karle, 1970a; Giacovazzo, 1977b).
A recent formula (Cascarano, Giacovazzo, Camalli et al., 1984) exploits information contained within the second representation of Φ, that is to say, within the collection of special quintets (see Section 2.2.5.6):
where k is a free vector. The formula retains the same algebraic form as (2.2.5.6), but
where
,
is assumed to be zero if it is experimentally negative. The prime to the summation warns the reader that precautions have to be taken in order to avoid duplications in the contributions.
G may be positive or negative. In particular, if the triplet is estimated negative.
The accuracy with which the value of Φ is estimated strongly depends on . Thus, in practice, only a subset of reciprocal space (the reflections k with large values of ε) may be used for estimating Φ.
Formula (2.2.5.13) has proved quite useful in practice. It ranks positive triplet cosines in order of reliability markedly better than Cochran's parameters do. Triplets estimated to have negative cosines may be excluded from the phasing process, and they can also serve as a figure of merit for finding the correct solution in a multisolution procedure.
A strength of direct methods is that no knowledge of the structure is required for their application. However, when some a priori information is available, it would certainly be a weakness of the methods not to make use of this knowledge. The conditional distribution of Φ, given the three magnitudes of the triplet and the first three of the five kinds of a priori information described in Section 2.2.4.1, is (Main, 1976; Heinermann, 1977a)
where
stand for h,
,
, and
for
. The quantities
have been calculated in Section 2.2.4.1 according to different categories:
is a suitable average of the product of three scattering factors for the ith atomic group, p is the number of atomic groups in the cell including those related by symmetry elements. We have the following categories.
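In early papers (Hauptman & Karle, 1953; Simerska, 1956) the quartet phase $\Phi = \varphi_{\mathbf h} + \varphi_{\mathbf k} + \varphi_{\mathbf l} + \varphi_{-\mathbf h-\mathbf k-\mathbf l}$ was always expected to be zero. Schenk (1973a,b) [see also Hauptman (1974)] suggested that Φ primarily depends on seven magnitudes: $|E_{\mathbf h}|$, $|E_{\mathbf k}|$, $|E_{\mathbf l}|$, $|E_{\mathbf h+\mathbf k+\mathbf l}|$, called basis magnitudes, and $|E_{\mathbf h+\mathbf k}|$, $|E_{\mathbf h+\mathbf l}|$, $|E_{\mathbf k+\mathbf l}|$, called cross magnitudes.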
The conditional probability of Φ in P1 given the seven magnitudes, according to Hauptman (1975), is
where L is a suitable normalizing constant which can be derived numerically,
For equal atoms
. Denoting
gives
Fig. 2.2.5.3 shows the distribution (2.2.5.18) for three typical cases. The figure shows that quartets whose phases are estimated near π or in the middle range will be in poorer agreement with the true values than those estimated near 0, because of their relatively larger variances. In principle, however, the formula is able to estimate negative or enantiomorph-sensitive quartet cosines from the seven magnitudes.
Fig. 2.2.5.3. Distributions (2.2.5.18).
In the cs. case, (2.2.5.18) is replaced (Hauptman & Green, 1976) by
where
is the probability that the sign of
is positive or negative, and
The normalized probability may be derived by
. Simpler probabilistic formulae were derived independently by Giacovazzo (1975, 1976):
where
and
. Q is never allowed to be negative.
According to (2.2.5.20)
is expected to be positive or negative according to whether
is positive or negative: the larger C is, the more reliable the phase indication. For
, (2.2.5.18) and (2.2.5.20) are practically equivalent in all cases. If N is small, (2.2.5.20) is in good agreement with (2.2.5.18) for quartets strongly defined as positive or negative, but in poor agreement for enantiomorph-sensitive quartets (see Fig. 2.2.5.3).
In cs. cases the sign probability for is
where G is defined by (2.2.5.21).
The three cross magnitudes are not always all among the measured reflections. From marginal distributions the following formulae arise (Giacovazzo, 1977c; Heinermann, 1977b):
Equations (2.2.5.20) and (2.2.5.23) are easily modified when some cross magnitudes are not among the measurements. If
is not measured then (2.2.5.20) or (2.2.5.23) is still valid provided that in G it is assumed that
. For example, if
and
are not in the data then (2.2.5.21) and (2.2.5.22) become
In space groups with symmetry higher than
more symmetry-equivalent quartets can exist of the type
where
are rotation matrices of the space group. The set
is called the first representation of Φ. In this case Φ primarily depends on more than seven magnitudes. For example, let us consider in Pmmm the quartet
Quartets symmetry equivalent to Φ and their respective cross terms are given in Table 2.2.5.1.
Experimental tests on the application of the representation concept to quartets have recently been made (Busetta et al., 1980). It was shown that quartets with more than three cross magnitudes are more accurately estimated than other quartets. Quartets with a cross reflection that is systematically absent were also shown to be of significant importance in direct methods. In this context, note that systematically absent reflections are not usually included in the set of diffraction data. This practice, unobjectionable when only triplet relations are used, can cause a loss of information when quartets are used. In fact, the usual direct-methods programs discard a quartet as soon as one of its cross reflections is not measured, so that systematic absences are treated in the same way as reflections lying outside the sphere of measurements.
A quintet phase may be considered as the sum of three suitable triplets or the sum of a triplet and a quartet, i.e.
or
It depends primarily on 15 magnitudes: the five basis magnitudes
and the ten cross magnitudes
In the following we will denote
Conditional distributions of Φ in P1 and $P\bar1$, given the 15 magnitudes, have been derived by several authors. In favourable circumstances in ncs. space groups they allow the quintets having Φ near 0, near π or near ±π/2 to be identified. Among others, we recall:
For cs. cases (2.2.5.24) reduces to
Positive or negative quintets may be identified according to whether G is larger or smaller than zero.
If is not measured then (2.2.5.24) and (2.2.5.25) are still valid provided that in (2.2.5.25)
.
If the symmetry is higher than in then more symmetry-equivalent quintets can exist of the type
where
are rotation matrices of the space group. The set
is called the first representation of Φ. In this case Φ primarily depends on more than 15 magnitudes, all of which have to be taken into account for a careful estimation of Φ (Giacovazzo, 1980a).
Wide use of quintet invariants in direct-methods procedures is prevented for two reasons: (a) the strong correlation of positive quintet cosines with positive triplets; (b) the long computing time needed for their estimation. Quintets are phase relationships of order $N^{-3/2}$, so a large number of them must be estimated in order to pick out a sufficient percentage of reliable ones.
In a crystal structure with N identical atoms the joint probability distribution of n normalized s.f.'s under the following conditions:
Advantages, limitations and applications of determinantal formulae can be found in the literature (Heinermann et al., 1979; de Rango et al., 1975, 1985). Taylor et al. (1978) combined K–H determinants with a magic-integer approach. The computing time, however, was larger than that required by standard computing techniques. The use of K–H matrices has been made faster and more effective by de Gelder et al. (1990) (see also de Gelder, 1992). They developed a phasing procedure (CRUNCH) which uses random phases as starting points for the maximization of the K–H determinants.
According to the representation method (Giacovazzo, 1977a, 1980a,b):
The more general expressions for the s.s.'s of first rank are
In other words:
The set of special quartets (2.2.5.35a) and (2.2.5.35b) constitutes the first representations of Φ.
Structure seminvariants of the second rank can be characterized as follows: suppose that, for a given seminvariant Φ, it is not possible to find a vectorial index h and a rotation matrix such that
is a structure invariant. Then Φ is a structure seminvariant of the second rank and a set of structure invariants ψ can certainly be formed, of type
by means of suitable indices h and l and rotation matrices
and
. As an example, for symmetry class 222,
or
or
are s.s.'s of the first rank while
is an s.s. of the second rank.
The procedure may easily be generalized to s.s.'s of any order of the first and of the second rank. So far, only the role of one-phase and two-phase s.s.'s of the first rank in direct procedures is well documented (see references quoted in Sections 2.2.5.9 and 2.2.5.10).
Let be our one-phase s.s. of the first rank, where
In general, more than one rotation matrix
and more than one vector h are compatible with (2.2.5.36)
. The set of special triplets
is the first representation of
. In cs. space groups the probability that
, given
and the set
, may be estimated (Hauptman & Karle, 1953; Naya et al., 1964; Cochran & Woolfson, 1955) by
where
In (2.2.5.37)
, the summation over n goes within the set of matrices
for which (2.2.5.35a,b) is compatible, and h varies within the set of vectors which satisfy (2.2.5.36)
for each
. Equation (2.2.5.36) is actually a generalized way of writing the so-called Σ1 relationships (Hauptman & Karle, 1953).
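For the simplest case, $P\bar1$, the Σ1 relationship is commonly quoted as $P^+(E_{2\mathbf h}) = \tfrac12 + \tfrac12\tanh\big[\tfrac12\sigma_3\sigma_2^{-3/2}|E_{2\mathbf h}|\sum(|E_{\mathbf h}|^2-1)\big]$. The sketch below implements that textbook form, which is assumed here as a special case of (2.2.5.37); it is not the general symmetry-aware formula. The function name and the example values are hypothetical.

```python
import math


def sigma1_probability(e_2h, e_h_values, sigma2, sigma3):
    """Probability that E_2h > 0 from Sigma-1 contributions (P-1 form assumed).

    e_h_values -- moduli |E_h| of the reflections contributing to E_2h
    """
    arg = 0.5 * sigma3 / sigma2 ** 1.5 * abs(e_2h) * sum(e * e - 1.0 for e in e_h_values)
    return 0.5 + 0.5 * math.tanh(arg)


# 100 equal atoms with Z = 6: sigma3 / sigma2^(3/2) = 0.1.
p_strong = sigma1_probability(2.5, [2.0], sigma2=3600.0, sigma3=21600.0)
p_weak = sigma1_probability(2.5, [0.5], sigma2=3600.0, sigma3=21600.0)
```

A strong $|E_{\mathbf h}|$ pushes $E_{2\mathbf h}$ towards a positive sign. A weak one ($|E_{\mathbf h}| < 1$) gives a probability below one half.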
If is a phase restricted by symmetry to
and
in an ncs. space group then (Giacovazzo, 1978)
If
is a general phase then
is distributed according to
where
with a reliability measured by
The second representation of
is the set of special quintets
provided that h and
vary over the vectors and matrices for which (2.2.5.36)
is compatible, k over the asymmetric region of the reciprocal space, and
over the rotation matrices in the space group. Formulae estimating
via the second representation in all the space groups [all the basis and cross magnitudes of the quintets (2.2.5.40) now constitute the a priori information] have recently been obtained (Giacovazzo, 1978; Cascarano & Giacovazzo, 1983; Cascarano, Giacovazzo, Calabrese et al., 1984). Such formulae contain, besides the contribution of order $N^{-1/2}$ provided by the first representation, a supplementary (not negligible) contribution of order $N^{-3/2}$ arising from quintets.
Denoting formulae (2.2.5.37)
, (2.2.5.38)
, (2.2.5.39)
still hold provided that
is replaced by
where
m is the number of symmetry operators and
is the Hermite polynomial of order four.
is assumed to be zero if it is computed negative. The prime to the summation warns the reader that precautions have to be taken in order to avoid duplication in the contributions.
Two-phase s.s.'s of the first rank were first evaluated in some cs. space groups by the method of coincidence by Grant et al. (1957). The idea was extended to ncs. space groups by Debaerdemaeker & Woolfson (1972), and in a more general way by Giacovazzo (1977e,f).
The technique was based on combining two triplets which, subtracted from one another, give the two-phase seminvariant. If all four |E|'s involved are sufficiently large, an estimate of the seminvariant is available.
Probability distributions valid in according to the neighbourhood principle have been given by Hauptman & Green (1978). Finally, Giacovazzo (1979a) combined the theory of representations with the joint probability distribution method in order to estimate two-phase s.s.'s in all the space groups.
According to representation theory, the problem is that of evaluating Φ via the special quartets (2.2.5.35a) and (2.2.5.35b). Thus, contributions of order $N^{-1}$ will appear in the probabilistic formulae, which will be functions of the basis and of the cross magnitudes of the quartets (2.2.5.35)
. Since more pairs of matrices
and
can be compatible with (2.2.5.34)
, and for each pair
more pairs of vectors
and
may satisfy (2.2.5.34)
, several quartets can in general be exploited for estimating Φ. The simplest case occurs in
where the two quartets (2.2.5.35)
suggest the calculation of the six-variate distribution function
which leads to the probability formula
where
is the probability that the product
is positive, and
It may be seen that in favourable cases
.
For the sake of brevity, the probabilistic formulae for the general case are not given and the reader is referred to the original papers.
The statistical treatment suggested by Wilson for scaling observed intensities corresponds, in direct space, to the origin peak of the Patterson function, so it is not surprising that a general correspondence exists between probabilistic formulation in reciprocal space and algebraic properties in direct space.
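For a structure containing atoms which are fully resolved from one another, raising the density to the nth power keeps the atoms resolved but changes the shape of each atom. Let
$$\rho(\mathbf r) = \sum_{j=1}^N \rho_j(\mathbf r - \mathbf r_j),$$
where $\rho_j$ is an atomic function and $\mathbf r_j$ is the coordinate of the `centre' of the atom. Then the Fourier transform of the electron density can be written as
$$F_{\mathbf h} = \sum_{j=1}^N f_j\exp(2\pi i\,\mathbf h\cdot\mathbf r_j). \tag{2.2.6.1}$$
If the atoms do not overlap,
$$\rho^n(\mathbf r) = \sum_{j=1}^N \rho_j^n(\mathbf r - \mathbf r_j),$$
and its Fourier transform gives
$${}^nF_{\mathbf h} = \sum_{j=1}^N {}^nf_j\exp(2\pi i\,\mathbf h\cdot\mathbf r_j), \tag{2.2.6.2}$$
where ${}^nf_j$ is the scattering factor for the jth peak of $\rho^n(\mathbf r)$:
$${}^nf_j(\mathbf h) = \int \rho_j^n(\mathbf r)\exp(2\pi i\,\mathbf h\cdot\mathbf r)\,{\rm d}\mathbf r.$$
We now introduce the condition that all atoms are equal, so that $f_j = f$ and ${}^nf_j = {}^nf$ for any j. From (2.2.6.1) and (2.2.6.2) we may write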
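$$F_{\mathbf h} = \theta_{\mathbf h}\,{}^nF_{\mathbf h}, \tag{2.2.6.3}$$
where $\theta_{\mathbf h} = f/{}^nf$ is a function which corrects for the difference of shape between the atoms with electron distributions $\rho(\mathbf r)$ and $\rho^n(\mathbf r)$. Since
$$\rho^n(\mathbf r) = \rho(\mathbf r)\,\rho^{n-1}(\mathbf r),$$
the Fourier transform of both sides gives
$${}^nF_{\mathbf h} = \frac1V\sum_{\mathbf k}F_{\mathbf k}\,{}^{n-1}F_{\mathbf h-\mathbf k},$$
from which the following relation arises:
$$F_{\mathbf h} = \frac{\theta_{\mathbf h}}{V}\sum_{\mathbf k}F_{\mathbf k}\,{}^{n-1}F_{\mathbf h-\mathbf k}. \tag{2.2.6.4}$$
For n = 2, equation (2.2.6.4) reduces to Sayre's (1952) equation [but see also Hughes (1953)]
$$F_{\mathbf h} = \frac{\theta_{\mathbf h}}{V}\sum_{\mathbf k}F_{\mathbf k}F_{\mathbf h-\mathbf k}. \tag{2.2.6.5}$$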
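Sayre's equation can be checked numerically on a toy structure. The sketch below is an illustration under stated assumptions, not a crystallographic program. It uses a hypothetical 1-D unit cell sampled on 128 points and five equal, well-resolved Gaussian "atoms". The plain DFT follows the numpy.fft sign convention rather than the $\exp(+2\pi i\,\mathbf h\cdot\mathbf r)$ convention of (2.2.6.1); the relation holds either way. $\theta_{\mathbf h}$ is obtained from the transforms of a single atom and of its square. For an n-point DFT the factor 1/V of (2.2.6.5) becomes 1/n.

```python
import cmath
import math

N_GRID = 128     # grid points across a 1-D unit cell (hypothetical)
WIDTH = 0.015    # Gaussian atom width in fractional units (hypothetical)


def density(positions):
    """Periodic 1-D density of equal Gaussian atoms that do not overlap."""
    rho = []
    for i in range(N_GRID):
        x = i / N_GRID
        value = 0.0
        for x0 in positions:
            d = (x - x0 + 0.5) % 1.0 - 0.5      # minimum-image distance in the cell
            value += math.exp(-d * d / (2.0 * WIDTH ** 2))
        rho.append(value)
    return rho


def dft(values):
    """Plain discrete Fourier transform, F_h = sum_x v_x exp(-2 pi i h x / n)."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * h * x / n) for x, v in enumerate(values))
            for h in range(n)]


positions = [0.10, 0.29, 0.47, 0.66, 0.84]
F = dft(density(positions))                  # F_h of rho
atom = density([0.0])
f = dft(atom)                                # f_h: transform of one atom
f_sq = dft([v * v for v in atom])            # (2)f_h: transform of the squared atom
theta = [a / b for a, b in zip(f, f_sq)]     # theta_h = f / (2)f


def sayre_rhs(h):
    """theta_h / n * sum_k F_k F_(h-k): discrete counterpart of (2.2.6.5)."""
    return theta[h] / N_GRID * sum(F[k] * F[(h - k) % N_GRID] for k in range(N_GRID))


max_error = max(abs(sayre_rhs(h) - F[h]) for h in range(20))
```

If the atoms are moved close enough to overlap, or made unequal, the agreement degrades, as the argument above predicts.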
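If the structure contains resolved isotropic atoms of two types, P and Q, it is impossible to find a factor $\theta_{\mathbf h}$ such that the relation $F_{\mathbf h} = \theta_{\mathbf h}\,{}^2F_{\mathbf h}$ holds, since this would imply values of $\theta_{\mathbf h}$ such that $f_P = \theta_{\mathbf h}\,{}^2f_P$ and $f_Q = \theta_{\mathbf h}\,{}^2f_Q$ simultaneously. However, the following relationship can be stated (Woolfson, 1958):
$$F_{\mathbf h} = a_{\mathbf h}\,{}^2F_{\mathbf h} + b_{\mathbf h}\,{}^3F_{\mathbf h}, \tag{2.2.6.6}$$
where $a_{\mathbf h}$ and $b_{\mathbf h}$ are adjustable parameters. With two unknowns per reflection, they can be chosen so that the relation holds for both atom types at once. Equation (2.2.6.6) can easily be generalized to the case of structures containing resolved atoms of more than two types (von Eller, 1973).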
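Besides the algebraic properties of the electron density, Patterson methods can also be developed so that they provide phase indications. For example, it is possible to find the reciprocal counterpart of the function
$$Q(\mathbf u_1,\ldots,\mathbf u_n) = \int_V \rho(\mathbf r)\,\rho(\mathbf r+\mathbf u_1)\cdots\rho(\mathbf r+\mathbf u_n)\,{\rm d}\mathbf r. \tag{2.2.6.7}$$
For n = 1 the function (2.2.6.7) coincides with the usual Patterson function $P(\mathbf u)$. For n = 2, (2.2.6.7) reduces to the double Patterson function $Q(\mathbf u,\mathbf v)$ introduced by Sayre (1953). Expansion of $Q(\mathbf u,\mathbf v)$ as a Fourier series yields
$$Q(\mathbf u,\mathbf v) = \frac{1}{V^2}\sum_{\mathbf h}\sum_{\mathbf k}F_{\mathbf h}F_{\mathbf k}F_{-\mathbf h-\mathbf k}\exp[-2\pi i(\mathbf h\cdot\mathbf u + \mathbf k\cdot\mathbf v)].$$
Conversely, the triplet product $F_{\mathbf h}F_{\mathbf k}F_{-\mathbf h-\mathbf k}$, and hence the value of the triplet invariant, may be obtained from the Fourier transform of the double Patterson function.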
Among the main results relating direct- and reciprocal-space properties, the following may be recalled:
A traditional procedure for phase assignment may be schematically presented as follows:
In very complex structures, a large initial set of known phases seems to be a basic requirement for a structure to be determined. This aim can be achieved, for example, by introducing a large number of permutable phases into the initial set. However, every new symbol introduced implies a fourfold increase in computing time, which, even on fast computers, quickly leads to computing-time limitations. On the other hand, a relatively large starting set is not in itself enough to ensure a successful structure determination. This is the case, for example, when the triplet invariants used in the initial steps differ significantly from zero. New strategies have therefore been devised to solve more complex structures.
Some references for direct-methods packages are given below. Other useful packages using symbolic addition or multisolution procedures do exist but are not well documented.
CRUNCH: Gelder, R. de, de Graaff, R. A. G. & Schenk, H. (1993). Automatic determination of crystal structures using Karle–Hauptman matrices. Acta Cryst. A49, 287–293.
DIRDIF: Beurskens, P. T., Beurskens, G., de Gelder, R., Garcia-Granda, S., Gould, R. O., Israel, R. & Smits, J. M. M. (1999). The DIRDIF-99 program system. Crystallography Laboratory, University of Nijmegen, The Netherlands.
MITHRIL: Gilmore, C. J. (1984). MITHRIL. An integrated direct-methods computer program. J. Appl. Cryst. 17, 42–46.
MULTAN88: Main, P., Fiske, S. J., Germain, G., Hull, S. E., Declercq, J.-P., Lessinger, L. & Woolfson, M. M. (1999). Crystallographic software: teXsan for Windows. http://www.msc.com/brochures/teXsan/wintex.html.
PATSEE: Egert, E. & Sheldrick, G. M. (1985). Search for a fragment of known geometry by integrated Patterson and direct methods. Acta Cryst. A41, 262–268.
SAPI: Fan, H.-F. (1999). Crystallographic software: teXsan for Windows. http://www.msc.com/brochures/teXsan/wintex.html.
SnB: Weeks, C. M. & Miller, R. (1999). The design and implementation of SnB version 2.0. J. Appl. Cryst. 32, 120–124.
SHELX97: Sheldrick, G. M. (2000a). The SHELX home page. http://shelx.uni-ac.gwdg.de/SHELX/.
SHELXS: Sheldrick, G. M. (2000b). SHELX. http://www.ucg.ie/cryst/shelx.htm.
SIR97: Altomare, A., Burla, M. C., Camalli, M., Cascarano, G. L., Giacovazzo, C., Guagliardi, A., Moliterni, A. G. G., Polidori, G. & Spagna, R. (1999). SIR97: a new tool for crystal structure determination and refinement. J. Appl. Cryst. 32, 115–119.
XTAL3.6.1: Hall, S. R., du Boulay, D. J. & Olthof-Hazekamp, R. (1999). Xtal3.6 crystallographic software. http://www.crystal.uwa.edu.au/Crystal/xtal.
Protein structures cannot be solved ab initio by traditional direct methods (i.e., by application of the tangent formula alone). Accordingly, the first applications focused on two tasks:
The application of standard tangent techniques to (a) and (b) has not been found very satisfactory (Coulter & Dewar, 1971; Hendrickson et al., 1973; Weinzierl et al., 1969). Tangent methods, in fact, require atomicity and non-negativity of the electron density, and neither property holds if the data do not extend to atomic resolution. Because of series termination and other errors, the electron-density map at non-atomic resolution presents large negative regions, which appear as false peaks in the squared structure. However, tangent methods use only part of the information given by the Sayre equation (2.2.6.5). In fact, (2.2.6.5) expresses two equations, relating respectively the radial parts (moduli) and the angular parts (phases) of the two sides, and so provides a large degree of overdetermination of the phases. To exploit this, Sayre (1972) [see also Sayre & Toupin (1975)] suggested minimizing
$$r = \sum_{\mathbf h}\Big|a_{\mathbf h}F_{\mathbf h} - \sum_{\mathbf k}F_{\mathbf k}F_{\mathbf h-\mathbf k}\Big|^2, \qquad a_{\mathbf h} = V/\theta_{\mathbf h}, \tag{2.2.10.1}$$
by least squares as a function of the phases.
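The residual (2.2.10.1) is easy to evaluate for trial phases. In the hypothetical 1-D test below, the "atoms" are single grid points, so that $\rho^2 = \rho$ holds exactly. Then $\theta_{\mathbf h} = 1$ and, for an n-point DFT, $a_{\mathbf h} = n$. The true phases give a residual of zero, while randomly perturbed phases do not. This only illustrates the target function; Sayre's actual least-squares minimization over the phases is not reproduced.

```python
import cmath
import math
import random


def dft(values):
    """Plain discrete Fourier transform, F_h = sum_x v_x exp(-2 pi i h x / n)."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * h * x / n) for x, v in enumerate(values))
            for h in range(n)]


def sayre_residual(moduli, phases, a):
    """r = sum_h |a_h F_h - sum_k F_k F_(h-k)|^2, the target of (2.2.10.1)."""
    n = len(moduli)
    F = [m * cmath.exp(1j * p) for m, p in zip(moduli, phases)]
    r = 0.0
    for h in range(n):
        conv = sum(F[k] * F[(h - k) % n] for k in range(n))
        r += abs(a[h] * F[h] - conv) ** 2
    return r


# Hypothetical 1-D cell: unit point "atoms" on grid points, so rho^2 == rho exactly,
# theta_h = 1 and, for an n-point DFT, a_h = n / theta_h = n.
n = 32
rho = [1.0 if x in (3, 10, 19, 27) else 0.0 for x in range(n)]
F_true = dft(rho)
moduli = [abs(v) for v in F_true]
true_phases = [cmath.phase(v) for v in F_true]
a = [float(n)] * n

random.seed(3)
trial_phases = [p + random.uniform(-1.0, 1.0) for p in true_phases]
r_true = sayre_residual(moduli, true_phases, a)
r_trial = sayre_residual(moduli, trial_phases, a)
```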
Although tests on rubredoxin (phase extension from 2.5 to 1.5 Å resolution) and insulin (Cutfield et al., 1975) (from 1.9 to 1.5 Å resolution) were successful, the limitations of the method are its high cost and, especially, the higher efficiency of the least-squares method. Equivalent considerations hold for the application of determinantal methods to proteins [see Podjarny et al. (1981); de Rango et al. (1985) and literature cited therein].
A question now arises: why is the tangent formula unable to solve protein structures? Fan et al. (1991) considered the question from a first-principles approach and concluded that:
Sheldrick (1990) suggested that direct methods are not expected to succeed if fewer than half of the reflections in the range 1.1–1.2 Å are observed with $|F| > 4\sigma(|F|)$, a condition seldom satisfied by protein data.
The most complete analysis of the problem has been made by Giacovazzo, Guagliardi et al. (1994). They observed that the expected value of α (see Section 2.2.7) suggested by the tangent formula for proteins is comparable with the variance of the α parameter. In other words, for proteins the signal determining the phase is comparable with the noise, and therefore the phase indication is expected to be unreliable.
Section 2.2.10.1 suggests that the mere use of the tangent formula or of the Sayre equation cannot solve ab initio protein structures of the usual size. However, even in an ab initio situation there is a source of supplementary information that may be exploited, namely the structural information available in direct space. Good examples of its use are the 'peaklist optimization' procedure (Sheldrick & Gould, 1995) and the SIR97 procedure (Altomare et al., 1999), both of which refine and complete the trial structure offered by the first E map.
In both cases there are reasons to suspect that the correct structure is sometimes extracted from a totally incorrect direct-methods solution. These results suggest that a direct-space procedure can provide some form of structural information complementary to that used in reciprocal space by the tangent or similar formulae. The combination of real- and reciprocal-space techniques could therefore enlarge the size of crystal structures solvable by direct methods. The first program to explicitly propose the combined use of direct and reciprocal space was Shake and Bake (SnB), which inspired a second package, half-bake (HB). A third program, SIR99, uses a different algorithm.
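The SnB method (DeTitta et al., 1994; Weeks et al., 1994; Hauptman, 1995) is the heir of the cosine least-squares method described in Section 2.2.8, point (4). The function

$$R(\varphi) = \frac{\sum_{\mathbf h,\mathbf k} A_{\mathbf h\mathbf k}\,\big[\cos\Phi_{\mathbf h\mathbf k} - t_{\mathbf h\mathbf k}\big]^2}{\sum_{\mathbf h,\mathbf k} A_{\mathbf h\mathbf k}},$$

where $\Phi_{\mathbf h\mathbf k} = \varphi_{\mathbf h} + \varphi_{\mathbf k} + \varphi_{-\mathbf h-\mathbf k}$ is the triplet phase,

$$A_{\mathbf h\mathbf k} = \frac{2}{N^{1/2}}\,|E_{\mathbf h}E_{\mathbf k}E_{\mathbf h+\mathbf k}|$$

and $t_{\mathbf h\mathbf k} = I_1(A_{\mathbf h\mathbf k})/I_0(A_{\mathbf h\mathbf k})$, is expected to have a global minimum, provided the number of phases involved is sufficiently large, when all the phases are equal to their true values for some choice of origin and enantiomorph. Thus the phasing problem reduces to that of finding the global minimum of $R(\varphi)$ (the minimum principle).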
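A minimal Python sketch of the minimal function R(φ) = Σ A(cos Φ − t)² / Σ A, with A = 2|E_h E_k E_{h+k}|/√N and t = I₁(A)/I₀(A), evaluated for an explicit list of triplets. The reflection labels, the data layout and the one-triplet example are hypothetical; SnB's own triplet generation and phase optimization are not reproduced.

```python
import math

def bessel_ratio(x, terms=60):
    """t = I1(x)/I0(x) from the power series of the modified Bessel functions."""
    i0 = sum((x / 2) ** (2 * m) / math.factorial(m) ** 2 for m in range(terms))
    i1 = sum((x / 2) ** (2 * m + 1) / (math.factorial(m) * math.factorial(m + 1))
             for m in range(terms))
    return i1 / i0

def minimal_function(triplets, phases, n_atoms):
    """R(phi) = sum A_hk (cos Phi_hk - t_hk)^2 / sum A_hk.

    triplets: (h, k, l, |E_h|, |E_k|, |E_l|) with h + k + l = 0, so that
              Phi_hk = phi_h + phi_k + phi_l is a structure invariant
    phases:   dict mapping reflection label -> phase (radians)
    """
    numerator = denominator = 0.0
    for h, k, l, e_h, e_k, e_l in triplets:
        a_hk = 2.0 * e_h * e_k * e_l / math.sqrt(n_atoms)
        big_phi = phases[h] + phases[k] + phases[l]
        numerator += a_hk * (math.cos(big_phi) - bessel_ratio(a_hk)) ** 2
        denominator += a_hk
    return numerator / denominator

# one triplet of a hypothetical 1D structure (labels 1 + 2 - 3 = 0)
triplets = [(1, 2, -3, 2.0, 1.8, 2.2)]
good = minimal_function(triplets, {1: 0.4, 2: 0.9, -3: -1.3}, n_atoms=20)            # Phi = 0
bad = minimal_function(triplets, {1: 0.4, 2: 0.9, -3: -1.3 + math.pi}, n_atoms=20)  # Phi = pi
```

A triplet phase near zero (the value favoured for a positive A) gives a smaller R than one near π, which is what drives the search towards the global minimum.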
SnB alternates a shake step (phase refinement in reciprocal space) and a bake step (electron-density modification in real space), the second step aiming to impose the phase constraints implicit in real space. Accordingly, the program requires two Fourier transforms per cycle, and numerous cycles. It may thus be very time consuming, and it is not competitive with other direct methods for solving the crystal structures of small molecules. However, it demonstrated the great usefulness of intensive computation for the direct solution of complex crystal structures.
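The real-space half of such a dual-space cycle can be sketched as follows: compute a Fourier map from observed amplitudes and current phases, keep only the highest peaks (imposing atomicity), and recompute phases from those peaks. This is a minimal sketch assuming a hypothetical one-dimensional P1 structure of equal point atoms; it is not SnB's implementation, and the function names are illustrative.

```python
import cmath
import math

def structure_factors(sites, n_refl):
    """F_h = sum_j exp(2*pi*i*h*x_j), h = 1..n_refl, for equal point atoms at `sites`."""
    return [sum(cmath.exp(2j * math.pi * h * x) for x in sites) for h in range(1, n_refl + 1)]

def fourier_map(amplitudes, phases, grid):
    """rho(x_i) = sum_h 2|F_h| cos(2*pi*h*x_i - phi_h), h = 1.., on a cyclic grid (F_0 omitted)."""
    return [sum(2 * a * math.cos(2 * math.pi * h * i / grid - p)
                for h, (a, p) in enumerate(zip(amplitudes, phases), start=1))
            for i in range(grid)]

def pick_peaks(rho, n_peaks):
    """Fractional coordinates of the n_peaks highest local maxima of a cyclic 1D map."""
    g = len(rho)
    maxima = [i for i in range(g) if rho[i] > rho[i - 1] and rho[i] >= rho[(i + 1) % g]]
    maxima.sort(key=lambda i: rho[i], reverse=True)
    return sorted(i / g for i in maxima[:n_peaks])

true_sites = [0.1, 0.35, 0.7]
n_refl = 20
f_true = structure_factors(true_sites, n_refl)
amplitudes = [abs(f) for f in f_true]
phases = [cmath.phase(f) for f in f_true]        # here: error-free trial phases

peaks = pick_peaks(fourier_map(amplitudes, phases, grid=200), n_peaks=3)
new_phases = [cmath.phase(f) for f in structure_factors(peaks, n_refl)]
```

In a complete dual-space procedure the recomputed phases would be handed back to the reciprocal-space (shake) step, and the two steps repeated for many cycles, which is where the computing cost mentioned above comes from.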
HB (Sheldrick, 1997) does most of its work in direct space. Random atomic positions are generated, and a modified peaklist optimization process is applied to them. A number of peaks are eliminated subject to the condition that
remains as large as possible (only reflections with
are involved, where
). The phases of a suitable subset of reflections are then used as input for a tangent expansion. Then an E map is calculated from which peaks are selected: these are submitted to the elimination procedure.
Typically 5–20 cycles of this internal loop are performed. A correlation coefficient (CC) between the observed and calculated E values is then computed over all the data. If the CC is larger than a given threshold, an external loop is started: a new E map is computed, from which a list of peaks is selected and submitted to the elimination procedure, now using as criterion the CC calculated over all the reflections. Typically two to five cycles of this external loop are performed.
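The CC test that gates the external loop can be sketched as below. Whether the CC is computed on E or on E², and with which weights, is not specified here; the plain unweighted Pearson form and the threshold value are assumptions made for illustration only.

```python
import math

def correlation_coefficient(e_obs, e_calc):
    """Unweighted Pearson correlation coefficient between observed and calculated values."""
    n = len(e_obs)
    if n != len(e_calc) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_o = sum(e_obs) / n
    mean_c = sum(e_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(e_obs, e_calc))
    var_o = sum((o - mean_o) ** 2 for o in e_obs)
    var_c = sum((c - mean_c) ** 2 for c in e_calc)
    return cov / math.sqrt(var_o * var_c)

CC_THRESHOLD = 0.3  # hypothetical value; the real threshold is program-specific

def start_external_loop(e_obs, e_calc, threshold=CC_THRESHOLD):
    """Decide whether a trial structure is promising enough for the external loop."""
    return correlation_coefficient(e_obs, e_calc) > threshold
```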
The program works indefinitely, restarting from random atoms until interrupted. It may work either by applying the true space-group symmetry or after having expanded the data to P1.
The SIR99 procedure (Burla et al., 1999) may be divided into two distinct parts: a tangent section (a double tangent process using triplet and quartet invariants) followed by a real-space refinement procedure. The scheme is the same as in SIR97, but the real-space part is now much more complex. It involves three different techniques: EDM (an electron-density modification process), HAFR (in which all the peaks are associated with the heaviest atomic species) and DLSQ (a least-squares Fourier refinement process). Atomicity is introduced gradually into the procedure. For each trial the entire process requires several cycles of EDM and HAFR, and the real-space part is able to lead to the correct solution even when the tangent formula does not provide favourable phase values.
The modulus of the isomorphous difference, $|\Delta F| = \big|\,|F_{PH}| - |F_P|\,\big|$, may be taken to a first approximation as an estimate of the heavy-atom structure-factor modulus $|F_H|$. Normalization of the $|\Delta F|$'s and application of the tangent formula may reveal the heavy-atom structure (Wilson, 1978).
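A minimal sketch of the first step, turning native and derivative moduli into normalized isomorphous differences |ΔF| = ||F_PH| − |F_P||. For simplicity a single overall scale is applied so that ⟨|ΔF|²⟩ = 1; a real treatment would normalize in resolution shells and allow for the heavy-atom scattering. The function name and the data are hypothetical.

```python
import math

def normalized_isomorphous_differences(f_native, f_derivative):
    """|Delta F| = ||F_PH| - |F_P||, scaled with one overall factor so that <|Delta F|^2> = 1."""
    diffs = [abs(ph - p) for p, ph in zip(f_native, f_derivative)]
    mean_sq = sum(d * d for d in diffs) / len(diffs)
    if mean_sq == 0:
        raise ValueError("native and derivative data are identical")
    return [d / math.sqrt(mean_sq) for d in diffs]

e_diff = normalized_isomorphous_differences([100.0, 200.0, 300.0, 400.0],
                                             [110.0, 180.0, 330.0, 400.0])
```

The resulting values play the role of |E| for the heavy-atom substructure and can be fed to a tangent-formula procedure.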
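The theoretical basis for integrating the techniques of direct methods and isomorphous replacement was introduced by Hauptman (1982a). Following his notation, let $f_j$ and $g_j$ denote the atomic scattering factors of the atom labelled $j$ in a pair of isomorphous structures, and let $E_{\mathbf h}$ and $G_{\mathbf h}$ denote the corresponding normalized structure factors. Then

$$E_{\mathbf h} = \alpha_{20}^{-1/2}\sum_{j=1}^{N} f_j \exp(2\pi i\,\mathbf h\cdot\mathbf r_j), \qquad G_{\mathbf h} = \alpha_{02}^{-1/2}\sum_{j=1}^{N} g_j \exp(2\pi i\,\mathbf h\cdot\mathbf r_j),$$

where

$$\alpha_{mn} = \sum_{j=1}^{N} f_j^{\,m} g_j^{\,n}.$$

The conditional probability of the two-phase structure invariant $\Phi = \varphi_{\mathbf h} - \psi_{\mathbf h}$, where $\varphi_{\mathbf h}$ and $\psi_{\mathbf h}$ are the phases of $E_{\mathbf h}$ and $G_{\mathbf h}$, given $R = |E_{\mathbf h}|$ and $S = |G_{\mathbf h}|$, is (Hauptman, 1982a)

$$P(\Phi \mid R, S) = \frac{1}{2\pi I_0(Z)}\exp(Z\cos\Phi),$$

where

$$Z = \frac{2\alpha R S}{1-\alpha^2}, \qquad \alpha = \frac{\alpha_{11}}{(\alpha_{20}\,\alpha_{02})^{1/2}}.$$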
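Under the random-atom model, E and G behave as jointly Gaussian complex variables with correlation coefficient α = α₁₁/(α₂₀α₀₂)^(1/2), where α_mn = Σ_j f_j^m g_j^n; this gives a von Mises density for Φ = φ_h − ψ_h with concentration Z = 2αRS/(1 − α²). The sketch below computes these quantities for a hypothetical native/derivative pair (100 equal light atoms plus one heavy atom present only in the derivative). The expressions follow from that model and should be checked against Hauptman (1982a) before any serious use.

```python
import math

def bessel_ratio(x, terms=60):
    """I1(x)/I0(x) from the power series (adequate for moderate x)."""
    i0 = sum((x / 2) ** (2 * m) / math.factorial(m) ** 2 for m in range(terms))
    i1 = sum((x / 2) ** (2 * m + 1) / (math.factorial(m) * math.factorial(m + 1))
             for m in range(terms))
    return i1 / i0

def isomorphous_correlation(f, g):
    """alpha = alpha_11 / sqrt(alpha_20 * alpha_02), with alpha_mn = sum_j f_j^m g_j^n."""
    a20 = sum(fj * fj for fj in f)
    a02 = sum(gj * gj for gj in g)
    a11 = sum(fj * gj for fj, gj in zip(f, g))
    return a11 / math.sqrt(a20 * a02)

def two_phase_concentration(alpha, r, s):
    """Z = 2 alpha R S / (1 - alpha^2) for the von Mises density of Phi = phi_h - psi_h."""
    return 2.0 * alpha * r * s / (1.0 - alpha ** 2)

# hypothetical pair: the heavy atom (f = 80) has f_j = 0 in the native list
f_native = [6.0] * 100 + [0.0]
g_derivative = [6.0] * 100 + [80.0]
alpha = isomorphous_correlation(f_native, g_derivative)  # 3600 / sqrt(3600 * 10000) = 0.6
z = two_phase_concentration(alpha, r=1.5, s=1.5)
expected_cos = bessel_ratio(z)                            # <cos(phi_h - psi_h)>
```

Large R and S together with a small heavy-atom contribution (α close to 1) make Z large, so the protein and derivative phases are then expected to be close.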
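Three-phase structure invariants were evaluated by considering that, for a given triple of indices h, k, l with h + k + l = 0, eight invariants exist:

$$\Phi_i = \xi_{\mathbf h} + \xi_{\mathbf k} + \xi_{\mathbf l}, \qquad \xi \in \{\varphi, \psi\}, \qquad i = 1, \ldots, 8,$$

that is, all the sums in which each reflection contributes either its native phase $\varphi$ or its derivative phase $\psi$. So, for the estimation of any $\Phi_i$, the joint probability distribution

$$P(E_{\mathbf h}, E_{\mathbf k}, E_{\mathbf l}, G_{\mathbf h}, G_{\mathbf k}, G_{\mathbf l})$$

has to be studied, from which eight conditional probability densities can be obtained:

$$P\big(\Phi_i \mid |E_{\mathbf h}|, |E_{\mathbf k}|, |E_{\mathbf l}|, |G_{\mathbf h}|, |G_{\mathbf k}|, |G_{\mathbf l}|\big) \simeq \frac{1}{2\pi I_0(A_i)}\exp(A_i\cos\Phi_i)$$

for $i = 1, \ldots, 8$. The analytical expressions of the $A_i$ are too intricate to be given here (the reader is referred to the original paper). We only note that $A_i$ may be positive or negative, so that reliable triplet phase estimates near 0 or near π are possible: the larger $|A_i|$, the more reliable the phase estimate.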
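The eight combinations are easy to enumerate mechanically: each of the three reflections contributes either its native phase φ or its derivative phase ψ, giving 2³ = 8 sums. The sketch below, with hypothetical phase values and function name, simply evaluates them; it does not implement Hauptman's probability distributions.

```python
import math
from itertools import product

def eight_invariants(triple, phi, psi):
    """The eight three-phase invariants of a triple (h, k, l) with h + k + l = 0.

    phi, psi: dicts of native and derivative phases (radians), keyed by reflection.
    Returns e.g. {("phi", "psi", "phi"): value}, with values reduced to [0, 2*pi).
    """
    sources = {"phi": phi, "psi": psi}
    invariants = {}
    for choice in product(("phi", "psi"), repeat=3):
        total = sum(sources[name][refl] for name, refl in zip(choice, triple))
        invariants[choice] = total % (2 * math.pi)
    return invariants

phi = {1: 0.5, 2: 1.0, -3: -1.5}
psi = {1: 0.7, 2: 0.9, -3: -1.2}
inv = eight_invariants((1, 2, -3), phi, psi)
```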
A useful interpretation of these formulae in terms of experimental parameters was suggested by Fortier et al. (1984): unlike the distributions of the traditional three-phase invariants, these distributions depend not on the total number of atoms per unit cell but on the scattering difference between the native protein and the derivative (that is, on the scattering of the heavy atoms in the derivative).
Hauptman's formulae were generalized by Giacovazzo et al. (1988): the new expressions take into account the effects of resolution on the distribution parameters. The formulae are completely general and include as special cases native protein and heavy-atom isomorphous derivatives as well as X-ray and neutron diffraction data. Their complicated algebraic forms reduce easily to a simple expression in the case of a native protein and a heavy-atom derivative: in particular, the reliability parameter for
is
where the indices P and H indicate that the parameters are calculated over the protein atoms and over the heavy atoms, respectively, and
Δ is a pseudo-normalized difference (with respect to the heavy-atom structure) between moduli of structure factors.
Equation (2.2.10.2) may be compared with Karle's (1983) qualitative rule: if the product of the three isomorphous differences $|F_{PH}| - |F_P|$ of the triplet is positive, the protein triplet invariant is estimated to be zero; if it is negative, its expected value is close to π. In practice Karle's rule agrees with (2.2.10.2) only if the Cochran-type term in (2.2.10.2) may be neglected. Furthermore, (2.2.10.2) shows that large reliability values depend not on the triple product of structure-factor differences but on the triple product of pseudo-normalized differences. A series of papers (Giacovazzo, Siliqi & Ralph, 1994; Giacovazzo, Siliqi & Spagna, 1994; Giacovazzo, Siliqi & Platas, 1995; Giacovazzo, Siliqi & Zanotti, 1995; Giacovazzo et al., 1996) shows how equation (2.2.10.2) may be implemented in a direct procedure which proved able to estimate the protein phases correctly without any preliminary information on the heavy-atom substructure.
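Karle's qualitative rule is simple enough to state as code: the sign of the product of the three isomorphous differences |F_PH| − |F_P| selects 0 (positive product) or π (negative product) for the protein triplet invariant. The sketch below is hypothetical in its names and data and carries no reliability weighting; as noted above, it agrees with (2.2.10.2) only when the Cochran-type term is negligible.

```python
import math

def karle_rule(f_native, f_derivative, triple):
    """Karle's qualitative estimate of the protein triplet invariant for `triple`.

    Returns 0.0 if the product of the isomorphous differences |F_PH| - |F_P| is
    positive, pi if it is negative, and None if it vanishes (no indication).
    """
    product = 1.0
    for refl in triple:
        product *= f_derivative[refl] - f_native[refl]
    if product == 0.0:
        return None
    return 0.0 if product > 0.0 else math.pi

f_p = {1: 100.0, 2: 150.0, -3: 80.0}
f_ph = {1: 120.0, 2: 130.0, -3: 60.0}
print(karle_rule(f_p, f_ph, (1, 2, -3)))  # differences +20, -20, -20: product > 0 → 0.0
```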
Combination of direct methods with the two-derivative case is also possible (Fortier et al., 1984) and leads to more accurate estimates of triplet invariants provided experimental data are of sufficient accuracy.
If the frequency of the radiation is close to an absorption edge of an atom, then that atom will scatter the X-rays anomalously (see Chapter 2.4) according to $f = f_0 + f' + i f''$. This results in the breakdown of Friedel's law. It was soon realized that the Bijvoet difference could also be used to determine phases (Peerdeman & Bijvoet, 1956; Ramachandran & Raman, 1956; Okaya & Pepinsky, 1956). Since then, a great deal of work has been done from both the algebraic (see Chapter 2.4) and the probabilistic points of view. In this section we are only interested in the latter.
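The breakdown of Friedel's law can be seen directly by computing F(h) and F(−h) for a toy structure in which one atom has a complex scattering factor f = f₀ + f′ + if″. The one-dimensional structure and the scattering-factor values below are hypothetical and chosen only for illustration.

```python
import cmath
import math

def structure_factor(atoms, h):
    """F(h) = sum_j f_j exp(2*pi*i*h*x_j) for a 1D structure; f_j may be complex."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)

def bijvoet_difference(atoms, h):
    """|F(h)| - |F(-h)|; zero when all scattering factors are real (Friedel's law)."""
    return abs(structure_factor(atoms, h)) - abs(structure_factor(atoms, -h))

# an ordinary atom (f = 6) and an anomalous scatterer with f0 + f' = 20, f'' = 4
with_anomalous = [(6.0, 0.1), (20.0 + 4.0j, 0.3)]
without_anomalous = [(6.0, 0.1), (20.0, 0.3)]
print(bijvoet_difference(with_anomalous, 1), bijvoet_difference(without_anomalous, 1))
```

The same complex f is used for h and −h, because f″ does not change sign with the reflection; this is why the imaginary term makes |F(h)| and |F(−h)| differ.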
We will mention the following different cases:
Probability distributions of diffraction intensities and of selected functions of diffraction intensities for dispersive structures have been given by various authors [Parthasarathy & Srinivasan (1964); see also Srinivasan & Parthasarathy (1976) and relevant literature cited therein]. We describe here some probabilistic formulae for estimating invariants of low order.
Let us now describe some practical aspects of the integration of direct methods with OAS (one-wavelength anomalous scattering) techniques.
Anomalous-difference structure factors can be used to locate the anomalous scatterers (Mukherjee et al., 1989). Tests show that the accuracy of the difference magnitudes is critical for the success of the phasing process.
Suppose now that the positions of the heavy atoms have been found. How do we estimate the phase values for the protein? The phase ambiguity intrinsic to OAS techniques can be overcome in different ways: we mention the Qs method of Hao & Woolfson (1989), the Wilson distribution method and the MPS method of Ralph & Woolfson (1991), and the Bijvoet–Ramachandran–Raman method of Peerdeman & Bijvoet (1956), Raman (1959) and Moncrief & Lipscomb (1966). A probabilistic method by Fan & Gu (1985) provided additional insight into the problem.
Isomorphous replacement and anomalous scattering are discussed in Chapter 2.4 and in IT F (2001). We observe here only that the SIRAS case can lead algebraically to unambiguous phase determination, provided the experimental data are sufficiently good. Any probabilistic treatment must therefore take measurement errors into consideration, since with error-free data the algebraic solution alone would suffice.
In the MIRAS and MAD cases the system is overdetermined: again, any probabilistic treatment must consider errors in the measurements, but now the overdetermination allows one to reduce the adverse effects of the experimental errors and (in MIRAS) of the lack of isomorphism.
A particularly relevant application concerns the location of anomalous scatterers when selenomethionine-substituted proteins and MAD data are available (Hendrickson & Ogata, 1997; Smith, 1998). In this case many selenium sites may have to be located, and the usual Patterson-interpretation methods can be expected to fail. The successes of SnB and HB demonstrate the essential role of direct methods in this important area.
References