Refinement and model building are two sides of modelling a structure

Lamzin, V. S.; Perrakis, A.; Wilson, K. S.

doi:10.1107/97809553602060000724

International Tables for Crystallography
Volume F: Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold


International Tables for Crystallography (2006). Vol. F. ch. 25.2, pp. 720-721


V. S. Lamzin, A. Perrakis and K. S. Wilson

25.2.5.1. Refinement and model building are two sides of modelling a structure


The conventional view of crystallographic refinement of macromolecules is the optimization of the parameters of a model to fit both the experimental data and a set of a priori stereochemical observations. The user provides the model and, although the values of its parameters are allowed to vary during the minimization cycles, the presence of the atoms is fixed, i.e. the addition or removal of parts of the model is not allowed. As a result, users are often faced with a situation where several atoms lie in one place, while the density maps suggest an entirely different location. Manual intervention, consisting of moving atoms to a more appropriate place using molecular graphics, density maps and geometrical assumptions, can solve the problem and allow refinement to proceed further.

The Automated Refinement Procedure (ARP; Fig. 25.2.5.1) (Lamzin & Wilson, 1993, 1997; Perrakis et al., 1999) challenges this classical view by adding real-space manipulation of the model, mimicking user intervention in silico. This is achieved by adding and/or deleting atoms (model update) and by completely re-evaluating the model to create a new one that better describes the electron density (model reconstruction).

Figure 25.2.5.1. A flow chart of the Automated Refinement Procedure.

25.2.5.1.1. Model update


The quickest way to change the position of an atom substantially is not to move it, but to use a two-step procedure: remove it from its current (probably wrong) site and add a new atom at a new (hopefully right) position. Such updating of the model does not imply that all rejected atoms are immediately repositioned in a new site, so the number of atoms added does not have to equal the number rejected.

Atom rejection in ARP is primarily based on the interpolated $(2mF_{o} - DF_{c})$ or $(3F_{o} - 2F_{c})$ electron density at the atomic centre and on the agreement of the atomic density distribution with a target shape. Applied together, these criteria offer a powerful means of identifying incorrectly placed atoms, but they can produce false positives. However, a correctly located atom that happens to be rejected should be selected again and put back in the model. Further, perhaps more elegant, criteria may be expected as the technique develops.
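The density-at-centre criterion can be sketched as follows. This is a minimal illustration, not the ARP implementation: the function names, the periodic nested-list map, the trilinear interpolation scheme and the single fixed threshold are all assumptions, and the shape criterion mentioned above is not modelled.

```python
from math import floor


def trilinear(density, x, y, z):
    """Interpolate a periodic 3D map (nested lists) at fractional grid indices."""
    nx, ny, nz = len(density), len(density[0]), len(density[0][0])
    x0, y0, z0 = floor(x), floor(y), floor(z)
    fx, fy, fz = x - x0, y - y0, z - z0
    value = 0.0
    # Weighted sum over the eight grid points surrounding (x, y, z).
    for dx, wx in ((0, 1 - fx), (1, fx)):
        for dy, wy in ((0, 1 - fy), (1, fy)):
            for dz, wz in ((0, 1 - fz), (1, fz)):
                rho = density[(x0 + dx) % nx][(y0 + dy) % ny][(z0 + dz) % nz]
                value += wx * wy * wz * rho
    return value


def split_by_density(density, atoms, threshold):
    """Return (kept, rejected) atoms according to the density at each centre."""
    kept, rejected = [], []
    for atom in atoms:
        if trilinear(density, *atom) >= threshold:
            kept.append(atom)
        else:
            rejected.append(atom)
    return kept, rejected


# Toy 4 x 4 x 4 map with a single strong grid point (hypothetical values).
toy_map = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
toy_map[1][1][1] = 8.0
kept, rejected = split_by_density(
    toy_map, [(1, 1, 1), (1.5, 1, 1), (3, 3, 3)], threshold=3.0
)
```

In this toy case the atom sitting in empty density is the only one rejected; a real run would combine this test with the shape criterion.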

Atom addition uses the difference $(mF_{o} - DF_{c})$ or $(F_{o} - F_{c})$ Fourier synthesis. The selection is based on grid points rather than peaks, as the latter are often poorly defined and may overlap with neighbouring peaks or existing atoms, especially if the resolution and phases are poor. The map grid point with the highest electron density satisfying the defined distance constraints is selected as a new atom, grid points within a defined radius around this atom are rejected and the next highest grid point is selected. This is iterated until the desired number of new atoms is found. Reciprocal-space minimization is then used to optimize the new atomic parameters.
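The grid-point selection described above is essentially a greedy search, which the following sketch illustrates. The function name, the parameter names and the choice to test the distance constraints only against atoms of the existing model are assumptions made for illustration; the subsequent reciprocal-space minimization is not shown.

```python
from math import dist


def pick_new_atoms(grid_points, existing, n_new, min_dist, max_dist, exclusion_radius):
    """Greedily select new atom sites from difference-map grid points.

    grid_points: list of (density, (x, y, z)) with coordinates in angstroms.
    existing:    list of (x, y, z) positions of atoms already in the model.
    """
    new_atoms = []
    # Visit grid points from the highest density downwards.
    for _density, pos in sorted(grid_points, key=lambda p: p[0], reverse=True):
        if len(new_atoms) == n_new:
            break
        # Reject grid points within the exclusion radius of a selected atom.
        if any(dist(pos, atom) < exclusion_radius for atom in new_atoms):
            continue
        # Assumed distance constraint: the nearest existing atom must lie
        # between min_dist and max_dist.
        if existing:
            nearest = min(dist(pos, atom) for atom in existing)
            if not (min_dist <= nearest <= max_dist):
                continue
        new_atoms.append(pos)
    return new_atoms


# Hypothetical grid points around a single existing atom at the origin.
points = [
    (5.0, (0.0, 0.0, 3.0)),
    (4.0, (0.0, 0.0, 3.2)),   # too close to the first selected point
    (3.0, (0.0, 0.0, 0.5)),   # too close to the existing atom
    (2.0, (3.0, 0.0, 0.0)),
    (1.0, (10.0, 0.0, 0.0)),
]
selected = pick_new_atoms(points, [(0.0, 0.0, 0.0)], n_new=2,
                          min_dist=2.2, max_dist=3.3, exclusion_radius=1.0)
```

Working on grid points rather than peaks means that no peak search or peak fitting is needed, at the cost of positions that are only as precise as the grid spacing until they are refined.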

Real-space refinement based on density shape analysis around an atom can be used for the definition of the optimum atomic position. Atoms are moved to the centre of the peak using a target function that differs from that employed in reciprocal-space minimization. The function used is the sphericity of the site, which keeps an atom in the centre of the density cloud but has little influence on the R factor and phase quality. It is only applicable for well separated atoms and is mainly used for solvent atoms at high resolution.

Geometrical constraints are based on a priori chemical knowledge of the distances between covalently linked carbon, nitrogen and oxygen atoms (1.2 to 1.6 Å) and hydrogen-bonded atoms (2.2 to 3.3 Å). Such constraints are applied in rejection and addition of atoms.
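As a small illustration, the two distance ranges quoted above can be turned into a pairwise check. The function name and the labels it returns are hypothetical, and how ARP combines such checks during rejection and addition is not specified here.

```python
from math import dist

COVALENT_RANGE = (1.2, 1.6)   # angstroms, from the text
HBOND_RANGE = (2.2, 3.3)      # angstroms, from the text


def classify_contact(a, b):
    """Label the distance between two atoms by the ranges given in the text."""
    d = dist(a, b)
    if COVALENT_RANGE[0] <= d <= COVALENT_RANGE[1]:
        return "covalent"
    if HBOND_RANGE[0] <= d <= HBOND_RANGE[1]:
        return "hydrogen-bonded"
    return "other"


labels = [classify_contact((0.0, 0.0, 0.0), (x, 0.0, 0.0)) for x in (1.5, 1.9, 2.8)]
```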

25.2.5.1.2. Model reconstruction


The main problem in automatically reconstructing a protein model from electron-density maps is in achieving an initial tracing of the polypeptide chain, even if the result is only partially complete. Subsequent building of side chains and filling of possible gaps is a relatively straightforward task. The complexity of the autotracing can be illustrated by the well known travelling-salesman problem. Suppose one is faced with 100 trial peptide units possessing two incoming and two outgoing connections on average, which is close to what happens in a typical ARP refinement of a 10 kDa protein. Assuming that one of the chain ends is known and that it is possible to connect all the points regardless of the chosen route, then one is faced with the problem of choosing the best chain out of 2⁹⁸. In practice, the situation is even more complex, as not all trial peptides are necessarily correctly identified in the first iteration and some may be missing, analogous to the correctness or incorrectness of the atomic positions described above.
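To give a feeling for the size of this search space, the short calculation below evaluates the figure quoted above. The enumeration rate of 10⁹ chains per second is a hypothetical value chosen only for illustration.

```python
# Number of candidate chains quoted in the text for 100 trial peptide units.
n_chains = 2 ** 98

# Hypothetical enumeration rate, chosen only for illustration.
chains_per_second = 1e9
seconds_per_year = 365.25 * 24 * 3600

years_needed = n_chains / chains_per_second / seconds_per_year
print(f"{n_chains:.3e} chains, about {years_needed:.1e} years to enumerate")
```

The count is of the order of 10²⁹, so exhaustive enumeration is out of the question and the search has to be guided by probabilities, as described next.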

If the connections can be assigned a probability of the peptide being correct, then only the path that visits each node exactly once and maximizes the total probability remains to be identified. Automatic density-map interpretation is based on the location of the atoms in the current model and consists of several steps. Firstly, each atom of the free-atom model is assigned a probability of being correct. Secondly, these weighted atoms are used for identification of patterns typical for a protein. The method utilizes the fact that all residues that comprise a protein, with the exception of cis peptides, have chemically identical main-chain fragments which are close to planar: the structurally identical Cα—C—O—N—Cα trans peptide units.
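The path search in the first sentence of this paragraph can be sketched on a toy graph. This is a brute-force depth-first search that is only practical for a handful of nodes; taking the total probability as the product of the connection probabilities, and all names and numbers, are assumptions rather than details of the ARP algorithm.

```python
def best_chain(start, edges, n_nodes):
    """Find the path from `start` that visits every node once with maximal probability.

    edges: dict mapping a node to a list of (next_node, probability) pairs.
    Returns (probability, path); path is None if no complete path exists.
    """
    best = (0.0, None)

    def extend(path, prob):
        nonlocal best
        if len(path) == n_nodes:
            if prob > best[0]:
                best = (prob, list(path))
            return
        for nxt, p in edges.get(path[-1], []):
            if nxt not in path:          # each node may be visited only once
                path.append(nxt)
                extend(path, prob * p)
                path.pop()

    extend([start], 1.0)
    return best


# Four hypothetical peptide units with assumed connection probabilities.
toy_edges = {
    0: [(1, 0.9), (2, 0.5)],
    1: [(2, 0.8), (3, 0.3)],
    2: [(1, 0.6), (3, 0.7)],
}
probability, path = best_chain(0, toy_edges, n_nodes=4)
```

Here the route 0, 1, 2, 3 is preferred over 0, 2, 1, 3 because each of its connections is more probable.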

The problem of searching for possible peptide units and their connections thus becomes straightforward. The most crucial factor is that proteins are composed of linear non-branching polypeptide chains, allowing sets of connected peptides to be obtained from an initial list of all possible tracings. Choosing the direction of a chain path is carried out on the basis of the electron density and observed backbone conformations. The set of peptide units and the list of how they are interconnected do not, however, allow unambiguous tracing of a full-length chain in most cases.

Taken together, the probabilistic identification of the peptide units, the naturally high conformational flexibility of the connections of the peptide units and the limited quality of the X-ray data and/or phases introduce large enough errors to cause density breaks in the middle of the chains or result in density overlaps. Thus, the result of such a tracing is usually a set of several main-chain fragments. The less accurate the starting maps (i.e. initial phases) and the lower the resolution and quality of the X-ray data, the more breaks there will be in the tracing and the greater the number of peptide units which will be difficult to identify.

Residues are differentiated only as glycine, alanine, serine and valine, and complete side chains are not built at this stage. For every polypeptide fragment, a side-chain type can be assigned with a defined probability, using connectivity criteria from the free-atom models and the α-carbon positions of the main-chain fragments. Given these guesses for the side chains and provided the sequence is known, the next step employs docking of the polypeptide fragments into the sequence. Each possible docking position is assigned a score, which allows automated inspection of the side-chain densities, search for expected patterns and building of the most probable side-chain conformations.
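The docking step can be pictured as sliding each fragment along the sequence and scoring every offset. The sketch below is a simplified illustration under stated assumptions: the score is taken as the product of per-residue probabilities, and the reduction of every residue other than glycine, alanine and serine to a valine-like class is a placeholder, not the scheme used in ARP.

```python
def reduce_residue(residue):
    """Map a one-letter residue code to one of the four guessed classes.

    Placeholder assumption: anything other than G, A or S counts as 'V'.
    """
    return residue if residue in {"G", "A", "S"} else "V"


def dock_fragment(fragment_probs, sequence):
    """Score every placement of a fragment along the sequence.

    fragment_probs: one dict per fragment residue, mapping class -> probability.
    Returns a list of (score, offset), best placement first.
    """
    scores = []
    for offset in range(len(sequence) - len(fragment_probs) + 1):
        score = 1.0
        for probs, residue in zip(fragment_probs, sequence[offset:]):
            score *= probs.get(reduce_residue(residue), 0.0)
        scores.append((score, offset))
    return sorted(scores, reverse=True)


# Hypothetical three-residue fragment guessed as Gly-Ala-Ser.
fragment = [
    {"G": 0.7, "A": 0.2, "S": 0.05, "V": 0.05},
    {"G": 0.2, "A": 0.6, "S": 0.1, "V": 0.1},
    {"G": 0.1, "A": 0.3, "S": 0.5, "V": 0.1},
]
ranking = dock_fragment(fragment, "MGASLVGA")
best_score, best_offset = ranking[0]
```

Short fragments usually fit several positions almost equally well, which is why each docking position carries a score rather than a yes/no answer.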

25.2.5.1.3. Representation of a map by free-atom models


An electron-density map can be used to create a free-atom atomic model, with equal atoms placed in regions of high density (Perrakis et al., 1997). To build this model, only the molecular weight of the protein is required, without any sequence information. In brief, a map covering a crystallographic asymmetric unit on a fine grid of about 0.25 Å is constructed. The model is slowly expanded from a random seed by the stepwise addition of atoms in significant electron density and at bonding distances from existing atoms. All atoms in this model and in all subsequent steps are considered to be of the same type. As ARP proceeds, the geometrical criteria remain the same, but the density threshold is gradually reduced, allowing positioning of atoms in lower-density areas of the map. The procedure continues until the number of atoms is about three times that expected. This number is then reduced to about n + 20% atoms, where n is the expected number of atoms, by removing atoms in weak density. This method of map parameterization has the advantage that it puts atoms at protein-like distances while covering the whole volume of the protein.
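The expansion and pruning steps can be sketched as follows. This is a simplified illustration, not the published procedure: the bonding range, the threshold schedule, the fixed seed index (used instead of a random seed so that the example is reproducible) and the reading of "n + 20%" as 1.2 n are all assumptions.

```python
from math import dist


def build_free_atoms(grid_points, n_expected, seed_index=0,
                     bond_range=(1.2, 1.6), start_threshold=3.0,
                     step=0.5, min_threshold=0.5):
    """Grow a free-atom model from a seed, then prune the weakest atoms.

    grid_points: list of (density, (x, y, z)) with coordinates in angstroms.
    """
    atoms = [grid_points[seed_index]]
    target = 3 * n_expected            # grow to about three times the expected number
    threshold = start_threshold
    while len(atoms) < target and threshold >= min_threshold:
        added = True
        while added and len(atoms) < target:
            added = False
            for cand in grid_points:
                if cand in atoms or cand[0] < threshold:
                    continue
                nearest = min(dist(cand[1], atom[1]) for atom in atoms)
                if bond_range[0] <= nearest <= bond_range[1]:
                    atoms.append(cand)
                    added = True
                    if len(atoms) >= target:
                        break
        threshold -= step              # allow atoms in weaker density
    # Keep roughly n + 20% atoms, discarding those in the weakest density.
    atoms.sort(key=lambda atom: atom[0], reverse=True)
    return atoms[: round(1.2 * n_expected)]


# Hypothetical one-dimensional "map": points 1.5 angstroms apart, decreasing density.
densities = [4.0, 3.5, 3.2, 2.4, 2.2, 1.8, 1.2, 0.9, 0.6, 0.4]
toy_points = [(rho, (1.5 * i, 0.0, 0.0)) for i, rho in enumerate(densities)]
model = build_free_atoms(toy_points, n_expected=2)
```

Overshooting to three times the expected number before pruning lets the model reach weak but connected density first and then keep only the best supported sites.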

25.2.5.1.4. Hybrid models


A free-atom model can describe almost every feature of an electron-density map, but this interpretation rarely resembles a conventional conception of a protein. Nevertheless, information from parts of the improved map and the free-atom model can be automatically recognized as containing elements of protein structure by applying the algorithms briefly described for model reconstruction, and at least a partial atomic protein model can be built. Combination of this partial protein model with a free-atom set (a hybrid model) allows a considerably better description of the current map. The protein model provides additional information (in the form of stereochemical restraints), while prominent features in the electron density (unaccounted for by the current model) are described by free atoms.

25.2.5.1.5. Real-space manipulation coupled with reciprocal-space refinement


The procedure of real-space manipulation is coupled to least-squares or maximum-likelihood optimization of the model's parameters against the X-ray data. This is the scheme that we generally refer to as ARP refinement, though there are two distinct modes of ARP. In the unrestrained mode, all atoms in reciprocal-space refinement are treated as free atoms with unknown connectivity and are refined against the experimental data alone. This mode has a higher radius of convergence but needs high-resolution diffraction data to perform effectively. In the restrained mode, a model or a hybrid model is required, i.e. the atoms must belong to groups of known stereochemistry. This stereochemical information, in the form of restraints, can then be utilized during the reciprocal-space minimization, allowing it to proceed with less data, provided that the connectivity of the input atoms is basically correct.

References

Lamzin, V. S. & Wilson, K. S. (1993). Automated refinement of protein models. Acta Cryst. D49, 129–147.

Lamzin, V. S. & Wilson, K. S. (1997). Automated refinement for protein crystallography. Methods Enzymol. 277, 269–305.

Perrakis, A., Morris, R. & Lamzin, V. S. (1999). Automated protein model building combined with iterative structure refinement. Nature Struct. Biol. 6, 458–463.

Perrakis, A., Sixma, T. K., Wilson, K. S. & Lamzin, V. S. (1997). wARP: improvement and extension of crystallographic phases by weighted averaging of multiple-refined dummy atomic models. Acta Cryst. D53, 448–455.
