Model reconstruction

Lamzin, V. S.; Perrakis, A.; Wilson, K. S.

doi:10.1107/97809553602060000724

International Tables for Crystallography
Volume F: Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F. ch. 25.2, pp. 720-721

Section 25.2.5.1.2. Model reconstruction

V. S. Lamzin, A. Perrakis and K. S. Wilson

The main problem in automatically reconstructing a protein model from electron-density maps is achieving an initial tracing of the polypeptide chain, even if the result is only partially complete. Subsequent building of side chains and filling of possible gaps are relatively straightforward tasks. The complexity of autotracing can be illustrated by analogy with the well-known travelling-salesman problem. Suppose one is faced with 100 trial peptide units, each possessing two incoming and two outgoing connections on average, which is close to what happens in a typical ARP refinement of a 10 kDa protein. Assuming that one of the chain ends is known and that it is possible to connect all the points regardless of the chosen route, one must choose the best chain out of 2⁹⁸ candidates. In practice, the situation is even more complex, as not all trial peptides are necessarily correctly identified in the first iteration and some may be missing, analogous to the correctness or incorrectness of the atomic positions described above.
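The size of this search space can be checked with a few lines of Python. The sketch below is only one way to read the figure of 2⁹⁸: it assumes (this reasoning is not spelled out in the text) that the starting peptide is fixed, that the last peptide offers no onward choice, and that each of the remaining 98 steps offers two outgoing connections. The function name `count_candidate_chains` is hypothetical.

```python
def count_candidate_chains(n_peptides: int, branching: int = 2) -> int:
    """Count routes through n_peptides units with a fixed start.

    Assumption: the first unit is known and the last unit has no onward
    choice, so n_peptides - 2 steps each offer `branching` alternatives.
    """
    return branching ** (n_peptides - 2)


total = count_candidate_chains(100)
print(total == 2 ** 98)   # True
print(f"{total:.3e}")     # 3.169e+29
```

Even at a billion chains evaluated per second, enumerating roughly 3 × 10²⁹ candidates is out of reach, which is why exhaustive search is not an option.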

If the connections can be assigned a probability of the peptide being correct, then only the path that visits each node exactly once and maximizes the total probability remains to be identified. Automatic density-map interpretation is based on the location of the atoms in the current model and consists of several steps. Firstly, each atom of the free-atom model is assigned a probability of being correct. Secondly, these weighted atoms are used for identification of patterns typical for a protein. The method utilizes the fact that all residues that comprise a protein, with the exception of cis peptides, have chemically identical main-chain fragments which are close to planar: the structurally identical Cα—C—O—N—Cα trans peptide units.
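The idea of finding the path that visits each node exactly once and maximizes the total probability can be sketched on a toy graph. The code below is an illustrative exhaustive search with memoization (a Held-Karp style recursion), not the procedure used in ARP; the function name `best_chain`, the dictionary layout and the example probabilities are all assumptions made for the illustration. It is practical only for a handful of nodes.

```python
import math
from functools import lru_cache


def best_chain(n, start, prob):
    """Return (log_probability, path) for the most probable path that starts
    at `start` and visits all n nodes exactly once, or None if none exists.

    prob maps (i, j) to the assumed probability that link i -> j is correct.
    """
    full = (1 << n) - 1  # bitmask with every node marked as visited

    @lru_cache(maxsize=None)
    def extend(node, visited):
        if visited == full:
            return 0.0, (node,)
        best = None
        for nxt in range(n):
            p = prob.get((node, nxt), 0.0)
            if p <= 0.0 or visited & (1 << nxt):
                continue  # no such link, or node already on the path
            tail = extend(nxt, visited | (1 << nxt))
            if tail is None:
                continue  # dead end: remaining nodes cannot all be reached
            score = math.log(p) + tail[0]  # product of probabilities as a sum of logs
            if best is None or score > best[0]:
                best = (score, (node,) + tail[1])
        return best

    return extend(start, 1 << start)


# Hypothetical link probabilities for four trial peptide units.
links = {(0, 1): 0.9, (0, 2): 0.6, (1, 2): 0.8,
         (2, 1): 0.7, (1, 3): 0.5, (2, 3): 0.9}
score, path = best_chain(4, 0, links)
print(path)             # (0, 1, 2, 3)
```

In this example the route 0, 1, 2, 3 has total probability 0.9 × 0.8 × 0.9 = 0.648, against 0.6 × 0.7 × 0.5 = 0.21 for the only alternative, so it is selected.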

The problem of searching for possible peptide units and their connections thus becomes straightforward. The most crucial factor is that proteins are composed of linear non-branching polypeptide chains, allowing sets of connected peptides to be obtained from an initial list of all possible tracings. Choosing the direction of a chain path is carried out on the basis of the electron density and observed backbone conformations. The set of peptide units and the list of how they are interconnected do not, however, allow unambiguous tracing of a full-length chain in most cases.

Taken together, the probabilistic identification of the peptide units, the naturally high conformational flexibility of the connections between them and the limited quality of the X-ray data and/or phases introduce errors large enough to cause density breaks in the middle of chains or to produce density overlaps. The result of such a tracing is therefore usually a set of several main-chain fragments. The less accurate the starting maps (i.e. the initial phases) and the lower the resolution and quality of the X-ray data, the more breaks there will be in the tracing and the more peptide units will be difficult to identify.
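How a list of scored connections yields several linear fragments rather than one full chain can be illustrated with a simple greedy scheme. This is a swapped-in simplification, not the published algorithm: links are accepted in order of decreasing score, provided they keep every chain linear and non-branching, and links below a threshold are treated as density breaks. The name `trace_fragments`, the `min_score` threshold and the example scores are hypothetical.

```python
def trace_fragments(n_peptides, connections, min_score=0.5):
    """Greedily assemble linear, non-branching fragments.

    connections: list of (score, i, j) for a directed link from unit i to unit j.
    Links scoring below min_score (an assumed cut-off) are left as breaks.
    """
    succ, pred = {}, {}

    def head_of(i):
        # Walk back to the first unit of the fragment containing i.
        while i in pred:
            i = pred[i]
        return i

    for score, i, j in sorted(connections, reverse=True):
        if score < min_score:
            break  # remaining links are too weak to trust
        if i in succ or j in pred:
            continue  # accepting this link would create a branch
        if head_of(i) == j:
            continue  # accepting this link would close a ring
        succ[i] = j
        pred[j] = i

    fragments = []
    for start in range(n_peptides):
        if start in pred:
            continue  # not the first unit of a fragment
        frag = [start]
        while frag[-1] in succ:
            frag.append(succ[frag[-1]])
        fragments.append(frag)
    return fragments


# Hypothetical scores: the 2 -> 3 link is weak, mimicking a density break.
links = [(0.9, 0, 1), (0.8, 1, 2), (0.3, 2, 3),
         (0.85, 3, 4), (0.6, 1, 3), (0.7, 4, 5)]
print(trace_fragments(6, links))  # [[0, 1, 2], [3, 4, 5]]
```

The weak link between units 2 and 3 is rejected and the competing link from 1 to 3 would branch the chain, so the six units end up in two fragments.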

Residues are differentiated only as glycine, alanine, serine and valine, and complete side chains are not built at this stage. For every polypeptide fragment, a side-chain type can be assigned with a defined probability, using connectivity criteria from the free-atom models and the α-carbon positions of the main-chain fragments. Given these guesses for the side chains and provided the sequence is known, the next step employs docking of the polypeptide fragments into the sequence. Each possible docking position is assigned a score, which allows automated inspection of the side-chain densities, search for expected patterns and building of the most probable side-chain conformations.
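The docking step can be pictured as sliding a fragment along the known sequence and scoring each offset from the side-chain type guesses. The sketch below is an assumption-laden illustration, not the scoring function of the original method: `dock_fragment`, the log-probability score, the floor value `1e-6` and the toy probabilities are hypothetical, and the mapping from the 20 amino acids onto the four coarse classes (`to_class`) is left to the caller because the text does not specify it.

```python
import math


def dock_fragment(type_probs, sequence, to_class):
    """Score every placement of a fragment along the sequence.

    type_probs: one dict per fragment residue, {coarse_class: probability}.
    sequence:   one-letter amino-acid string.
    to_class:   caller-supplied map from one-letter code to coarse class.
    Returns a list of (log_score, offset), best placement first.
    """
    results = []
    for offset in range(len(sequence) - len(type_probs) + 1):
        score = 0.0
        for k, probs in enumerate(type_probs):
            cls = to_class[sequence[offset + k]]
            # Floor the probability so a single mismatch does not give log(0).
            score += math.log(max(probs.get(cls, 0.0), 1e-6))
        results.append((score, offset))
    results.sort(reverse=True)
    return results


# Toy case restricted to the four classes named in the text.
identity = {"G": "G", "A": "A", "S": "S", "V": "V"}
fragment = [
    {"G": 0.05, "A": 0.20, "S": 0.70, "V": 0.05},
    {"G": 0.05, "A": 0.10, "S": 0.05, "V": 0.80},
    {"G": 0.10, "A": 0.60, "S": 0.20, "V": 0.10},
]
ranked = dock_fragment(fragment, "GASVAG", identity)
print(ranked[0][1])  # 2
```

Here the fragment aligns best at offset 2, over the residues S, V, A, with probability 0.7 × 0.8 × 0.6 = 0.336. A clear gap between the best and second-best scores would indicate a confident placement.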
