Design principles

Tronrud, D. E.; Ten Eyck, L. F.

doi:10.1107/97809553602060000724

International Tables for Crystallography
Volume F: Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F. ch. 25.2, pp. 717-718

Section 25.2.4.3. Design principles

D. E. Tronrud and L. F. Ten Eyck


TNT was designed with three fundamental principles in mind. Each principle has a number of consequences that shaped the ultimate form of the package.

25.2.4.3.1. Refinement should be simple to run

The user should not be burdened with choosing input parameters that they may not be qualified to choose, nor should they be forced to construct an input file that is obscure and difficult to understand. It is hard now to remember what most computer programs were like in the 1970s. Usually, the input to the program was a block of numbers and flags in which the meaning of each item was defined by its line and column numbers. This block not only contained information the programmer could never anticipate, like the cell constants, but also defined how the computer's memory should be allocated and set obscure parameters whose values could only be estimated after careful reading of research papers.

TNT was one of the first programs in crystallography to have its input introduced with keywords and to allow input statements to come in any order. As an example of the difference, consider the resolution limits. Usually, a crystallographic program would have a line in its input similar to $\texttt{99.0,1.9,}$. One had to recognize this line amongst many as the line containing the resolution limits. (In many programs, a value of 99 was used to indicate that no lower-resolution limit was to be applied.) In TNT the same data would be entered as $\texttt{RESOLUTION 1.9}$. The keyword identifies the data as the resolution limit(s). If the statement contained two numbers, they were considered the upper and lower limits of the diffraction data.

The preceding example also shows how default values can be implemented by a program much more safely with keyword-based input. In the previous scheme, if a value was ever to be changed by the user, its place had to be allocated in the input block. This often left numbers floating in the block which were almost never changed, and because they were so infrequently referred to, they were usually unrecognized by the user. It was quite possible for one of these numbers to be accidentally changed and for the error to go unnoticed for quite some time. When the data are introduced with keywords, a data item is simply not mentioned if the default value is suitable, so it cannot be altered by accident.
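The contrast between positional and keyword-based input can be sketched in a few lines of Python. This is a minimal illustration and not TNT's actual parser: only the `RESOLUTION` keyword comes from the text above, while `MAXCYCLES`, its default value, the names `d_min` and `d_max`, and the decision to accept the two resolution limits in either order are assumptions made for the example.

```python
# Hypothetical defaults: a keyword that is never mentioned keeps its default.
DEFAULTS = {
    "RESOLUTION": {"d_min": None, "d_max": None},  # no limits applied
    "MAXCYCLES": 10,                               # illustrative keyword only
}


def parse_resolution(values):
    """One number is the high-resolution limit; two numbers give both limits."""
    numbers = sorted(float(v) for v in values)
    if len(numbers) == 1:
        return {"d_min": numbers[0], "d_max": None}
    if len(numbers) == 2:
        return {"d_min": numbers[0], "d_max": numbers[1]}
    raise ValueError("RESOLUTION expects one or two numbers")


def parse_maxcycles(values):
    """A single integer (hypothetical keyword used to show a second default)."""
    if len(values) != 1:
        raise ValueError("MAXCYCLES expects one integer")
    return int(values[0])


PARSERS = {"RESOLUTION": parse_resolution, "MAXCYCLES": parse_maxcycles}


def parse_control(text, defaults=DEFAULTS):
    """Read keyword statements in any order, starting from the defaults."""
    settings = dict(defaults)
    for raw_line in text.splitlines():
        fields = raw_line.split()
        if not fields:
            continue  # skip blank lines
        keyword, values = fields[0].upper(), fields[1:]
        if keyword not in PARSERS:
            raise ValueError(f"unknown keyword: {keyword}")
        settings[keyword] = PARSERS[keyword](values)
    return settings


settings = parse_control("RESOLUTION 1.9")
print(settings["RESOLUTION"])
print(settings["MAXCYCLES"])
```

Because every value is looked up by name, a statement that is absent leaves its default untouched, and a misspelled keyword is rejected rather than silently shifting the meaning of the numbers that follow.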

25.2.4.3.2. Refinement should run quickly and use as little memory as possible

The most time-consuming calculations in refinement are the calculation of structure factors from atomic coordinates and the calculation of derivatives of the part of the residual dependent upon the diffraction data with respect to the atomic parameters. The quickest means of performing these calculations requires the use of space-group-optimized fast Fourier transforms (FFTs). The initial implementation of TNT used FFTs to calculate structure factors, but the much slower direct summation method to calculate the derivatives. Within a few years, Agarwal's method (Agarwal, 1978; Agarwal et al., 1981) was incorporated into TNT and from then on all crystallographic calculations were performed with FFTs.
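Direct summation, whether for structure factors or for their derivatives, visits every atom for every reflection, so its cost grows as the number of atoms times the number of reflections. The sketch below shows that structure in its simplest form. It is an illustration rather than TNT code: it works in space group $P1$ and treats each atom as a point scatterer with a constant scattering factor, ignoring the resolution dependence of the form factor and the atomic displacement parameters. The function names and the `(f, x, y, z)` tuple layout are assumptions.

```python
import cmath


def structure_factor(hkl, atoms):
    """Direct summation of F(hkl) in P1 over atoms given as (f, x, y, z)."""
    h, k, l = hkl
    total = 0j
    for f, x, y, z in atoms:
        total += f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
    return total


def structure_factor_dx(hkl, atom):
    """Derivative of F(hkl) with respect to the fractional x of one atom."""
    h, k, l = hkl
    f, x, y, z = atom
    phase = cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
    return 2j * cmath.pi * h * f * phase


# Two point atoms with made-up scattering factors and fractional coordinates.
atoms = [(6.0, 0.10, 0.20, 0.30), (8.0, 0.60, 0.20, 0.30)]
reflections = [(0, 0, 0), (1, 0, 0), (2, 1, 3)]

# One pass over all atoms per reflection: this nested loop is the expense
# that an FFT-based calculation avoids.
for hkl in reflections:
    print(hkl, abs(structure_factor(hkl, atoms)))
```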

The FFT programs of Ten Eyck (1973, 1977) made very efficient use of computer memory. Another means of saving memory was to recognize that the code for calculating stereochemical restraints did not need to be in memory when the crystallographic calculations were being performed, and vice versa. There were two ways to save memory using this information. One could create a series of `overlays' or one could break the calculation into a series of separate programs. The means for defining an overlay structure were never standardized and could not be ported from one type of computer to another, so overlays were never attempted in TNT. For this reason, and a number of others mentioned here, TNT is not a single program but a collection of programs, each with a well defined and specialized purpose.

25.2.4.3.3. The source code should not require customization for each project

The need to state this goal seems remarkable in these modern times, but the truth is that most computer programs in the 1970s required specific customizations before they could be used. The simplest modifications were the definitions of the maximum number of atoms, residues, atom types etc. accepted by the program. These modifications are still required in Fortran 77 programs because that language does not allow the dynamic allocation of memory. However, in most programs today the limits are set high enough that the standard configuration does not present a problem.

The most difficult modification required for programs like PROLSQ was to adapt the calculations to the space group in hand. Their authors usually included code for the space groups they were particularly interested in, leaving all others to be implemented by the user. Writing code for a new space group was often a daunting task for someone who was not an expert programmer and had no tools for testing the modifications.

It is too burdensome to require the user to understand the internal workings of a complex calculation well enough to code and debug central subroutines of a refinement program. In its initial implementation, TNT avoided this problem, to an extent, by performing the space-group-specific calculations in separate programs. At least the user did not need to modify an existing program. All that was required was the construction of a program that read a file in the proper format, performed the calculation and wrote its answer in the proper format. The user was required to supply both a program that could calculate structure factors from the model and another program that could calculate the derivative of the diffraction component of the residual function with respect to the atomic parameters of the model.

While a structure-factor program could usually be located, either by finding an existing program or by expanding the model to a lower-symmetry space group for which a program did exist, the requirement of creating a derivative program proved too great a burden. The derivation of the space-group-specific calculation, its implementation and debugging proved too difficult for almost everyone, and this design was quickly abandoned. Instead, an implementation of Agarwal's (1978) algorithm was created. In this method, the derivatives are calculated with a series of convolutions with an $F_{o} - F_{c}$ map. The calculation of the map is the only space-group-specific part of the calculation, and this was done with a separate program for calculating Fourier syntheses. Such programs were as easy to come by as structure-factor calculation programs and could be replaced by a lower-symmetry program if required.
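The workaround of expanding the model to a lower-symmetry space group can be illustrated as follows. The sketch generates the symmetry copies of each atom and then hands them to a $P1$ direct-summation routine. The choice of $P2_1$ (unique axis $b$) with operators $(x, y, z)$ and $(-x, y + 1/2, -z)$ is only an example, and the function names, the point-atom scattering model and the coordinates are assumptions, not part of TNT.

```python
import cmath

# Symmetry operators of P2_1 (unique axis b) acting on fractional coordinates.
P21_OPERATORS = [
    lambda x, y, z: (x, y, z),
    lambda x, y, z: (-x, y + 0.5, -z),
]


def expand_to_p1(atoms, operators):
    """Apply every operator to every atom and wrap the copies into the cell."""
    expanded = []
    for f, x, y, z in atoms:
        for op in operators:
            xs, ys, zs = op(x, y, z)
            expanded.append((f, xs % 1.0, ys % 1.0, zs % 1.0))
    return expanded


def structure_factor_p1(hkl, atoms):
    """P1 direct summation over point atoms given as (f, x, y, z)."""
    h, k, l = hkl
    return sum(
        f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
        for f, x, y, z in atoms
    )


# Made-up asymmetric unit: two point atoms with constant scattering factors.
asymmetric_unit = [(6.0, 0.10, 0.20, 0.30), (8.0, 0.35, 0.15, 0.70)]
cell_contents = expand_to_p1(asymmetric_unit, P21_OPERATORS)

print(len(cell_contents))
print(abs(structure_factor_p1((0, 1, 0), cell_contents)))
print(abs(structure_factor_p1((0, 2, 0), cell_contents)))
```

For the expanded model the $0k0$ reflections with odd $k$ vanish to within rounding error, as the $2_1$ screw axis requires, which gives a simple check that the expansion was done correctly. The price of this approach is that the $P1$ calculation handles every symmetry copy explicitly and gains nothing from the symmetry.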

While it is easier to find or write a program that only calculates a Fourier transform, and much easier to debug one than to debug a modification to a larger and more complex program, it is still difficult. The lack of availability of programs for the space group of a crystal often prevented the use of TNT. Over time, programs for more space groups were written and distributed with TNT. Eventually, a method was developed by one of TNT's authors in which FFTs could be calculated using a single program as efficiently as the original space-group-specific programs. Once this program existed, there was no longer any need for isolated structure-factor and Fourier synthesis programs. These calculations have disappeared into the heart of TNT, and TNT consists of many fewer programs today than in the past.

References

Agarwal, R. C. (1978). A new least-squares technique based on the fast Fourier transform algorithm. Acta Cryst. A34, 791–809.

Agarwal, R. C., Lifchitz, A. & Dodson, E. (1981). Block diagonal least squares refinement using fast Fourier techniques. In Refinement of protein structures, edited by P. A. Machin, J. W. Campbell & M. Elder. Warrington: Daresbury Laboratory.

Ten Eyck, L. F. (1973). Crystallographic fast Fourier transforms. Acta Cryst. A29, 183–191.

Ten Eyck, L. F. (1977). Efficient structure-factor calculation for large molecules by the fast Fourier transform. Acta Cryst. A33, 486–492.
