The structure-determination language of the Crystallography & NMR System

Brunger, A. T.; Adams, P. D.; DeLano, W. L.; Gros, P.; Grosse-Kunstleve, R. W.; Jiang, J.-S.; Pannu, N. S.; Read, R. J.; Rice, L. M.; Simonson, T.

doi:10.1107/97809553602060000724

International Tables for Crystallography, Volume F: Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F. ch. 25.2, pp. 710–716

Section 25.2.3. The structure-determination language of the Crystallography & NMR System

25.2.3.1. Introduction

We have developed a new and advanced software system, the Crystallography & NMR System (CNS), for crystallographic and NMR structure determination (Brünger et al., 1998). The goals of CNS are: (1) to create a flexible computational framework for exploration of new approaches to structure determination; (2) to provide tools for structure solution of difficult or large structures; (3) to develop models for analysing structural and dynamical properties of macromolecules; and (4) to integrate all sources of information into all stages of the structure-determination process.

To meet these goals, algorithms were moved from the source code into a symbolic structure-determination language which represents a new concept in computational crystallography. The high-level CNS computing language allows definition of symbolic target functions, data structures, procedures and modules. The CNS program acts as an interpreter for the high-level CNS language and includes hard-wired functions for efficient processing of computing-intensive tasks. Methods and algorithms are therefore more clearly defined and easier to adapt to new and challenging problems. The result is a multi-level system which provides maximum flexibility to the user (Fig. 25.2.3.1). The CNS language provides a common framework for nearly all computational procedures of structure determination. A comprehensive set of crystallographic procedures for phasing, density modification and refinement has been implemented in this language. Task-oriented input files written in the CNS language, which can also be accessed through an HTML graphical interface (Graham, 1995), are available to carry out these procedures.

Figure 25.2.3.1

CNS consists of five layers which are under user control. The high-level HTML graphical interface interacts with the task-oriented input files. The task files use the CNS language and the modules. The modules contain CNS language statements. The CNS language is interpreted by the CNS Fortran77 program. The program performs the data manipulations, data operations and `hard-wired' algorithms.

25.2.3.2. The CNS language

One of the key features of the CNS language is symbolic data structure manipulation, for example, $[\tt\displaylines{\hbox{xray}\hfill\cr\quad\hbox{do } (\hbox{pa}=-2* (\hbox{amplitude(fp)}\;\hat{}\;2 + \hbox{amplitude(fh)}\;\hat{}\;2\hfill\cr{\hbox to 3.8pc{}} -\hbox{amplitude(fph)}\ \hat{}\ 2)* \hbox{amplitude(fp)}\hfill\cr{\hbox to 3.8pc{}} *\;\hbox{real(fh)}/(3* \hbox{v}\; \hat{}\; 2 + 4* (\hbox{amplitude(fph)}\;\hat{}\;2\hfill\cr{\hbox to 3.8pc{}}+\hbox{sph}\;\hat{}\; 2)* \hbox{v})) \hbox{ (acentric)} \hfill\cr \hbox{end}\hfill {\rm(25.2.3.1)}}]$ which is equivalent to the following mathematical expression for all acentric indices h, $[p_{a}({\bf h}) = 2 {- [|{\bf f}_{p}({\bf h})|^{2} + |{\bf f}_{h}({\bf h})|^{2} - |{\bf f}_{ph}({\bf h})|^{2}] |{\bf f}_{p}({\bf h})| \{[{\bf f}_{h}({\bf h}) + {\bf f}_{h}({\bf h})^{*}]/2\} \over 3 v({\bf h})^{2} + 4 [|{\bf f}_{ph}({\bf h})|^{2} + s_{ph}({\bf h})^{2}] v({\bf h})},\eqno(25.2.3.2)]$ where $[{\bf f}_{p}]$ [`fp' in equation (25.2.3.1)] is the `native' structure-factor array, $[{\bf f}_{ph}]$ [`fph' in equation (25.2.3.1)] is the derivative structure-factor array, $[s_{ph}]$ [`sph' in equation (25.2.3.1)] is the corresponding experimental σ, v is the expectation value for the lack of closure (including lack of isomorphism and errors in the heavy-atom model), and $[{\bf f}_{h}]$ [`fh' in equation (25.2.3.1)] is the calculated heavy-atom structure-factor array. This expression computes the $[A_{\rm iso}]$ coefficient of the phase probability distribution for single isomorphous replacement described by Hendrickson & Lattman (1970) and Blundell & Johnson (1976).
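As a cross-check of the correspondence between statement (25.2.3.1) and equation (25.2.3.2), the following Python sketch evaluates the same expression for a single acentric reflection. It is an illustration only and not part of CNS: the function name `hl_a_iso` and the sample values are assumptions, and the derivative is passed as a plain amplitude. The term [f_h + f_h*]/2 is the real part of f_h, which is why the CNS statement uses `real(fh)`.

```python
def hl_a_iso(fp, fh, fph_amplitude, sph, v):
    """Evaluate equation (25.2.3.2) for one acentric reflection.

    fp, fh        : complex native and heavy-atom structure factors
    fph_amplitude : derivative structure-factor amplitude
    sph           : experimental sigma of the derivative amplitude
    v             : expectation value for the lack of closure
    """
    numerator = (
        (abs(fp) ** 2 + abs(fh) ** 2 - fph_amplitude ** 2)
        * abs(fp)
        * fh.real  # equals (fh + conj(fh)) / 2
    )
    denominator = 3.0 * v ** 2 + 4.0 * (fph_amplitude ** 2 + sph ** 2) * v
    return -2.0 * numerator / denominator


# Hypothetical values chosen only to exercise the formula.
pa = hl_a_iso(fp=10 + 0j, fh=3 + 4j, fph_amplitude=12.0, sph=1.0, v=2.0)
print(pa)
```

In CNS the same expression is applied to every selected reflection at once; the sketch handles one reflection so that each factor can be compared with the statement above.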

The expression in equation (25.2.3.1) is computed for the specified subset of reflections `(acentric)'. The selection restricts the operation to the chosen reflections (in this case, all acentric ones). More sophisticated selections are possible, e.g. $[\tt\eqalignno{&(\hbox{amplitude(fp)}\gt 2* \hbox{sh and amplitude(fph)}\gt 2* \hbox{sph}&\cr &\quad\hbox{and d}\gt = 3) &{\rm(25.2.3.3)}\cr}]$ which selects all reflections with Bragg spacing, d, of at least 3 Å for which both native (fp) and derivative (fph) amplitudes are greater than two times their corresponding σ values (`sh' and `sph', respectively). Extensive use of this structure-factor selection facility is made for cross-validating statistical properties, such as R values (Brünger, 1992b), $[\sigma_{A}]$ values (Kleywegt & Brünger, 1996; Read, 1997) and maximum-likelihood functions (Pannu & Read, 1996a; Adams et al., 1997).

Similar operations exist for electron-density maps, e.g. $[\tt\eqalignno{ &\hbox{xray} &\cr &\quad \hbox{do}\ (\hbox{map}=0)\ (\hbox{map}\lt 0.1) &\cr &\hbox{end} &{\rm (25.2.3.4)}\cr}]$ is an example of a truncation operation: all map values less than 0.1 are set to 0. Atoms can be selected based on a number of atomic properties and descriptors, e.g. $[\tt\eqalignno{ \hbox{do}\ (\hbox{b}=10) &(\hbox{residue 1:40 and} &\cr &\quad (\hbox{name ca or name n or name c or name o})) &\cr&&{\rm(25.2.3.5)}\cr}]$ sets the B factors of all polypeptide backbone atoms of residues 1 through 40 to 10 Å².
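Both statements follow the same pattern: an assignment that is applied only where a selection holds. A minimal Python sketch of that pattern is given here, assuming a map stored as a flat list of density values and atoms stored as dictionaries. The helper `do` and all sample values are hypothetical and only mirror the behaviour described above.

```python
BACKBONE_NAMES = {"ca", "n", "c", "o"}


def do(values, assign, selection):
    """Return a copy of values with assign(x) wherever selection(x) is true."""
    return [assign(x) if selection(x) else x for x in values]


# Map truncation, as in (25.2.3.4): values below 0.1 are set to 0.
density = [0.05, 0.4, -0.2, 0.1, 0.9]
truncated = do(density, lambda rho: 0.0, lambda rho: rho < 0.1)

# Atom selection, as in (25.2.3.5): backbone atoms of residues 1 to 40.
atoms = [
    {"residue": 5, "name": "ca", "b": 25.0},
    {"residue": 5, "name": "cb", "b": 30.0},
    {"residue": 41, "name": "n", "b": 22.0},
]
for atom in atoms:
    if 1 <= atom["residue"] <= 40 and atom["name"] in BACKBONE_NAMES:
        atom["b"] = 10.0

print(truncated)
print([atom["b"] for atom in atoms])
```

Elements outside the selection keep their previous values, which is what makes the construct usable for truncation.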

Operations exist between data structures, e.g. real- and reciprocal-space arrays, and atom properties. For example, Fourier transformations between real and reciprocal space can be accomplished by the following CNS commands: $[\tt\eqalignno{ &\hbox{xray} &\cr &\quad \hbox{mapresolution infinity 3}. &\cr &\quad \hbox{fft grid 0.3333 end} &\cr &\quad \hbox{do}\ (\hbox{map}=\hbox{ft}(\hbox{f\_cal}))\ (\hbox{acentric}) &\cr &\hbox{end} &{\rm(25.2.3.6)}\cr}]$ which computes a map on a 1 Å grid by Fourier transformation of the $[`\hbox{f}\_\hbox{cal'}]$ array for all acentric reflections.
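The 1 Å figure follows from the two settings in (25.2.3.6) if the `grid` value is read as a fraction of the high-resolution limit. That reading is an assumption drawn from the sentence above, not from the CNS documentation, and the function name below is hypothetical. A small Python check of the arithmetic:

```python
def map_grid_spacing(high_resolution_limit, grid_fraction):
    """Grid spacing in angstroms, taken as a fraction of the resolution limit."""
    return grid_fraction * high_resolution_limit


spacing = map_grid_spacing(high_resolution_limit=3.0, grid_fraction=0.3333)
print(round(spacing, 2))
```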

Atoms can be associated with calculated structure factors, e.g. $[\tt\hbox{associate f\_cal (residue 1:50)} \eqno(25.2.3.7)]$ This statement will associate the reciprocal-space array `f_cal' with the atoms belonging to residues 1 through 50. These structure-factor associations are used in the symbolic target functions described below.

There are no predefined reciprocal- or real-space arrays in CNS. Dynamic memory allocation allows one to carry out operations on arbitrarily large data sets with many individual entries (e.g. derivative diffraction data) without the need to recompile the source code. The various reciprocal-space structure-factor arrays must therefore be declared and their type specified prior to invoking them. For example, a reciprocal-space array with real values, such as observed amplitudes, is declared by $[{\tt\hbox{declare name} = \hbox{fobs type} = \hbox{real domain} = \hbox{reciprocal end}} \eqno(25.2.3.8)]$ Reciprocal-space arrays can be grouped. For example, Hendrickson & Lattman (1970) coefficients are represented as a group of four reciprocal-space structure-factor arrays, $[\tt\eqalignno{&\hbox{group type} = \hbox{hl object} = \hbox{pa object} = \hbox{pb}&\cr&\quad\hbox{object} = \hbox{pc object} = \hbox{pd end} &{\rm(25.2.3.9)}}]$ where `pa', `pb', `pc' and `pd' refer to the individual arrays. This group statement indicates to CNS that the specified arrays need to be transformed together when reflection indices are changed, e.g. during expansion of the diffraction data to space group P1.

25.2.3.3. Symbols and parameters

The CNS language supports two types of data elements which may be used to store and retrieve information. Symbols are typed variables, such as numbers, character strings of restricted length and logicals. Parameters are untyped data elements of arbitrary length that may contain collections of CNS commands, numbers, strings, or symbols.

Symbols are denoted by a dollar sign ($), and parameters by an ampersand (&). Symbols and parameters may contain a single data element, or they may be a compound data structure of arbitrary complexity. The hierarchy of these data structures is denoted using a period (.). Figs. 25.2.3.2(a) and (b) demonstrate how crystal-lattice information can be stored in compound symbols and parameters, respectively. The information stored in symbols or parameters can be retrieved by simply referring to them within a CNS command: the symbol or parameter name is substituted by its content. Symbol substitution of portions of the compound names (e.g. $[\hbox{`\&crystal}\_\hbox{lattice}.\hbox{unit}\_\hbox{cell}.\hbox{\$} \hbox{para'}]$ ) allows one to carry out conditional and iterative operations on data structures, such as matrix multiplication.
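The period-separated hierarchy behaves much like a nested dictionary addressed by a dotted path. The Python sketch below is an analogy, not CNS code: the `lookup` helper is hypothetical, and the stored values are illustrative (only a = 61.76 appears in this section). It shows how building the last level of a name from a loop variable gives iteration over a compound data structure.

```python
# Illustrative values; only a = 61.76 is taken from this section.
parameters = {
    "crystal_lattice": {
        "space_group": "P2(1)2(1)2(1)",
        "unit_cell": {
            "a": 61.76, "b": 40.73, "c": 26.74,
            "alpha": 90.0, "beta": 90.0, "gamma": 90.0,
        },
    }
}


def lookup(store, dotted_name):
    """Follow a period-separated compound name down the hierarchy."""
    node = store
    for level in dotted_name.split("."):
        node = node[level]
    return node


# The final level is substituted from a loop variable, in the spirit of
# `&crystal_lattice.unit_cell.$para'.
edges = [
    lookup(parameters, f"crystal_lattice.unit_cell.{para}")
    for para in ("a", "b", "c")
]
print(edges)
```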

Figure 25.2.3.2

Examples of compound symbols and compound parameters. (a) The `evaluate' statement is used to define typed symbols (strings, numbers and logicals). Symbol names are in bold. (b) The `define' statement is used to define untyped parameters. Each parameter entry is terminated by a semicolon. The compound base name `crystal_lattice' has a number of sub-levels, such as `space_group' and the `unit_cell' parameters. `unit_cell' is itself base to a number of sub-levels, such as `a' and `alpha'. Parameter names are in bold.

25.2.3.4. Statistical functions

The CNS language contains a number of statistical operations, such as binwise averages and summations. The resolution bins are defined by a central facility in CNS.

Fig. 25.2.3.3 shows how $[\sigma_{A}]$ , $[\sigma_{\Delta}]$ and D (Read, 1986, 1990) are computed from the observed structure factors (`fobs') and the calculated model structure factors (`fcalc') using the CNS statistical operations. The first five operations are performed for the reflections in the test set, while the last three operations expand the results to all reflections. The `norm' function computes normalized structure-factor amplitudes for the specified arguments. The `sigacv' function evaluates $[\sigma_{A}]$ from the normalized structure factors. The `save' function computes the statistical average $[\hbox{save}(\;f) = {\textstyle\sum_{hkl}\displaystyle f_{hkl} (w/\varepsilon) \over \textstyle\sum_{hkl}\displaystyle w}, \eqno(25.2.3.10)]$ where w is 1 and 2 for centric and acentric reflections, respectively, and ɛ is the statistical weight. The averages are computed binwise, and the result for a particular bin is stored in all selected reflections belonging to the bin.
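Equation (25.2.3.10), together with the rule that each bin's result is stored in every reflection of that bin, can be sketched in Python as follows. This is an illustration under stated assumptions: the argument layout (parallel lists plus a bin index per reflection) and the sample numbers are hypothetical, and the internal CNS implementation is not shown in this section.

```python
def save(f, w, eps, bins):
    """Binwise average of equation (25.2.3.10).

    f, w, eps : per-reflection value, centric/acentric factor and
                statistical weight
    bins      : per-reflection bin index
    Returns one value per reflection: the average of its bin.
    """
    numerator = {}
    denominator = {}
    for fi, wi, ei, bi in zip(f, w, eps, bins):
        numerator[bi] = numerator.get(bi, 0.0) + fi * wi / ei
        denominator[bi] = denominator.get(bi, 0.0) + wi
    return [numerator[bi] / denominator[bi] for bi in bins]


# Four hypothetical reflections in two resolution bins.
averages = save(
    f=[2.0, 4.0, 6.0, 8.0],
    w=[1, 2, 2, 1],          # 1 = centric, 2 = acentric
    eps=[1.0, 1.0, 2.0, 1.0],
    bins=[0, 0, 1, 1],
)
print(averages)
```

Returning a full-length list rather than one number per bin mirrors the way the binwise result is broadcast back to the reflections, so it can be used directly in later per-reflection expressions.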

Figure 25.2.3.3

Example for statistical operations provided by the CNS language. `norm', `sigacv', `save' and `sum' are functions that are computed internally by the CNS program. Binwise operations are in italics (`sigacv', `save' and `sum'). The result for a particular bin is stored in all elements belonging to the bin. The $[\sigma_{A}]$ (`sigmaA') parameters are computed in binwise resolution shells. The $[\sigma_{\Delta}]$ (`sigmaD') and D parameters are then computed from $[\sigma_{A}]$ and binwise averages involving $[|{\bf F}_{o}|^{2}]$ and $[|{\bf F}_{c}|^{2}]$ . The binwise results are expanded to all reflections by the last three statements. `test' is an array that is 1 for all reflections in the test set and 0 otherwise. `sum' is a binwise operation on all reflections with the same partitioning used for the test set.

25.2.3.5. Symbolic target function

One of the key innovative features of CNS is the ability to symbolically define target functions and their first derivatives for crystallographic searches and refinement. This allows one conveniently to implement new crystallographic methodologies as they are being developed.

The power of symbolic target functions is illustrated by two examples. In the first example, a target function is defined for simultaneous heavy-atom parameter refinement of three derivatives. The sites for each of the three derivatives can be disjoint or identical, depending on the particular situation. For simplicity, the Blow & Crick (1959) approach is used, although maximum-likelihood targets are also possible (see below). The heavy-atom sites are refined against the target $[\eqalignno{ &\displaystyle\sum\limits_{hkl} {(|{\bf F}_{h_{1}} + {\bf F}_{p}| - |{\bf F}_{ph_{1}}|)^{2} \over 2 v_{1}} + {(|{\bf F}_{h_{2}} + {\bf F}_{p}| - |{\bf F}_{ph_{2}}|)^{2} \over 2 v_{2}} &\cr &\quad + {(|{\bf F}_{h_{3}} + {\bf F}_{p}| - |{\bf F}_{ph_{3}}|)^{2} \over 2v_{3}}. &(25.2.3.11)\cr}]$

$[{\bf F}_{h_{1}}]$ , $[{\bf F}_{h_{2}}]$ and $[{\bf F}_{h_{3}}]$ are complex structure factors corresponding to the three sets of heavy-atom sites, $[{\bf F}_{p}]$ represents the structure factors of the native crystal, $[|{\bf F}_{ph_{1}}|]$ , $[|{\bf F}_{ph_{2}}|]$ and $[|{\bf F}_{ph_{3}}|]$ are the structure-factor amplitudes of the derivatives, and $[v_{1}]$ , $[v_{2}]$ and $[v_{3}]$ are the variances of the three lack-of-closure expressions. The corresponding target expression and its first derivatives with respect to the calculated structure factors are shown in Fig. 25.2.3.4(a). The derivatives of the target function with respect to each of the three associated structure-factor arrays are specified with the `dtarget' expressions. The `tselection' statement specifies the selected subset of reflections to be used in the target function (e.g. excluding outliers), and the `cvselection' statement specifies a subset of reflections to be used for cross-validation (Brünger, 1992b) (i.e. the subset is not used during refinement but only as a monitor for the progress of refinement).
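To make the relationship between a `target' expression and its `dtarget' expressions concrete, the Python sketch below evaluates the terms of equation (25.2.3.11) and the gradient of one term with respect to the heavy-atom structure factor. The function names and sample values are hypothetical. The gradient is packed as dT/dA + i dT/dB for F_h = A + iB, which is an assumed convention; the convention used by CNS is defined in Fig. 25.2.3.4(a), which is not reproduced here.

```python
def lack_of_closure(fh, fp, fph_amplitude, v):
    """One term of equation (25.2.3.11): one reflection, one derivative."""
    return (abs(fh + fp) - fph_amplitude) ** 2 / (2.0 * v)


def lack_of_closure_gradient(fh, fp, fph_amplitude, v):
    """Gradient of the term with respect to fh = A + iB, as dT/dA + i dT/dB.

    Undefined when fh + fp is exactly zero.
    """
    total = fh + fp
    amplitude = abs(total)
    return (amplitude - fph_amplitude) / v * total / amplitude


def target(reflections):
    """Sum over reflections and over the derivatives listed for each one."""
    return sum(
        lack_of_closure(fh, fp, fph_amplitude, v)
        for fp, derivatives in reflections
        for fh, fph_amplitude, v in derivatives
    )


# One hypothetical reflection with three derivatives given as (fh, |fph|, v).
reflections = [
    (3 + 1j, [(1 + 2j, 4.0, 2.0), (0 + 0j, 3.0, 1.0), (-1 + 0j, 2.0, 0.5)]),
]
total_target = target(reflections)
gradient = lack_of_closure_gradient(1 + 2j, 3 + 1j, 4.0, 2.0)
print(total_target, gradient)
```

Because each derivative contributes its own term, the gradient with respect to one heavy-atom array involves only that derivative's term, which matches the use of one `dtarget' expression per associated structure-factor array.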

Figure 25.2.3.4

Examples of symbolic definition of a refinement target function and its derivatives with respect to the calculated structure-factor arrays. (a) Simultaneous refinement of heavy-atom sites of three derivatives. The target function is defined by the `target' expression. ` $[\hbox{f}\_\hbox{h}\_1]$ ', ` $[\hbox{f}\_\hbox{h}\_2]$ ' and ` $[\hbox{f}\_\hbox{h}\_3]$ ' (in bold) are complex structure factors corresponding to three sets of heavy atoms that are specified using atom selections [equation (25.2.3.7)]. The target function and its derivatives with respect to the three structure-factor arrays are defined symbolically using the structure-factor amplitudes of the native crystal, ` $[\hbox{f}\_\hbox{p}]$ ', those of the derivatives, ` $[\hbox{f}\_\hbox{ph}\_1]$ ', ` $[\hbox{f}\_\hbox{ph}\_2]$ ', ` $[\hbox{f}\_\hbox{ph}\_3]$ ', the complex structure factors of the heavy-atom models, ` $[\hbox{f}\_\hbox{h}\_1]$ ', ` $[\hbox{f}\_\hbox{h}\_2]$ ', ` $[\hbox{f}\_\hbox{h}\_3]$ ', and the corresponding lack-of-closure variances, ` $[\hbox{v}\_1]$ ', ` $[\hbox{v}\_2]$ ' and ` $[\hbox{v}\_3]$ '. The summation over the selected structure factors (`tselection') is performed implicitly. (b) Refinement of two independent models against perfectly twinned data. `fcalc1' and `fcalc2' are complex structure factors for the models that are related by a twinning operation (in bold). The target function and its derivatives with respect to the two structure-factor arrays are explicitly defined.

The second example is the refinement of a perfectly twinned crystal with overlapping reflections from two independent crystal lattices. Refinement of the model is carried out against the residual $[\textstyle\sum\limits_{hkl}\displaystyle |{\bf F}_{\rm obs} |- (|{\bf F}_{\rm calc1}|^{2} + |{\bf F}_{\rm calc2}|^{2})^{1/2}. \eqno(25.2.3.12)]$ The symbolic definition of this target is shown in Fig. 25.2.3.4(b). The twinning operation itself is imposed as a relationship between the two sets of selected atoms (not shown). This example assumes that the two calculated structure-factor arrays (`fcalc1' and `fcalc2') that correspond to the two lattices have been appropriately scaled with respect to the observed structure factors, and the twinning fractions have been incorporated into the scale factors. However, a more sophisticated target function could be defined which incorporates scaling.
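The quantity compared with the observed amplitude in equation (25.2.3.12) is the root of the summed squared amplitudes of the two lattices. A short Python sketch of that combination is given below, with hypothetical function names and values. How the per-reflection differences are accumulated is defined in Fig. 25.2.3.4(b), which is not reproduced here, so the sketch stops at the differences.

```python
import math


def twinned_amplitude(fcalc1, fcalc2):
    """Combined amplitude of two overlapping, pre-scaled lattices."""
    return math.sqrt(abs(fcalc1) ** 2 + abs(fcalc2) ** 2)


def residual_terms(fobs_amplitudes, fcalc1, fcalc2):
    """Per-reflection difference |Fobs| - (|Fcalc1|^2 + |Fcalc2|^2)^(1/2)."""
    return [
        fo - twinned_amplitude(f1, f2)
        for fo, f1, f2 in zip(fobs_amplitudes, fcalc1, fcalc2)
    ]


# Two hypothetical reflections; scale factors are assumed already applied.
terms = residual_terms([6.0, 12.5], [3 + 0j, 5 + 0j], [0 + 4j, 0 + 12j])
print(terms)
```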

A major advantage of the symbolic definition of the target function and its derivatives is that any arbitrary function of structure-factor arrays can be used. This means that the scope of possible targets is not limited to least-squares targets. Symbolic definition of numerical integration over unknown variables (such as phase angles) is also possible. Thus, even complicated maximum-likelihood target functions (Bricogne, 1984; Otwinowski, 1991; Pannu & Read, 1996a; Pannu et al., 1998) can be defined using the CNS language. This is particularly valuable at the prototype stage. For greater efficiency, the standard maximum-likelihood targets are provided through CNS source code which can be accessed as functions in the CNS language. For example, the maximum-likelihood target function MLF (Pannu & Read, 1996a) and its derivative with respect to the calculated structure factors are defined as $[\tt\eqalignno{\hbox{target} &= \hbox{(mlf (fobs,sigma,(fcalc + fbulk),} &\cr&\quad\hbox{d,sigma\_delta))} &\cr \hbox{dtarget} &= \hbox{(dmlf (fobs,sigma,(fcalc + fbulk),}\cr&\quad\hbox{d,sigma\_delta))} &{\rm(25.2.3.13)}\cr}]$ where `mlf( )' and `dmlf( )' refer to internal maximum-likelihood functions, `fobs' and `sigma' are the observed structure-factor amplitudes and corresponding σ values, `fcalc' is the (complex) calculated structure-factor array, `fbulk' is the structure-factor array for a bulk solvent model, and `d' and $[`\hbox{sigma}\_\hbox{delta'}]$ are the cross-validated D and $[\sigma_{\Delta}]$ functions (Read, 1990; Kleywegt & Brünger, 1996; Read, 1997) which are precomputed prior to invoking the MLF target function using the test set of reflections. The availability of internal Fortran subroutines for the most computing-intensive target functions and the symbolic definitions involving structure-factor arrays allow for maximal flexibility and efficiency. 
Other examples of available maximum-likelihood target functions include MLI (intensity-based maximum-likelihood refinement), MLHL [crystallographic model refinement with prior phase information (Pannu et al., 1998)], and maximum-likelihood heavy-atom parameter refinement for multiple isomorphous replacement (Otwinowski, 1991) and MAD phasing (Hendrickson, 1991; Burling et al., 1996). Work is in progress to define target functions that include correlations between different heavy-atom derivatives (Read, 1994).

25.2.3.6. Modules and procedures

Modules exist as separate files and contain collections of CNS commands related to a particular task. In contrast, procedures can be defined and invoked from within any file. Modules and procedures share a similar parameter-passing mechanism for both input and output. Modules and procedures make it possible to write programs in the CNS language in a manner similar to that of a computing language, such as Fortran or C. CNS modules and procedures have defined sets of input (and output) parameters that are passed into them (or returned) when they are invoked. This enables long collections of CNS language statements to be broken down into modules for greater clarity of the underlying algorithm.

Parameters passed into a module or procedure inherit the scope of the calling task file or module, and thus they exhibit a behaviour analogous to most computing languages. Symbols defined within a module or procedure are purely local variables.

The following example shows how the unit-cell parameters defined above (Fig. 25.2.3.2b) are passed into a module named `compute_unit_cell_volume' (Fig. 25.2.3.5), which computes the volume of the unit cell from the crystal lattice parameters using well established formulae (Stout & Jensen, 1989): $[\tt\eqalignno{ \hbox{@compute\_unit\_cell\_volume } &\hbox{(cell} = \&\hbox{crystal\_lattice.unit\_cell;} &\cr &\quad \hbox{volume} =\$\hbox{cell\_volume;)} &\cr&&{\rm(25.2.3.14)}}]$ The parameter `volume' is equated to the symbol $[`\$\hbox{cell}\_\hbox{volume'}]$ upon invocation in order to return the result (the unit-cell volume) from this module. Note that the use of compound parameters to define the crystal lattice parameters (Fig. 25.2.3.2b) provides a convenient way to pass all required information into the module by referring to the base name of the compound parameter ( $[`\&\hbox{crystal}\_\hbox{lattice}.\hbox{unit}\_\hbox{cell'}]$ ) instead of having to specify each individual data element.
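For readers without access to Fig. 25.2.3.5, the calculation can be sketched in Python using the standard triclinic volume formula V = abc(1 − cos²α − cos²β − cos²γ + 2 cos α cos β cos γ)^(1/2). The function below is an independent illustration and not a transcription of the CNS module; the cell values are hypothetical. Passing the whole dictionary mirrors passing the compound base name rather than six separate elements.

```python
import math


def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume from edge lengths and angles given in degrees."""
    cos_a, cos_b, cos_g = (
        math.cos(math.radians(angle)) for angle in (alpha, beta, gamma)
    )
    return a * b * c * math.sqrt(
        1.0 - cos_a ** 2 - cos_b ** 2 - cos_g ** 2 + 2.0 * cos_a * cos_b * cos_g
    )


# Hypothetical hexagonal cell passed as one compound object.
unit_cell = {"a": 10.0, "b": 10.0, "c": 20.0,
             "alpha": 90.0, "beta": 90.0, "gamma": 120.0}
cell_volume = unit_cell_volume(**unit_cell)
print(cell_volume)
```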

Figure 25.2.3.5

Use of compound parameters within a module. This module computes the unit-cell volume (Stout & Jensen, 1989) from the unit-cell geometry. Input and output parameter base names are in bold. Local symbols, such as cabg.1, are defined through `evaluate' statements. The result is stored in the parameter `&volume' which is passed to the invoking task file or module.

Fig. 25.2.3.6(a) shows another example of a CNS module: the module named $[`\hbox{phase}\_\hbox{distribution'}]$ computes phase probability distributions using the Hendrickson & Lattman formalism (Hendrickson & Lattman, 1970; Hendrickson, 1979; Blundell & Johnson, 1976). An example for invoking the module is shown in Fig. 25.2.3.6(b). This module could be called from task files that need access to isomorphous phase probability distributions. It would be straightforward to change the module in order to compute different expressions for the phase probability distributions.

Figure 25.2.3.6

Example of (a) a CNS module and (b) the corresponding module invocation. Input and output parameters are in bold. The module invocation is performed by specifying the `@' character, followed by the name of the module file and the module parameter substitutions. The ampersand (&) indicates that the particular symbol (e.g. `&fp') is substituted with the specified value in the invocation statement [e.g. `fobs' in the case of `&fp' in (b)]. The module parameter substitution is performed literally, and any string of characters between the equal sign and the semicolon will be substituted.
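The literal nature of the substitution can be illustrated with a few lines of Python. This is a sketch of the idea only: the regular expression, the function name and the module line are hypothetical, and the real CNS parser is not described in this section. Each `&name` in the module text is replaced by whatever string the invocation supplied, without any interpretation of that string.

```python
import re


def substitute(module_text, parameters):
    """Replace each &name with the literal string supplied for it.

    Names without a supplied value are left untouched.
    """
    def replace(match):
        return parameters.get(match.group(1), match.group(0))

    return re.sub(r"&([A-Za-z_][A-Za-z0-9_]*)", replace, module_text)


# Hypothetical module line and invocation values.
module_line = "do (&pa=amplitude(&fp)) (&sel)"
expanded = substitute(
    module_line,
    {"fp": "fobs", "sel": "acentric and d >= 3"},
)
print(expanded)
```

Since the value is inserted as text, an entire selection expression can be passed through a single parameter, as the `sel` entry shows.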

A large number of additional modules are available for crystallographic phasing and refinement. CNS library modules include space-group information, Gaussian atomic form factors, anomalous-scattering components, and molecular parameter and topology databases.

25.2.3.7. Task files

Task files consist of CNS language statements and module invocations. The CNS language permits the design and execution of nearly any numerical task in X-ray crystallographic structure determination using a minimal set of `hard-wired' functions and routines. A list of the currently available crystallographic procedures and features is shown in Fig. 25.2.3.7.

Figure 25.2.3.7

Procedures and features available in CNS for structure determination by X-ray crystallography.

Each task file is divided into two main sections: the initial parameter definition and the main body of the task file. The definition section contains definitions of all CNS parameters that are used in the main body of the task file. Modification of the main body of the file is not required, but may be done by experienced users in order to experiment with new algorithms. The definition section also contains the directives that control particular HTML features, e.g. text comments (indicated by $[\{^{*} \ldots ^{*}\}]$ ), user-modifiable fields (indicated by $[\{===\gt \}]$ ), and choice boxes (indicated by $[\{+ \hbox{ choice: } \ldots + \}]$ ). Fig. 25.2.3.8 shows a portion of the `define' section of a typical CNS refinement task file.

Figure 25.2.3.8

Example of a typical CNS task file: a section of the top portion of the simulated-annealing refinement protocol which contains the definition of various parameters that are needed in the main body of the task file. Each parameter is indicated by a name, an equal sign and an arbitrary sequence of characters terminated by a semicolon (e.g. `a=61.76;'). The top portion of each task file also contains commands for the HTML interface embedded in comment fields (indicated by braces, $[\{ \ldots \}]$ ). The commands that can be modified by the user in the HTML form are in bold.
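The layout described in this caption (brace-delimited interface directives interleaved with `name=value;` entries) is simple enough to read with a few lines of Python. The sketch below is an assumption-laden illustration, not the conversion script distributed with CNS: the sample text is hypothetical apart from `a=61.76;`, and nested braces are not handled. Values are kept as strings because parameters are untyped.

```python
import re


def parse_define_section(text):
    """Return {name: value} for each 'name=value;' entry outside braces."""
    without_directives = re.sub(r"\{.*?\}", "", text, flags=re.DOTALL)
    parameters = {}
    for entry in without_directives.split(";"):
        if "=" in entry:
            name, value = entry.split("=", 1)
            parameters[name.strip()] = value.strip()
    return parameters


# Hypothetical fragment in the style of a task-file definition section.
example = """
{* unit cell parameters *}
{===>} a=61.76;
{===>} b=40.73;
{* refinement target *}
{+ choice: "mlf" "mli" "mlhl" +}
{===>} reftarget="mlf";
"""
task_parameters = parse_define_section(example)
print(task_parameters)
```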

The task files produce a number of output files (e.g. coordinate, reflection, graphing and analysis files). Comprehensive information about input parameters and results of the task are provided in these output files. In this way, the majority of the information required to reproduce the structure determination is kept with the results. Analysis data are often given in simple columns and rows of numbers. These data files can be used for graphing, for example, by using commonly available spreadsheet programs. An HTML graphical output feature for CNS which makes use of these analysis files is planned. In addition, list files are often produced that contain a synopsis of the calculation.

25.2.3.8. HTML interface

The HTML graphical interface uses HTML to create a high-level menu-driven environment for CNS (Fig. 25.2.3.9). Compact and relatively simple Common Gateway Interface (CGI) conversion scripts are available that transform a task file into a form page and the edited form page back into a task file (Fig. 25.2.3.10). These conversion scripts are written in PERL.

Figure 25.2.3.9

Example of a CNS HTML form page. This particular example corresponds to the task file in Fig. 25.2.3.8.

Figure 25.2.3.10

Use of the CNS HTML form page interface, emphasizing the correspondence between input fields in the form page and parameters in the task file.

A comprehensive collection of task files is available for crystallographic phasing and refinement (Fig. 25.2.3.7). New task files can be created or existing ones modified in order to handle problems that are not covered by the distributed collection of task files. The HTML graphical interface thus provides a common interface for distributed and `personal' CNS task files (Fig. 25.2.3.10).

25.2.3.9. Example: combined maximum-likelihood and simulated-annealing refinement

CNS has a comprehensive task file for simulated-annealing refinement of crystal structures using Cartesian (Brünger et al., 1987; Brünger, 1988) or torsion-angle molecular dynamics (Rice & Brünger, 1994). This task file automatically computes cross-validated $[\sigma_{A}]$ estimates, determines the weighting scheme between the X-ray refinement target function and the geometric energy function (Brünger et al., 1989), refines a flat bulk solvent model (Jiang & Brünger, 1994) and an overall anisotropic B value for the model by least-squares minimization, and subsequently refines the atomic positions by simulated annealing. Options are available for specification of alternate conformations, multiple conformers (Burling & Brünger, 1994), noncrystallographic symmetry constraints and restraints (Weis et al., 1990), and `flat' solvent models (Jiang & Brünger, 1994). Available target functions include the maximum-likelihood functions MLF, MLI and MLHL (Pannu & Read, 1996a; Adams et al., 1997; Pannu et al., 1998). The user can choose between slow cooling (Brünger et al., 1990) and constant-temperature simulated annealing, and the respective rate of cooling and length of the annealing scheme. For a review of simulated annealing in X-ray crystallography, see Brünger et al. (1997).

During simulated-annealing refinement, the model can be significantly improved. Therefore, it becomes important to recalculate the cross-validated $[\sigma_{A}]$ error estimates (Kleywegt & Brünger, 1996; Read, 1997) and the weight between the X-ray diffraction target function and the geometric energy function in the course of the refinement (Adams et al., 1997). This is important for the maximum-likelihood target functions that depend on the cross-validated $[\sigma_{A}]$ error estimates. In the simulated-annealing task file, the recalculation of $[\sigma_{A}]$ values and subsequently the weight for the crystallographic energy term are carried out after initial energy minimization, and also after molecular-dynamics simulated annealing.

25.2.3.10. Conclusions

CNS is a general system for structure determination by X-ray crystallography and solution NMR. It covers the whole spectrum of methods used to solve X-ray or solution NMR structures. The multi-layer architecture allows use of the system with different levels of expertise. The HTML interface allows the novice to perform standard tasks. The interface provides a convenient means of editing complicated task files, even for the expert (Fig. 25.2.3.10). This graphical interface makes it less likely that an important parameter will be overlooked when editing the file. In addition, the graphical interface can be used with any task file, not just the standard distributed ones. HTML-based documentation and graphical output are planned for the future.

Most operations within a crystallographic algorithm are defined through modules and task files. This allows for the development of new algorithms and for existing algorithms to be precisely defined and easily modified without the need for source-code modifications.

The hierarchical structure of CNS allows extensive testing at each level. For example, once the source code and CNS basic commands have been tested, testing of the modules and task files is performed. A test suite consisting of more than a hundred test cases is frequently evaluated during CNS development in order to detect and correct programming errors. Furthermore, this suite is run on several hardware platforms in order to detect any machine-specific errors. This testing scheme makes CNS highly reliable.

Algorithms can be readily understood by inspecting the modules or task files. This self-documenting feature of the modules provides a powerful teaching tool. Users can easily interpret an algorithm and compare it with published methods in the literature. To our knowledge, CNS is the only system that enables one to define symbolically any target function for a broad range of applications, from heavy-atom phasing or molecular-replacement searches to atomic resolution refinement.

References

Adams, P. D., Pannu, N. S., Read, R. J. & Brünger, A. T. (1997). Cross-validated maximum likelihood enhances crystallographic simulated annealing refinement. Proc. Natl Acad. Sci. USA, 94, 5018–5023.

Blow, D. M. & Crick, F. H. C. (1959). The treatment of errors in the isomorphous replacement method. Acta Cryst. 12, 794–802.

Blundell, T. L. & Johnson, L. N. (1976). Protein crystallography, pp. 375–377. London: Academic Press.

Bricogne, G. (1984). Maximum entropy and the foundations of direct methods. Acta Cryst. A40, 410–445.

Brünger, A. T. (1988). Crystallographic refinement by simulated annealing: application to a 2.8 Å resolution structure of aspartate aminotransferase. J. Mol. Biol. 203, 803–816.

Brünger, A. T. (1992b). Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature (London), 355, 472–475.

Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Crystallography & NMR System (CNS): a new software suite for macromolecular structure determination. Acta Cryst. D54, 905–921.

Brünger, A. T., Adams, P. D. & Rice, L. M. (1997). New applications of simulated annealing in X-ray crystallography and solution NMR. Structure, 5, 325–336.

Brünger, A. T., Karplus, M. & Petsko, G. A. (1989). Crystallographic refinement by simulated annealing: application to crambin. Acta Cryst. A45, 50–61.

Brünger, A. T., Krukowski, A. & Erickson, J. W. (1990). Slow-cooling protocols for crystallographic refinement by simulated annealing. Acta Cryst. A46, 585–593.

Brünger, A. T., Kuriyan, J. & Karplus, M. (1987). Crystallographic R factor refinement by molecular dynamics. Science, 235, 458–460.

Burling, F. T. & Brünger, A. T. (1994). Thermal motion and conformational disorder in protein crystal structures: comparison of multi-conformer and time-averaging models. Isr. J. Chem. 34, 165–175.

Burling, F. T., Weis, W. I., Flaherty, K. M. & Brünger, A. T. (1996). Direct observation of protein solvation and discrete disorder with experimental crystallographic phases. Science, 271, 72–77.

Graham, I. S. (1995). The HTML sourcebook. John Wiley and Sons.

Hendrickson, W. A. (1979). Phase information from anomalous-scattering measurements. Acta Cryst. A35, 245–247.

Hendrickson, W. A. (1991). Determination of macromolecular structures from anomalous diffraction of synchrotron radiation. Science, 254, 51–58.

Hendrickson, W. A. & Lattman, E. E. (1970). Representation of phase probability distributions for simplified combination of independent phase information. Acta Cryst. B26, 136–143.

Jiang, J.-S. & Brünger, A. T. (1994). Protein hydration observed by X-ray diffraction: solvation properties of penicillopepsin and neuraminidase crystal structures. J. Mol. Biol. 243, 100–115.

Kleywegt, G. J. & Brünger, A. T. (1996). Checking your imagination: applications of the free R value. Structure, 4, 897–904.

Otwinowski, Z. (1991). In Proceedings of the CCP4 study weekend. Isomorphous replacement and anomalous scattering, edited by W. Wolf, P. R. Evans & A. G. W. Leslie, pp. 80–86. Warrington: Daresbury Laboratory.

Pannu, N. S., Murshudov, G. N., Dodson, E. J. & Read, R. J. (1998). Incorporation of prior phase information strengthens maximum-likelihood structure refinement. Acta Cryst. D54, 1285–1294.

Pannu, N. S. & Read, R. J. (1996a). Improved structure refinement through maximum likelihood. Acta Cryst. A52, 659–668.

Read, R. J. (1986). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Cryst. A42, 140–149.

Read, R. J. (1990). Structure-factor probabilities for related structures. Acta Cryst. A46, 900–912.

Read, R. J. (1994). Maximum likelihood refinement of heavy atoms. Lecture notes for a workshop on isomorphous replacement methods in macromolecular crystallography. American Crystallographic Association Annual Meeting, 1994, Atlanta, GA, USA.

Read, R. J. (1997). Model phases: probabilities and bias. Methods Enzymol. 277, 110–128.

Rice, L. M. & Brünger, A. T. (1994). Torsion angle dynamics: reduced variable conformational sampling enhances crystallographic structure refinement. Proteins Struct. Funct. Genet. 19, 277–290.

Stout, G. H. & Jensen, L. H. (1989). X-ray structure determination, p. 33. New York: Wiley Interscience.

Weis, W. I., Brünger, A. T., Skehel, J. J. & Wiley, D. C. (1990). Refinement of the influenza virus haemagglutinin by simulated annealing. J. Mol. Biol. 212, 737–761.
