The Nucleic Acid Database (NDB)

Berman, H. M.; Feng, Z.; Schneider, B.; Westbrook, J.; Zardecki, C.

doi:10.1107/97809553602060000719

International
Tables for
Crystallography
Volume F
Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F. ch. 24.2, pp. 657-662
https://doi.org/10.1107/97809553602060000719

Chapter 24.2. The Nucleic Acid Database (NDB)

H. M. Berman,^a ^* Z. Feng,^a B. Schneider,^a J. Westbrook^a and C. Zardecki^a

^aThe Nucleic Acid Database Project, Department of Chemistry, Rutgers University, 610 Taylor Road, Piscataway, NJ 08854-8077, USA
Correspondence e-mail: berman@rcsb.rutgers.edu

The Nucleic Acid Database is a repository of three-dimensional structural information about nucleic acids and serves as a resource for research and education. The information content and database features of the NDB are described.

Keywords: data processing; databases; Nucleic Acid Database.

24.2.1. Introduction

The Nucleic Acid Database (NDB) (Berman et al., 1992) was established in 1991 as a resource for specialists in the field of nucleic acid structure. Its purpose was to gather all of the structural information about nucleic acids that had been obtained from X-ray crystallographic experiments and to organize it so that the coordinates, the experimental conditions used to derive them and the structural information derived from them could be retrieved easily. Since many NDB users are not crystallographers, the information provided by the database is presented so as to maximize its utility for various types of modelling and structure prediction.

Since the NDB was founded, many new technologies have presented new challenges and opportunities. The emergence of the World Wide Web has allowed for the creative and powerful dissemination and collection of data and information. The development of a standard interchange format for handling crystallographic data, the macromolecular Crystallographic Information File (mmCIF; Bourne et al., 1997), has made it possible to ensure the integrity and consistency of the data in the archive. The NDB has used these resources to provide both a relational database and an archive of information to a global community.

24.2.2. Information content of the NDB

Structures available in the NDB include RNA and DNA oligonucleotides with two or more bases either alone or complexed with ligands, natural nucleic acids such as tRNA, and protein–nucleic acid complexes. The archive stores both primary and derived information about the structures. The primary data include the crystallographic coordinate data, structure factors and information about the experiments used to determine the structures, such as crystallization information, data collection and refinement statistics. Derived information, such as valence geometry, torsion angles, base-morphology parameters and intermolecular contacts, is calculated and stored in the database. Database entries are further annotated to include information about the overall structural features, including conformational classes, special structural features, biological functions and crystal-packing classifications. Table 24.2.2.1 summarizes the information content of the NDB.

Table 24.2.2.1. The information content of the NDB

(a) Primary experimental information stored in the NDB.

Structure summary – descriptor; NDB, PDB and CSD names; coordinate availability; modifications, mismatches and drugs (yes/no)

Structural description – sequence; structure type; descriptions about modifications, mismatches and drugs; description of asymmetric and biological units

Citation – authors, title, journal, volume, pages, year

Crystal data – cell dimensions; space group

Data-collection description – radiation source and wavelength; data-collection device; temperature; resolution range; total and unique number of reflections

Crystallization description – method; temperature; pH value; solution composition

Refinement information – method; program; number of reflections used for refinement; data cutoff; resolution range; R factor; refinement of temperature factors and occupancies

Coordinate information – atomic coordinates, occupancies and temperature factors for asymmetric unit; coordinates for symmetry-related strands; coordinates for unit cell; symmetry-related coordinates; orthogonal or fractional coordinates

(b) Derivative information stored in the NDB.

Distances – chemical bond lengths; virtual bonds (involving phosphorus atoms)

Torsions – backbone and side-chain torsion angles; pseudorotational parameters

Angles – valence bond angles, virtual angles (involving phosphorus atoms)

Base morphology – parameters calculated by different algorithms

Nonbonded contacts

Valence geometry r.m.s. deviations from small-molecule standards

Sequence pattern statistics
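As an illustration of how one of the derived quantities listed in part (b) of the table can be obtained from the primary coordinates, the following minimal Python sketch computes a torsion angle from four atomic positions. The function names and the sample coordinates are assumptions made for this example; the sketch is not the code used by the NDB.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_angle(p1, p2, p3, p4):
    """Torsion angle p1-p2-p3-p4 in degrees, in the range -180 to 180.

    Positive values correspond to a clockwise rotation of the far bond
    (p3-p4) relative to the near bond (p2-p1) when viewed along p2 -> p3.
    """
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal to the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal to the plane through p2, p3, p4
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates (not taken from any NDB entry).
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # close to 90 degrees
```

For a backbone torsion such as ɛ, the four points would be the positions of C4′, C3′, O3′ and P, in that order.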

24.2.3. Data processing

Data processing includes data collection, integrity checking and validation of the entries. Once processing is completed, the data are entered into the database. This is accomplished using the integrated system that is illustrated in Fig. 24.2.3.1.

Figure 24.2.3.1. Flow chart showing the organization of the Nucleic Acid Database Project. The core of this integrated system is the database.

Structures are entered electronically into the NDB after they have been deposited directly by the experimentalist or by the NDB annotators, who scan the literature and the Protein Data Bank (PDB; Bernstein et al., 1977; Berman et al., 2000). The coordinate data may be deposited in any PDB format or in mmCIF format. The entries are transformed into mmCIF format and then annotated using a web-based tool (Westbrook, 1998). This tool operates on top of the mmCIF dictionary (Bourne et al., 1997) and is used to incorporate experimental information to create a fully populated mmCIF format file. In the next stage of data processing, a program called MAXIT (Macromolecular Exchange and Input Tool; Feng, Hsieh et al., 1998) checks and corrects atom numbering and ordering as well as the correspondence between the PDB SEQRES record and the residue names in the coordinate files. Once these integrity checks are completed, the structures are validated using a variety of programs.
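One of the integrity checks mentioned above, the correspondence between the SEQRES record and the residue names in the coordinate records, can be sketched in a few lines of Python. This is not MAXIT code: the function names are hypothetical, only ATOM records in fixed-column PDB format are read (so modified residues stored as HETATM records are ignored), and it is assumed that every residue declared in SEQRES is present in the coordinates.

```python
from itertools import zip_longest


def seqres_residues(lines):
    """Residue names per chain, as declared in SEQRES records."""
    chains = {}
    for line in lines:
        if line.startswith("SEQRES"):
            # Column 12 holds the chain ID; residue names fill columns 20-70.
            chains.setdefault(line[11], []).extend(line[19:70].split())
    return chains


def coordinate_residues(lines):
    """Residue names per chain, in order of first appearance in ATOM records."""
    chains = {}
    seen = set()
    for line in lines:
        if line.startswith("ATOM  "):
            chain_id = line[21]
            residue_key = (chain_id, line[22:27])  # residue number plus insertion code
            if residue_key not in seen:
                seen.add(residue_key)
                chains.setdefault(chain_id, []).append(line[17:20].strip())
    return chains


def sequence_mismatches(lines):
    """List (chain, position, SEQRES name, coordinate name) for each disagreement."""
    declared = seqres_residues(lines)
    observed = coordinate_residues(lines)
    problems = []
    for chain_id in sorted(set(declared) | set(observed)):
        pairs = zip_longest(declared.get(chain_id, []), observed.get(chain_id, []))
        for position, (expected, found) in enumerate(pairs, start=1):
            if expected != found:
                problems.append((chain_id, position, expected, found))
    return problems


def atom_line(serial, atom, residue, chain, number):
    """Hypothetical helper: build a fixed-column ATOM record with dummy coordinates."""
    return (f"ATOM  {serial:>5} {atom:<4} {residue:>3} {chain}{number:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


# Made-up three-residue DNA strand; the third ATOM residue contradicts SEQRES.
sample = [
    "SEQRES   1 A    3   DC  DG  DA",
    atom_line(1, "P", "DC", "A", 1),
    atom_line(2, "P", "DG", "A", 2),
    atom_line(3, "P", "DT", "A", 3),
]
print(sequence_mismatches(sample))
```

A chain that is longer in one record type than in the other is reported with `None` in the place of the missing residue name.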

NUCheck (Feng, Westbrook & Berman, 1998) verifies valence geometry, torsion angles, intermolecular contacts and the chiral centres of the sugars and phosphates. The dictionaries used for checking the structures were developed by the NDB project from analyses (Clowney et al., 1996; Gelbin et al., 1996) of high-resolution small-molecule structures from the Cambridge Structural Database (CSD; Allen et al., 1979). The torsion-angle ranges were derived from an analysis of high-resolution nucleic acid structures (Schneider et al., 1997). One important outgrowth of these validation projects was the creation of the force constants and restraints that are now in common use for crystallographic refinement of nucleic acid structures (Parkinson et al., 1996). The program SFCHECK (Vaguine et al., 1999) is used to validate the model against the structure-factor data. The R factor and resolution are verified and the residue-based features are examined with this program. Once an entry has been processed satisfactorily, it is entered into the database.
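The comparison of valence geometry with dictionary values can be illustrated by a short Python sketch that computes the r.m.s. deviation of observed bond lengths from per-type standard values and lists the observations outside a chosen tolerance. It is not NUCheck itself: the function names, the bond-type labels, the tolerance and all of the numbers are placeholders invented for this example and are not the NDB dictionary values.

```python
import math


def rms_deviation(observed, reference):
    """Root-mean-square deviation of observed values from per-type reference values.

    observed  -- iterable of (bond_type, value) pairs
    reference -- dict mapping bond_type to its standard value
    """
    squared = [(value - reference[bond_type]) ** 2 for bond_type, value in observed]
    if not squared:
        raise ValueError("no observations supplied")
    return math.sqrt(sum(squared) / len(squared))


def outliers(observed, reference, tolerance):
    """Observations that differ from the reference by more than `tolerance`."""
    return [(bond_type, value) for bond_type, value in observed
            if abs(value - reference[bond_type]) > tolerance]


# Placeholder numbers for illustration only; they are NOT the NDB dictionary values.
reference_lengths = {"bond_a": 1.50, "bond_b": 1.40}
observed_lengths = [("bond_a", 1.52), ("bond_a", 1.48), ("bond_b", 1.46)]

print(round(rms_deviation(observed_lengths, reference_lengths), 3))
print(outliers(observed_lengths, reference_lengths, tolerance=0.05))
```

The same two functions apply unchanged to bond angles if the values and the reference dictionary are given in degrees.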

24.2.4. The database

The core of the NDB project is a relational database in which all of the primary and derived data items are organized into tables. At present, there are over 90 tables in the NDB, with each table containing five to 20 data items. These tables contain both experimental and derived information. Example tables include: the citation table, which contains all the items that are present in literature references; the cell_dimension table, which contains all items related to crystal data; and the refine_parameters table, which contains the items that describe the refinement statistics.

Interaction with the database is a two-step process (Fig. 24.2.4.1). In the first step, the user defines the selection criteria by combining different database items. As an example, the user could select all B-DNA structures whose resolution is better than 2.0 Å, whose R factor is better than 0.17, and which were determined by the authors Dickerson, Kennard, or Rich. Once the structures that meet the constraint criteria have been selected, reports may be written using a combination of table items. For any set of chosen structures, a large variety of reports may be created. For the example set of structures given above, a crystal-data report or a backbone torsion-angle report can be easily generated, or the user could write a report that lists the twist values for all CG steps together with statistics, including mean, median and range of values. The constraints used for the reports do not have to be the same as those used to select the structures. Some examples of reports from the NDB are given in Fig. 24.2.4.2.
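The two-step interaction can be sketched with a small in-memory SQLite database in Python. Only the refine_parameters table name comes from the text above; the other table names, all column names, the structure identifiers and every data value are assumptions invented for this illustration and do not reflect the actual NDB schema or contents.

```python
import sqlite3
import statistics

SCHEMA = """
CREATE TABLE structure_summary (ndb_id TEXT PRIMARY KEY, conformation_type TEXT);
CREATE TABLE refine_parameters (ndb_id TEXT, resolution REAL, r_factor REAL);
CREATE TABLE citation_author (ndb_id TEXT, author TEXT);
CREATE TABLE base_pair_step (ndb_id TEXT, step TEXT, twist REAL);
"""

# Step 1: structure selection (B-DNA, resolution < 2.0, R < 0.17, given authors).
SELECT_SQL = """
SELECT DISTINCT s.ndb_id
FROM structure_summary AS s
JOIN refine_parameters AS r ON r.ndb_id = s.ndb_id
JOIN citation_author AS a ON a.ndb_id = s.ndb_id
WHERE s.conformation_type = 'B'
  AND r.resolution < 2.0
  AND r.r_factor < 0.17
  AND a.author IN ('Dickerson', 'Kennard', 'Rich')
ORDER BY s.ndb_id
"""


def demo_database():
    """Build a tiny database filled with invented entries X001 to X006."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO structure_summary VALUES (?, ?)", [
        ("X001", "B"), ("X002", "B"), ("X003", "A"),
        ("X004", "B"), ("X005", "B"), ("X006", "B")])
    conn.executemany("INSERT INTO refine_parameters VALUES (?, ?, ?)", [
        ("X001", 1.5, 0.16), ("X002", 2.5, 0.15), ("X003", 1.4, 0.15),
        ("X004", 1.8, 0.20), ("X005", 1.9, 0.16), ("X006", 1.6, 0.15)])
    conn.executemany("INSERT INTO citation_author VALUES (?, ?)", [
        ("X001", "Dickerson"), ("X002", "Kennard"), ("X003", "Rich"),
        ("X004", "Dickerson"), ("X005", "Smith"),
        ("X006", "Kennard"), ("X006", "Rich")])
    conn.executemany("INSERT INTO base_pair_step VALUES (?, ?, ?)", [
        ("X001", "CG", 30.0), ("X001", "CG", 34.0), ("X001", "GC", 40.0),
        ("X002", "CG", 31.0), ("X006", "CG", 38.0)])
    return conn


def select_structures(conn):
    return [row[0] for row in conn.execute(SELECT_SQL)]


def twist_report(conn, ndb_ids, step="CG"):
    """Step 2: report twist statistics for one step type in the selected structures."""
    marks = ", ".join("?" for _ in ndb_ids)
    rows = conn.execute(
        f"SELECT twist FROM base_pair_step WHERE step = ? AND ndb_id IN ({marks})",
        [step, *ndb_ids],
    ).fetchall()
    values = [row[0] for row in rows]
    if not values:
        return None
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": (min(values), max(values)),
    }


conn = demo_database()
selected = select_structures(conn)
print(selected)
print(twist_report(conn, selected))
```

Keeping the selection and the report as separate queries mirrors the point made above that the constraints used for a report need not be those used to select the structures.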

Figure 24.2.4.1. Flow chart demonstrating the two steps involved in searching the NDB: structure selection and report generation.

Figure 24.2.4.2. Examples of reports generated from the NDB about torsion angles. (a) A scattergram showing the relationship of ɛ (C4′—C3′—O3′—P) versus ζ (C3′—O3′—P—O5′). The two clusters, BI and BII, are labelled. (b) A histogram for α (O3′—P—O5′—C5′) for all B-DNA. (c) A conformation wheel showing the torsion angles for structure BDJ025 (Grzeskowiak et al., 1991) over the average values for all B-DNA. (d) A torsion-angle report for BDJ025.
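The two clusters in the ɛ versus ζ scattergram can be separated numerically. The Python sketch below uses a commonly applied criterion that is not stated in this chapter and is therefore an assumption here: a step is assigned to BI when ɛ − ζ is negative and to BII when it is positive, after the difference has been wrapped into the range −180 to 180°. The function names and the sample angles are illustrative only.

```python
def wrap_degrees(angle):
    """Map an angle in degrees onto the interval (-180, 180]."""
    wrapped = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped


def classify_b_backbone(epsilon, zeta):
    """Assign 'BI' or 'BII' from the sign of epsilon - zeta (assumed criterion)."""
    return "BI" if wrap_degrees(epsilon - zeta) < 0 else "BII"


# Illustrative angles in degrees, not taken from any NDB entry.
print(classify_b_backbone(-170.0, -100.0))  # difference of -70 degrees
print(classify_b_backbone(-110.0, 170.0))   # difference wraps to +80 degrees
```

Wrapping the difference makes the result independent of whether the torsion angles are reported on a −180 to 180° or a 0 to 360° scale.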

24.2.5. Data distribution

Data are made available via a variety of mechanisms, such as ftp and the World Wide Web. Coordinate files, reports, software programs and other resources are available via the ftp server (ndbserver.rutgers.edu). In addition to links to the ftp server, the web server provides a variety of methods for querying the NDB and accessing reports prepared from the database (http://ndbserver.rutgers.edu/).

24.2.5.1. Archives

The NDB archives, a section of the web site, contain a large variety of information and tables useful for researchers. Prepared reports about the structure identifiers, citations, cell dimensions and structure summaries are available and are sorted according to structure type. The dictionaries of standard geometries of nucleic acids as well as parameter files for X-PLOR (Brünger, 1992) are also available. The archives section links to the ftp server, providing coordinates for the asymmetric unit and biological units in PDB and mmCIF formats, structure-factor files, and coordinates for nucleic acid structures determined by NMR.

24.2.5.2. Atlas

A very popular and useful report is the NDB Atlas report page. An Atlas page contains summary, crystallographic and experimental information, a molecular view of the biological unit and a crystal-packing picture for a particular structure. Atlas pages are created directly from the NDB database (Fig. 24.2.5.1). The Atlas entries for all structures in the database are organized by structure type on the NDB web site.

Figure 24.2.5.1. NDB Atlas page for URX035 (Scott et al., 1995) that highlights structural information that is contained in the database and provides images of the biological unit, asymmetric unit and crystal packing of the structure.

24.2.5.3. NDB searches

A web interface was designed to make the query capabilities of the NDB as widely accessible as possible. To highlight the special features of NDB, the interface operates in two modes. In the quick search/quick report mode, several items, including structure ID, author, classification and special features, can be limited either by entering text in a box or by selecting an option from the pull-down menu. Any combination of these items may be used to constrain the structure selection. If none are used, the entire database will be selected. After selecting 'Execute Selection', the user will be presented with a list of structure IDs and descriptors that match the desired conditions. Several viewing options for each structure in this list are possible. These include retrieving the coordinate files in either mmCIF or PDB format, retrieving the coordinates for the biological unit, viewing the structure with RasMol (Sayle & Milner-White, 1995), or viewing an NDB Atlas page.

Preformatted quick reports can then be generated for the structures in this results list. The user selects a report from a list of 13 report options (Table 24.2.5.1), and the report is created automatically. Multiple reports can be easily generated. Quick reports are particularly convenient for rapidly summarizing derived features, such as torsion angles and base morphology (Fig. 24.2.5.2).

Table 24.2.5.1. Quick reports available from the NDB

Report name	Contents
NDB status	Processing status information
Cell dimensions	Crystallographic cell constants
Primary citation	Primary bibliographic citations
Structure identifier	Identifiers, descriptor, coordinate availability
Sequence	Sequence
Nucleic acid sequence	Nucleic acid sequence only
Protein sequence	Protein sequence only
Refinement information	R factor, resolution and number of reflections used in refinement
Nucleic acid backbone torsions (NDB)	Sugar–phosphate backbone torsion angles using NDB residue numbers
Nucleic acid backbone torsions (PDB)	Sugar–phosphate backbone torsion angles using PDB residue numbers
Base-pair parameters (global)	Global base-pair parameters calculated using Curves 5.1 (Lavery & Sklenar, 1989)
Base-pair step parameters (local)	Local base-pair step parameters calculated using Curves 5.1
Groove dimensions	Groove dimensions using Stoffer & Lavery definitions from Curves 5.1

Figure 24.2.5.2. Examples of quick reports. Clockwise from top left: base-pair parameters [global; calculated using Curves 5.1 (Lavery & Sklenar, 1989)] report for ribozyme structures; nucleic acid backbone torsions (NDB) report for ribozyme structures; structure identifier report for protein–DNA structures; citation report for protein–DNA structures.

In the full search/full report mode, it is possible to access most of the tables in the NDB to build more complex queries. Instead of limiting items that are listed on a single page, the user builds a search by selecting the tables and then the items that contain the desired features. The individual conditions can be combined using Boolean and logical operators.

After selecting structures using the full search, a variety of reports can be written. The report columns are selected from a variety of database tables, and then the full report is automatically generated. Multiple reports can be generated for the same group of selected structures; for example, reports on crystallization, base modification, or a combination of these reports can be generated for a particular group of structures.
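The way in which individual conditions combine into a selection in either search mode can be sketched in Python with composable predicates. The helper names, the entry fields and the sample entries are hypothetical and are not part of the NDB interface; the sketch only illustrates that conditions may be joined with Boolean operators and that an empty set of conditions selects every entry.

```python
def item(name, test):
    """Condition on a single item: apply `test` to the entry's value for `name`."""
    return lambda entry: test(entry[name])


def all_of(*conditions):
    """Boolean AND; with no conditions at all, every entry is selected."""
    return lambda entry: all(cond(entry) for cond in conditions)


def any_of(*conditions):
    """Boolean OR."""
    return lambda entry: any(cond(entry) for cond in conditions)


def negate(condition):
    """Boolean NOT."""
    return lambda entry: not condition(entry)


def select(entries, condition=all_of()):
    """Return the IDs of the entries that satisfy `condition`."""
    return [entry["id"] for entry in entries if condition(entry)]


# Invented entries for illustration only.
entries = [
    {"id": "X001", "classification": "ribozyme", "resolution": 2.6, "has_drug": False},
    {"id": "X002", "classification": "B-DNA", "resolution": 1.4, "has_drug": True},
    {"id": "X003", "classification": "B-DNA", "resolution": 2.2, "has_drug": False},
]

high_resolution = item("resolution", lambda value: value < 2.0)
is_b_dna = item("classification", lambda value: value == "B-DNA")
with_drug = item("has_drug", bool)

print(select(entries))                                      # no constraint: all entries
print(select(entries, all_of(is_b_dna, high_resolution)))
print(select(entries, any_of(high_resolution, negate(is_b_dna))))
print(select(entries, all_of(is_b_dna, negate(with_drug))))
```

Because each condition is an ordinary function, conditions drawn from different tables can be nested to any depth before the report step is run on the selected entries.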

24.2.5.4. Mirror sites

The NDB is based at Rutgers University (http://ndbserver.rutgers.edu/) and is currently mirrored at three other sites: the Institute of Cancer Research (ICR) in London, England (http://www.ndb.icr.ac.uk), the San Diego Supercomputer Center in San Diego, USA (http://ndb.sdsc.edu/NDB/) and the Structural Biology Centre in Tsukuba, Japan (http://ndbserver.nibh.go.jp/NDB/). These mirror sites are updated daily, are fully synchronous, and contain the ftp directories, the web site and the full database.

24.2.6. Outreach

The NDB has worked closely with the community of researchers to ensure that their needs are met. A newsletter is published electronically four times a year and provides information about the newest features of the system. Questions and very complex queries can be handled by the staff in response to user requests via e-mail to ndbadmin@ndbserver.rutgers.edu.

Acknowledgements

The NDB is funded by the National Science Foundation and the Department of Energy. Co-founders and collaborators are Wilma Olson, Rutgers University, and David Beveridge, Wesleyan University. We would like to thank Lisa Iype, Shri Jain, Xiang-Jun Lu and A. R. Srinivasan for their work on the project.

References

Allen, F. H., Bellard, S., Brice, M. D., Cartwright, B. A., Doubleday, A., Higgs, H., Hummelink, T., Hummelink-Peters, B. G., Kennard, O., Motherwell, W. D. S., Rodgers, J. R. & Watson, D. G. (1979). The Cambridge Crystallographic Data Centre: computer-based search, retrieval, analysis and display of information. Acta Cryst. B35, 2331–2339.

Berman, H. M., Olson, W. K., Beveridge, D. L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh, S. H., Srinivasan, A. R. & Schneider, B. (1992). The Nucleic Acid Database – a comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J. 63, 751–759.

Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). The Protein Data Bank. Nucleic Acids Res. 28, 235–242.

Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. E., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.

Bourne, P., Berman, H. M., Watenpaugh, K., Westbrook, J. D. & Fitzgerald, P. M. D. (1997). The macromolecular Crystallographic Information File (mmCIF). Methods Enzymol. 277, 571–590.

Brünger, A. T. (1992). X-PLOR. Version 3.1. A system for X-ray crystallography and NMR. Yale University Press, New Haven, CT, USA.

Clowney, L., Jain, S. C., Srinivasan, A. R., Westbrook, J., Olson, W. K. & Berman, H. M. (1996). Geometric parameters in nucleic acids: nitrogenous bases. J. Am. Chem. Soc. 118, 509–518.

Feng, Z., Hsieh, S.-H., Gelbin, A. & Westbrook, J. (1998). MAXIT: macromolecular exchange and input tool. NDB-120. Rutgers University, New Brunswick, NJ, USA.

Feng, Z., Westbrook, J. & Berman, H. M. (1998). NUCheck. NDB-407. Rutgers University, New Brunswick, NJ, USA.

Gelbin, A., Schneider, B., Clowney, L., Hsieh, S.-H., Olson, W. K. & Berman, H. M. (1996). Geometric parameters in nucleic acids: sugar and phosphate constituents. J. Am. Chem. Soc. 118, 519–528.

Grzeskowiak, K., Yanagi, K., Privé, G. G. & Dickerson, R. E. (1991). The structure of B-helical C-G-A-T-C-G-A-T-C-G and comparison with C-C-A-A-C-G-T-T-G-G: the effect of base pair reversal. J. Biol. Chem. 266, 8861–8883.

Lavery, R. & Sklenar, H. (1989). Defining the structure of irregular nucleic acids: conventions and principles. J. Biomol. Struct. Dyn. 6, 655–667.

Parkinson, G., Vojtechovsky, J., Clowney, L., Brünger, A. T. & Berman, H. M. (1996). New parameters for the refinement of nucleic acid-containing structures. Acta Cryst. D52, 57–64.

Sayle, R. & Milner-White, E. J. (1995). RasMol: biomolecular graphics for all. Trends Biochem. Sci. 20, 374.

Schneider, B., Neidle, S. & Berman, H. M. (1997). Conformations of the sugar–phosphate backbone in helical DNA crystal structures. Biopolymers, 42, 113–124.

Scott, W. G., Finch, J. T. & Klug, A. (1995). The crystal structure of an all-RNA hammerhead ribozyme: a proposed mechanism for RNA catalytic cleavage. Cell, 81, 991–1002.

Vaguine, A. A., Richelle, J. & Wodak, S. J. (1999). SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model. Acta Cryst. D55, 191–205.

Westbrook, J. (1998). AutoDep input tool. NDB-406. Rutgers University, New Brunswick, NJ, USA.
