International Tables for Crystallography Volume F: Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F. ch. 24.2, pp. 657-662
https://doi.org/10.1107/97809553602060000719
Chapter 24.2. The Nucleic Acid Database (NDB)
The Nucleic Acid Database Project, Department of Chemistry, Rutgers University, 610 Taylor Road, Piscataway, NJ 08854-8077, USA
The Nucleic Acid Database is a repository of three-dimensional structural information about nucleic acids and serves as a resource for research and education. The information content and database features of the NDB are described.
Keywords: data processing; databases; Nucleic Acid Database.
The Nucleic Acid Database (NDB) (Berman et al., 1992) was established in 1991 as a resource for specialists in the field of nucleic acid structure. Its purpose was to gather all of the structural information about nucleic acids that had been obtained from X-ray crystallographic experiments and to organize it so that three things would be easy to retrieve: the coordinates, the information about the experimental conditions used to derive these coordinates, and the structural information that could be derived from these coordinates. Since many NDB users are not crystallographers, the information provided by the database has been presented in such a way as to maximize its utility for various types of modelling and structure prediction.
Since the NDB was founded, many new technologies have presented new challenges and opportunities. The emergence of the World Wide Web has allowed for the creative and powerful dissemination and collection of data and information. The development of a standard interchange format for handling crystallographic data, the macromolecular Crystallographic Information File (mmCIF; Bourne et al., 1997), has made it possible to ensure the integrity and consistency of the data in the archive. The NDB has used these resources to provide both a relational database and an archive of information to a global community.
Structures available in the NDB include RNA and DNA oligonucleotides with two or more bases either alone or complexed with ligands, natural nucleic acids such as tRNA, and protein–nucleic acid complexes. The archive stores both primary and derived information about the structures. The primary data include the crystallographic coordinate data, structure factors and information about the experiments used to determine the structures, such as crystallization information, data collection and refinement statistics. Derived information, such as valence geometry, torsion angles, base-morphology parameters and intermolecular contacts, is calculated and stored in the database. Database entries are further annotated to include information about the overall structural features, including conformational classes, special structural features, biological functions and crystal-packing classifications. Table 24.2.2.1 summarizes the information content of the NDB.
Data processing includes data collection, integrity checking and validation of the entries. Once processing is completed, the data are entered into the database. This is accomplished using the integrated system that is illustrated in Fig. 24.2.3.1.
Figure 24.2.3.1. Flow chart showing the organization of the Nucleic Acid Database Project. The core of this integrated system is the database.
Structures are entered electronically into the NDB after they have been deposited directly by the experimentalist or by the NDB annotators, who scan the literature and the Protein Data Bank (PDB; Bernstein et al., 1977; Berman et al., 2000). The coordinate data may be deposited in any PDB format or in mmCIF format. The entries are transformed into mmCIF format and then annotated using a web-based tool (Westbrook, 1998). This tool operates on top of the mmCIF dictionary (Bourne et al., 1997) and is used to incorporate experimental information to create a fully populated mmCIF format file. In the next stage of data processing, a program called MAXIT (Macromolecular Exchange and Input Tool; Feng, Hsieh et al., 1998) checks and corrects atom numbering and ordering as well as the correspondence between the PDB SEQRES record and the residue names in the coordinate files. Once these integrity checks are completed, the structures are validated using a variety of programs.
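The SEQRES consistency check described above can be sketched as follows. This is only an illustration of the idea, not MAXIT's implementation: the function names are hypothetical, the column positions follow the fixed-column PDB format, and only ATOM records are read, so modified residues stored as HETATM records are ignored. Residues that are listed in SEQRES but were not observed in the experiment would also be reported by this strict comparison.

```python
from collections import defaultdict


def seqres_by_chain(pdb_lines):
    """Residue names declared in SEQRES records, grouped by chain ID."""
    chains = defaultdict(list)
    for line in pdb_lines:
        if line.startswith("SEQRES") and len(line) > 19:
            # Column 12 holds the chain ID; residue names fill columns 20-70.
            chains[line[11]].extend(line[19:70].split())
    return dict(chains)


def coordinates_by_chain(pdb_lines):
    """Residue names found in ATOM records, one per residue, in file order."""
    chains = defaultdict(list)
    previous = {}
    for line in pdb_lines:
        if not line.startswith("ATOM  ") or len(line) < 27:
            continue
        chain_id = line[21]
        residue_key = line[22:27]  # residue number plus insertion code
        if previous.get(chain_id) != residue_key:
            chains[chain_id].append(line[17:20].strip())
            previous[chain_id] = residue_key
    return dict(chains)


def sequence_mismatches(pdb_lines):
    """Map chain ID -> (declared, observed) for chains whose sequences differ."""
    declared = seqres_by_chain(pdb_lines)
    observed = coordinates_by_chain(pdb_lines)
    report = {}
    for chain_id in sorted(set(declared) | set(observed)):
        declared_seq = declared.get(chain_id, [])
        observed_seq = observed.get(chain_id, [])
        if declared_seq != observed_seq:
            report[chain_id] = (declared_seq, observed_seq)
    return report
```

An empty result from `sequence_mismatches` means that every chain's SEQRES listing matches the residue names in the coordinate records.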
NUCheck (Feng, Westbrook & Berman, 1998) verifies valence geometry, torsion angles, intermolecular contacts and the chiral centres of the sugars and phosphates. The dictionaries used for checking the structures were developed by the NDB project from analyses (Clowney et al., 1996; Gelbin et al., 1996) of high-resolution small-molecule structures from the Cambridge Structural Database (CSD; Allen et al., 1979). The torsion-angle ranges were derived from an analysis of high-resolution nucleic acid structures (Schneider et al., 1997). One important outgrowth of these validation projects was the creation of the force constants and restraints that are now in common use for crystallographic refinement of nucleic acid structures (Parkinson et al., 1996). The program SFCHECK (Vaguine et al., 1999) is used to validate the model against the structure-factor data. The R factor and resolution are verified and the residue-based features are examined with this program. Once an entry has been processed satisfactorily, it is entered into the database.
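A torsion-angle check of the kind mentioned above can be sketched as follows. This is not the NUCheck code: the function names are hypothetical, and any range passed to `within_range` is a placeholder chosen by the caller, not one of the ranges published by Schneider et al. (1997). The angle is computed from four atomic positions with the usual sign convention (0° for cis, ±180° for trans).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def torsion_angle(p1, p2, p3, p4):
    """Torsion angle p1-p2-p3-p4 in degrees, in the interval (-180, 180]."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal of the plane through p2, p3, p4
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def within_range(angle_deg, low_deg, high_deg):
    """True if the angle lies on the arc from low to high, wrapping at 360."""
    span = (high_deg - low_deg) % 360.0
    offset = (angle_deg - low_deg) % 360.0
    return offset <= span
```

For a backbone torsion such as alpha, the four points would be the positions of O3' of the preceding residue, P, O5' and C5'. The wrap-around comparison in `within_range` means that an arc such as 330° to 30° is handled in the same way as 150° to 210°.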
The core of the NDB project is a relational database in which all of the primary and derived data items are organized into tables. At present, there are over 90 tables in the NDB, with each table containing five to 20 data items. These tables contain both experimental and derived information. Example tables include: the citation table, which contains all the items that are present in literature references; the cell_dimension table, which contains all items related to crystal data; and the refine_parameters table, which contains the items that describe the refinement statistics.
Interaction with the database is a two-step process (Fig. 24.2.4.1). In the first step, the user defines the selection criteria by combining different database items. As an example, the user could select all B-DNA structures whose resolution is better than 2.0 Å, whose R factor is better than 0.17, and which were determined by the authors Dickerson, Kennard, or Rich. Once the structures that meet the constraint criteria have been selected, reports may be written using a combination of table items. For any set of chosen structures, a large variety of reports may be created. For the example set of structures given above, a crystal-data report or a backbone torsion-angle report can be easily generated, or the user could write a report that lists the twist values for all CG steps together with statistics, including mean, median and range of values. The constraints used for the reports do not have to be the same as those used to select the structures. Some examples of reports from the NDB are given in Fig. 24.2.4.2.
Figure 24.2.4.1. Flow chart demonstrating the two steps involved in searching the NDB: structure selection and report generation.
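The two-step interaction (structure selection, then report generation) can be illustrated with a small in-memory SQLite database. Everything in this sketch is an assumption made for illustration: the table and column names (`structure`, `author`, `step`), the `TOY...` identifiers and all numerical values are invented and do not reproduce the NDB schema or any real entry.

```python
import sqlite3
import statistics


def build_toy_database():
    """Create an in-memory database with invented tables and values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE structure (structure_id TEXT PRIMARY KEY,
                                conformation TEXT,
                                resolution REAL,
                                r_factor REAL);
        CREATE TABLE author (structure_id TEXT, name TEXT);
        CREATE TABLE step (structure_id TEXT, step_name TEXT, twist REAL);
    """)
    conn.executemany("INSERT INTO structure VALUES (?, ?, ?, ?)", [
        ("TOY001", "B", 1.5, 0.16),
        ("TOY002", "B", 2.5, 0.15),  # resolution too low
        ("TOY003", "A", 1.4, 0.14),  # not B-DNA
        ("TOY004", "B", 1.8, 0.16),  # author not in the list
    ])
    conn.executemany("INSERT INTO author VALUES (?, ?)", [
        ("TOY001", "Dickerson"), ("TOY002", "Kennard"),
        ("TOY003", "Rich"), ("TOY004", "Someone"),
    ])
    conn.executemany("INSERT INTO step VALUES (?, ?, ?)", [
        ("TOY001", "CG", 30.0), ("TOY001", "CG", 36.0),
        ("TOY001", "GC", 40.0), ("TOY002", "CG", 33.0),
        ("TOY004", "CG", 20.0),
    ])
    return conn


def select_structures(conn, authors):
    """Step 1: B-DNA, resolution < 2.0, R factor < 0.17, given authors."""
    marks = ",".join("?" for _ in authors)
    rows = conn.execute(
        "SELECT DISTINCT s.structure_id FROM structure s "
        "JOIN author a ON a.structure_id = s.structure_id "
        "WHERE s.conformation = 'B' AND s.resolution < 2.0 "
        f"AND s.r_factor < 0.17 AND a.name IN ({marks}) "
        "ORDER BY s.structure_id",
        list(authors),
    ).fetchall()
    return [row[0] for row in rows]


def twist_report(conn, structure_ids, step_name):
    """Step 2: twist statistics for one step type in the selected set."""
    if not structure_ids:
        return None
    marks = ",".join("?" for _ in structure_ids)
    rows = conn.execute(
        "SELECT twist FROM step WHERE step_name = ? "
        f"AND structure_id IN ({marks})",
        [step_name, *structure_ids],
    ).fetchall()
    twists = [row[0] for row in rows]
    if not twists:
        return None
    return {"n": len(twists),
            "mean": statistics.mean(twists),
            "median": statistics.median(twists),
            "range": (min(twists), max(twists))}
```

Calling `select_structures(conn, ["Dickerson", "Kennard", "Rich"])` and then `twist_report(conn, ids, "CG")` mirrors the example in the text. Because the report step takes its own constraint (here the step name), it does not have to reuse the constraints that selected the structures.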
Data are made available via a variety of mechanisms, such as ftp and the World Wide Web. Coordinate files, reports, software programs and other resources are available via the ftp server (ndbserver.rutgers.edu). In addition to links to the ftp server, the web server provides a variety of methods for querying the NDB and accessing reports prepared from the database (http://ndbserver.rutgers.edu/).
The NDB archives, a section of the web site, contain a large variety of information and tables useful for researchers. Prepared reports about the structure identifiers, citations, cell dimensions and structure summaries are available and are sorted according to structure type. The dictionaries of standard geometries of nucleic acids as well as parameter files for X-PLOR (Brünger, 1992) are also available. The archives section links to the ftp server, providing coordinates for the asymmetric unit and biological units in PDB and mmCIF formats, structure-factor files, and coordinates for nucleic acid structures determined by NMR.
A very popular and useful report is the NDB Atlas report page. An Atlas page contains summary, crystallographic and experimental information, a molecular view of the biological unit and a crystal-packing picture for a particular structure. Atlas pages are created directly from the NDB database (Fig. 24.2.5.1). The Atlas entries for all structures in the database are organized by structure type on the NDB web site.
A web interface was designed to make the query capabilities of the NDB as widely accessible as possible. To highlight the special features of NDB, the interface operates in two modes. In the quick search/quick report mode, several items, including structure ID, author, classification and special features, can be limited either by entering text in a box or by selecting an option from the pull-down menu. Any combination of these items may be used to constrain the structure selection. If none are used, the entire database will be selected. After selecting 'Execute Selection', the user will be presented with a list of structure IDs and descriptors that match the desired conditions. Several viewing options for each structure in this list are possible. These include retrieving the coordinate files in either mmCIF or PDB format, retrieving the coordinates for the biological unit, viewing the structure with RasMol (Sayle & Milner-White, 1995), or viewing an NDB Atlas page.
Preformatted quick reports can then be generated for the structures in this results list. The user selects a report from a list of 13 report options (Table 24.2.5.1), and the report is created automatically. Multiple reports can be easily generated. These reports are a particularly convenient way to tabulate derived features quickly, such as torsion angles and base morphology (Fig. 24.2.5.2).
In the full search/full report mode, it is possible to access most of the tables in the NDB to build more complex queries. Instead of limiting items that are listed on a single page, the user builds a search by selecting the tables and then the items that contain the desired features. The individual constraints can be combined with Boolean and logical operators.
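One way to picture how constraints on individual items combine through Boolean operators is the small sketch below. It is not the NDB interface: the `Condition` class, the `compare` helper and the column names used with them are hypothetical. Each condition carries an SQL fragment and its parameters, and the `&`, `|` and `~` operators build larger conditions from smaller ones.

```python
from dataclasses import dataclass

ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">="}


@dataclass(frozen=True)
class Condition:
    """An SQL WHERE fragment together with its positional parameters."""
    sql: str
    params: tuple = ()

    def __and__(self, other):
        return Condition(f"({self.sql} AND {other.sql})",
                         self.params + other.params)

    def __or__(self, other):
        return Condition(f"({self.sql} OR {other.sql})",
                         self.params + other.params)

    def __invert__(self):
        return Condition(f"(NOT {self.sql})", self.params)


def compare(column, operator, value):
    """Build a single comparison; the value is passed as a parameter.

    The column name is inserted as given, so it must come from a trusted
    list of item names rather than from free user input.
    """
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return Condition(f"{column} {operator} ?", (value,))
```

For example, `compare("resolution", "<", 2.0) & (compare("conformation", "=", "A") | compare("conformation", "=", "B"))` yields a single condition whose `sql` and `params` attributes can be handed to a parameterized query.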
After selecting structures using the full search, a variety of reports can be written. The report columns are selected from a variety of database tables, and then the full report is automatically generated. Multiple reports can be generated for the same group of selected structures; for example, reports on crystallization, base modification, or a combination of these reports can be generated for a particular group of structures.
The NDB is based at Rutgers University (http://ndbserver.rutgers.edu/) and is currently mirrored at three other sites: the Institute of Cancer Research (ICR) in London, England (http://www.ndb.icr.ac.uk), the San Diego Supercomputer Center in San Diego, USA (http://ndb.sdsc.edu/NDB/) and the Structural Biology Centre in Tsukuba, Japan (http://ndbserver.nibh.go.jp/NDB/). These mirror sites are updated daily, are fully synchronous, and contain the ftp directories, the web site and the full database.
The NDB has worked closely with the community of researchers to ensure that their needs are met. A newsletter is published electronically four times a year and provides information about the newest features of the system. Questions and very complex queries can be handled by the staff in response to user requests via e-mail to ndbadmin@ndbserver.rutgers.edu.
Acknowledgements
The NDB is funded by the National Science Foundation and the Department of Energy. Co-founders and collaborators are Wilma Olson, Rutgers University, and David Beveridge, Wesleyan University. We would like to thank Lisa Iype, Shri Jain, Xiang-Jun Lu and A. R. Srinivasan for their work on the project.
References
Allen, F. H., Bellard, S., Brice, M. D., Cartwright, B. A., Doubleday, A., Higgs, H., Hummelink, T., Hummelink-Peters, B. G., Kennard, O., Motherwell, W. D. S., Rodgers, J. R. & Watson, D. G. (1979). The Cambridge Crystallographic Data Centre: computer-based search, retrieval, analysis and display of information. Acta Cryst. B35, 2331–2339.
Berman, H. M., Olson, W. K., Beveridge, D. L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh, S. H., Srinivasan, A. R. & Schneider, B. (1992). The Nucleic Acid Database – a comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J. 63, 751–759.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). The Protein Data Bank. Nucleic Acids Res. 28, 235–242.
Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. E., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.
Bourne, P., Berman, H. M., Watenpaugh, K., Westbrook, J. D. & Fitzgerald, P. M. D. (1997). The macromolecular Crystallographic Information File (mmCIF). Methods Enzymol. 277, 571–590.
Brünger, A. T. (1992). X-PLOR. Version 3.1. A system for X-ray crystallography and NMR. Yale University Press, New Haven, CT, USA.
Clowney, L., Jain, S. C., Srinivasan, A. R., Westbrook, J., Olson, W. K. & Berman, H. M. (1996). Geometric parameters in nucleic acids: nitrogenous bases. J. Am. Chem. Soc. 118, 509–518.
Feng, Z., Hsieh, S.-H., Gelbin, A. & Westbrook, J. (1998). MAXIT: macromolecular exchange and input tool. NDB-120. Rutgers University, New Brunswick, NJ, USA.
Feng, Z., Westbrook, J. & Berman, H. M. (1998). NUCheck. NDB-407. Rutgers University, New Brunswick, NJ, USA.
Gelbin, A., Schneider, B., Clowney, L., Hsieh, S.-H., Olson, W. K. & Berman, H. M. (1996). Geometric parameters in nucleic acids: sugar and phosphate constituents. J. Am. Chem. Soc. 118, 519–528.
Grzeskowiak, K., Yanagi, K., Privé, G. G. & Dickerson, R. E. (1991). The structure of B-helical C-G-A-T-C-G-A-T-C-G and comparison with C-C-A-A-C-G-T-T-G-G: the effect of base pair reversal. J. Biol. Chem. 266, 8861–8883.
Lavery, R. & Sklenar, H. (1989). Defining the structure of irregular nucleic acids: conventions and principles. J. Biomol. Struct. Dyn. 6, 655–667.
Parkinson, G., Vojtechovsky, J., Clowney, L., Brünger, A. T. & Berman, H. M. (1996). New parameters for the refinement of nucleic acid-containing structures. Acta Cryst. D52, 57–64.
Sayle, R. & Milner-White, E. J. (1995). RasMol: biomolecular graphics for all. Trends Biochem. Sci. 20, 374.
Schneider, B., Neidle, S. & Berman, H. M. (1997). Conformations of the sugar–phosphate backbone in helical DNA crystal structures. Biopolymers, 42, 113–124.
Scott, W. G., Finch, J. T. & Klug, A. (1995). The crystal structure of an all-RNA hammerhead ribozyme: a proposed mechanism for RNA catalytic cleavage. Cell, 81, 991–1002.
Vaguine, A. A., Richelle, J. & Wodak, S. J. (1999). SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model. Acta Cryst. D55, 191–205.
Westbrook, J. (1998). AutoDep input tool. NDB-406. Rutgers University, New Brunswick, NJ, USA.