International Tables for Crystallography, Volume F: Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F. ch. 24.5, pp. 677-678
Section 24.5.3. The PDB database resource
H. M. Berman,a* J. Westbrook,a Z. Feng,a G. Gilliland,b T. N. Bhat,b H. Weissig,c I. N. Shindyalov,c and P. E. Bourne,d
a Department of Chemistry, Rutgers University, 610 Taylor Road, Piscataway, NJ 08854-8087, USA
b National Institute of Standards and Technology, Biotechnology Division, 100 Bureau Drive, Gaithersburg, MD 20899, USA
c San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0537, USA
d Department of Pharmacology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0537, USA
In recognition of the fact that no single architecture can fully express the information content of the PDB, an integrated system of heterogeneous databases and indices that store and organize the structural data has been created. At present there are five major components (Fig. 24.5.3.1):
In the current implementation, the databases communicate through the common gateway interface (CGI). An integrated web interface dispatches a query to the appropriate database(s), each of which executes it and returns the PDB identifiers that satisfy it; the CGI program then integrates the results. Complex queries are performed by repeating this process and having the interface program apply the appropriate Boolean operation(s) to the collection of query results. A variety of output options are then available for use with the final list of selected structures.
The CGI approach (and in the future a CORBA-based approach) will permit other databases to be integrated into this system, for example, those containing extended data on different protein families. The same approach could also be applied to include NMR data found in the BMRB or data found in other community databases.
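The dispatch-and-combine scheme described above can be illustrated with a minimal sketch. It is not the PDB's implementation: the backend names, the query strings, the function names `dispatch` and `combine`, and the identifiers in the toy data are all hypothetical and chosen only to show how sets of PDB identifiers returned by separate databases might be merged with Boolean operations.

```python
from typing import Callable, Dict, Set

# Hypothetical stand-in for a component database: it takes a query string
# and returns the set of PDB identifiers that satisfy the query.
Backend = Callable[[str], Set[str]]


def dispatch(backends: Dict[str, Backend], database: str, query: str) -> Set[str]:
    """Send one sub-query to the named database and return matching PDB IDs."""
    return backends[database](query)


def combine(left: Set[str], right: Set[str], op: str) -> Set[str]:
    """Apply a Boolean operation to two collections of PDB identifiers."""
    if op == "AND":
        return left & right
    if op == "OR":
        return left | right
    if op == "NOT":
        return left - right
    raise ValueError(f"unsupported Boolean operation: {op}")


# Toy data, invented for illustration only (assumption, not real query output).
TOY_BACKENDS: Dict[str, Backend] = {
    "relational": lambda q: {"1ABC", "2XYZ", "3DEF"} if q == "method=x-ray" else set(),
    "keyword": lambda q: {"2XYZ", "3DEF", "4GHI"} if q == "kinase" else set(),
}

# A complex query is handled by repeating the dispatch step and then
# combining the returned identifier sets.
xray = dispatch(TOY_BACKENDS, "relational", "method=x-ray")
kinase = dispatch(TOY_BACKENDS, "keyword", "kinase")
result = sorted(combine(xray, kinase, "AND"))
print(result)
```

Because every database returns only identifiers, a further data source could in principle be added to a scheme like this by registering one more backend, without changing the combination step.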
Three distinct query interfaces are available for querying data within the PDB: Status Query (http://www.rcsb.org/pdb/status.html), SearchLite (http://www.rcsb.org/pdb/searchlite.html) and SearchFields (http://www.rcsb.org/pdb/cgi/queryForm.cgi). Table 24.5.3.1 summarizes the current query and analysis capabilities of the PDB. Fig. 24.5.3.2 illustrates how the various query options are organized.
SearchLite, which provides a single form field for keyword searches, was introduced in February 1999. All textual information within the PDB files as well as dates and some experimental data are accessible via simple or structured queries. SearchFields, accessible since May 1999, is a customizable query form that allows searching over many different data items, including compound, citation authors, sequence (via a FASTA search; Pearson & Lipman, 1988) and release or deposition dates.
Two user interfaces provide extensive information for result sets from SearchLite or SearchFields queries. The 'Query result browser' interface gives access to general information and to more detailed information in tabular format, and allows whole sets of data files to be downloaded when a result set consists of multiple PDB entries. The 'Structure explorer' interface provides information about individual structures as well as cross-links to many external resources for macromolecular structure data (Table 24.5.3.2). Both interfaces are accessible to other data resources through the simple CGI application programmer interface (API) described at http://www.rcsb.org/pdb/linking.html.
Table 24.5.3.3 indicates that usage has climbed dramatically since the system was first introduced in February 1999. Currently the PDB receives approximately 90 000 web hits per day, or, on average, one query every second, seven days a week, 24 hours a day.
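The 'one query every second' figure follows from the daily total by simple division. The short check below assumes, as the text does, that hits are spread evenly over the day and that a web hit can be equated with a query; the variable names are illustrative.

```python
HITS_PER_DAY = 90_000            # approximate figure quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60   # 86 400 seconds in a day

# Average rate, assuming a uniform load around the clock.
hits_per_second = HITS_PER_DAY / SECONDS_PER_DAY
print(f"{hits_per_second:.2f} hits per second")
```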
References
Sybase Inc. (1995). 70202–01–1100–01 SYBASE SQL server release 11.0. Emeryville, CA, USA.
Gilliland, G. L. (1988). A Biological Macromolecule Crystallization Database: a basis for a crystallization strategy. J. Cryst. Growth, 90, 51–59.
Lee, B. & Richards, F. M. (1971). The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 55, 379–400.
Pearson, W. R. & Lipman, D. J. (1988). Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 2444–2448.
Shindyalov, I. N. & Bourne, P. E. (1997). Protein data representation and query using optimized data decomposition. Comput. Appl. Biosci. 13, 487–496.
Shindyalov, I. N. & Bourne, P. E. (1998). Protein structure alignment by incremental combinatorial extension of the optimum path. Protein Eng. 11, 739–747.