International Tables for Crystallography
Volume F
Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F, ch. 24.5, pp. 677-678

The database architecture

H. M. Berman (a)*, J. Westbrook (a), Z. Feng (a), G. Gilliland (b), T. N. Bhat (b), H. Weissig (c), I. N. Shindyalov (c) and P. E. Bourne (d)

(a) Department of Chemistry, Rutgers University, 610 Taylor Road, Piscataway, NJ 08854-8087, USA; (b) National Institute of Standards and Technology, Biotechnology Division, 100 Bureau Drive, Gaithersburg, MD 20899, USA; (c) San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0537, USA; and (d) Department of Pharmacology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0537, USA

Because no single architecture can fully express the information content of the PDB, an integrated system of heterogeneous databases and indices has been created to store and organize the structural data. At present there are five major components (see the figure below):

  • (1) The core relational database managed by Sybase (Sybase Inc., 1995) provides the central physical storage for the primary experimental and coordinate data described in Table [link]. The core PDB relational database contains all deposited information in a tabular form that can be accessed across any number of structures.


    Figure. The integrated query interface to the PDB.

  • (2) The final curated data files (in PDB format) and data dictionaries are the archival data and are present as ASCII files in the ftp archive.

  • (3) The POM-based databases (Shindyalov & Bourne, 1997) consist of indexed objects containing native (e.g. atomic coordinates) and derived properties (e.g. calculated secondary-structure assignments and property profiles). Some properties require no derivation, for example, B factors; others must be derived, for example, exposure of each amino-acid residue (Lee & Richards, 1971) or Cα contact maps. Properties requiring significant computation time, such as structure neighbours (Shindyalov & Bourne, 1998), are pre-calculated when the database is incremented to save considerable user-access time.

  • (4) The Biological Macromolecule Crystallization Database (BMCD; Gilliland, 1988) is organized as a relational database within Sybase and contains three general categories of literature-derived information: macromolecular, crystal and summary data.

  • (5) The Netscape LDAP server is used to index the textual content of the PDB in a structured format and provides support for keyword searches.
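Component (3) distinguishes native properties from derived ones and notes that expensive derived properties are computed once, when the database is incremented. The following is a minimal sketch of that idea for a Cα contact map, not the POM implementation itself, whose internals are not described here. The 8 Å cutoff, the in-memory dictionary and the function names `ca_contact_map`, `load_structure` and `get_contact_map` are all assumptions made for illustration.

```python
import math

def ca_contact_map(ca_coords, cutoff=8.0):
    """Return a symmetric boolean matrix of C-alpha contacts.

    ca_coords: list of (x, y, z) tuples in angstroms, one per residue.
    cutoff: assumed distance threshold in angstroms (not from the source).
    The diagonal is left False: a residue is not counted as its own contact.
    """
    n = len(ca_coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts[i][j] = contacts[j][i] = True
    return contacts

# Hypothetical store of derived properties, keyed by PDB identifier.
_precomputed = {}

def load_structure(pdb_id, ca_coords):
    """Derive the contact map once, at load time, and keep the result."""
    _precomputed[pdb_id] = ca_contact_map(ca_coords)

def get_contact_map(pdb_id):
    """Serve the stored result; no recomputation at query time."""
    return _precomputed[pdb_id]
```

For example, after `load_structure("XXXX", coords)` (with a placeholder identifier), every later call to `get_contact_map("XXXX")` is a dictionary lookup rather than a quadratic pass over the coordinates, which is the trade the text describes: more work when the database is incremented, less when a user queries it.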

In the current implementation, the databases communicate through the common gateway interface (CGI). An integrated web interface dispatches a query to the appropriate database(s), each of which then executes it. Each database returns the PDB identifiers that satisfy the query, and the CGI program integrates the results. Complex queries are performed by repeating the process and having the interface program perform the appropriate Boolean operation(s) on the collection of query results. A variety of output options are then available for use with the final list of selected structures.
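The dispatch-and-combine pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the PDB's CGI program: the back-end names, the query strings, the placeholder identifiers and the helpers `make_backend`, `dispatch` and `combine` are all hypothetical, and each back-end is simulated by an in-memory lookup rather than a real database call.

```python
from typing import Callable, Dict, Set

# A back-end takes a query string and returns a set of PDB identifiers.
Backend = Callable[[str], Set[str]]

def make_backend(index: Dict[str, Set[str]]) -> Backend:
    """Build a stand-in back-end from a fixed query -> identifiers table."""
    def run(query: str) -> Set[str]:
        return set(index.get(query, set()))
    return run

# Hypothetical back-ends with made-up queries and placeholder identifiers.
BACKENDS: Dict[str, Backend] = {
    "relational": make_backend({"resolution<2.0": {"ID01", "ID02", "ID03"}}),
    "keyword": make_backend({"lysozyme": {"ID02", "ID03", "ID04"}}),
    "pom": make_backend({"helix-rich": {"ID03", "ID04"}}),
}

def dispatch(backend: str, query: str) -> Set[str]:
    """Send one sub-query to the named back-end and return its identifiers."""
    return BACKENDS[backend](query)

def combine(op: str, left: Set[str], right: Set[str]) -> Set[str]:
    """Apply a Boolean operation to two identifier sets."""
    if op == "AND":
        return left & right
    if op == "OR":
        return left | right
    if op == "NOT":  # identifiers in left that are absent from right
        return left - right
    raise ValueError(f"unknown operation: {op}")

# A complex query is built by repeating dispatch and combining the results.
step1 = combine("AND",
                dispatch("relational", "resolution<2.0"),
                dispatch("keyword", "lysozyme"))
final = combine("NOT", step1, dispatch("pom", "helix-rich"))
```

Because every back-end answers in the same currency (a set of identifiers), the interface needs no knowledge of how each database stores its data, and adding a further back-end amounts to adding one more entry to the table.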

The CGI approach (and in the future a CORBA-based approach) will permit other databases to be integrated into this system, for example, those containing extended data on different protein families. The same approach could also be applied to include NMR data found in the BMRB or data found in other community databases.


Sybase Inc. (1995). 70202–01–1100–01 SYBASE SQL server release 11.0. Emeryville, CA, USA.
Gilliland, G. L. (1988). A Biological Macromolecule Crystallization Database: a basis for a crystallization strategy. J. Cryst. Growth, 90, 51–59.
Lee, B. & Richards, F. M. (1971). The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 55, 379–400.
Shindyalov, I. N. & Bourne, P. E. (1997). Protein data representation and query using optimized data decomposition. Comput. Appl. Biosci. 13, 487–496.
Shindyalov, I. N. & Bourne, P. E. (1998). Protein structure alignment by incremental combinatorial extension of the optimum path. Protein Eng. 11, 739–747.
