International Tables for Crystallography
Volume F
Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F. ch. 24.5, p. 678

Section Database queries

H. M. Berman (a)*, J. Westbrook (a), Z. Feng (a), G. Gilliland (b), T. N. Bhat (b), H. Weissig (c), I. N. Shindyalov (c) and P. E. Bourne (d)

(a) Department of Chemistry, Rutgers University, 610 Taylor Road, Piscataway, NJ 08854-8087, USA
(b) National Institute of Standards and Technology, Biotechnology Division, 100 Bureau Drive, Gaithersburg, MD 20899, USA
(c) San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0537, USA
(d) Department of Pharmacology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0537, USA

Three distinct query interfaces are available for querying data within the PDB: Status Query, SearchLite and SearchFields. The table below summarizes the current query and analysis capabilities of the PDB, and the figure that follows it illustrates how the various query options are organized.

Table. Current query capabilities of the PDB

(a) Query – single or iterative

Free text – any word in the PDB
Specific data items – compound name, author, description, deposition date, resolution, source, citation, cell dimensions, experimental method, data-collection method, refinement method, broad structure type, ligand (using the PDB HET records)
Property pattern – sequence, secondary structure
Structure similarity – 3D comparison

(b) Results analysis – single structure

Synopsis/Snapshot/Atlas – compound name, sequence, chemical components, citation, space group, cell constants, crystallization conditions, refinement details, structure views
Quick report – compound name, author, description, deposition date, resolution, source, citation, cell dimensions, experimental method, data-collection method, refinement method, geometry features
Full report – Quick report results plus secondary structure, chemical components, solvent
Property profiles – sequence, secondary structure
Links – see the table of static cross-links below
Render – RasMol, Chime, QuickPDB (Java applet), VRML, Protein Explorer
Geometry – bond lengths, bond angles, dihedrals, close contacts, summary visual inspection

(c) Results analysis – multiple structure

Quick report – as above, but collated over multiple structures
Full report – as above, but collated over multiple structures
Structure neighbours – pairwise structure comparison

(d) Other query output options

mmCIF and PDB data files
Compressed files (gzip, tar, compressed)

Figure. The various query options that are available for the PDB.

SearchLite, which provides a single form field for keyword searches, was introduced in February 1999. All textual information within the PDB files, as well as dates and some experimental data, is accessible via simple or structured queries. SearchFields, accessible since May 1999, is a customizable query form that allows searching over many different data items, including compound, citation authors, sequence (via a FASTA search; Pearson & Lipman, 1988) and release or deposition dates.

Two user interfaces provide extensive information for result sets from SearchLite or SearchFields queries. The 'Query result browser' interface gives access to some general information and to more detailed information in tabular format, and allows whole sets of data files to be downloaded for result sets consisting of multiple PDB entries. The 'Structure explorer' interface provides information about individual structures as well as cross-links to many external resources for macromolecular structure data (see the table of static cross-links below). Both interfaces are accessible to other data resources through a simple CGI application programmer interface (API).

Table. Static cross-links to other data resources currently provided by the PDB

Resource                                                Information content
3Dee (Siddiqui & Barton, 1996)                          Structural domain definitions
BMCD (Gilliland, 1988)                                  Crystallization information about biomacromolecules
CATH (Orengo et al., 1997)                              Protein fold classification
CE (Shindyalov & Bourne, 1998)                          Complete PDB and representative structure comparison and alignments
DSSP (Kabsch & Sander, 1983)                            Secondary-structure classification
Enzyme Structures Database (Laskowski & Wallace, 1998)  Enzyme classifications and nomenclature
FSSP (Holm & Sander, 1998)                              Structurally similar families
GRASS (Nayal et al., 1999)                              Graphical representation and analysis
HSSP (Dodge et al., 1998)                               Homology-derived secondary structures
Image (Sühnel, 1996)                                    Image library of biological macromolecules
MMDB (Hogue et al., 1996)                               Database of three-dimensional structures
MEDLINE (National Library of Medicine, 1989)            Direct access to MEDLINE at NCBI
NDB (Berman et al., 1992)                               Database of three-dimensional nucleic acid structures
PDBObs (Weissig et al., 1998)                           Obsolete structures database
PDBSum (Laskowski et al., 1997)                         Summary information about protein structures
SCOP (Murzin et al., 1995)                              Structure classifications
STING (Neshich et al., 1998)                            Simultaneous display of structural and sequence information
Tops (Westhead et al., 1998)                            Protein structure motif comparisons and topological diagrams
VAST (Gibrat et al., 1996)                              Vector Alignment Search Tool (NCBI)
Whatcheck (Hooft et al., 1996)                          Protein structure checks

The table below indicates that usage has climbed dramatically since the system was first introduced in February 1999. Currently the PDB receives approximately 90 000 web hits per day, or, on average, one query every second, seven days a week, 24 hours a day.
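The 'one query every second' figure follows directly from the daily hit count. The short Python sketch below reproduces the arithmetic; the function name is illustrative only and is not part of any PDB software.

```python
# Convert the quoted daily hit count into an average per-second rate.
SECONDS_PER_DAY = 24 * 60 * 60  # 86 400 seconds in a day


def hits_per_second(hits_per_day: float) -> float:
    """Return the average number of hits per second for a daily total."""
    return hits_per_day / SECONDS_PER_DAY


rate = hits_per_second(90_000)  # figure quoted in the text
print(f"{rate:.2f} hits per second")  # prints "1.04 hits per second"
```

A rate just above one hit per second is consistent with the rounded statement in the text.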

Table. Web query statistics for the primary RCSB site

Month           Daily average     Monthly totals
August 1999     63768   47675     34928   31781561   1477927   1976818
July 1999       75693   54427     38698   35652864   1687265   2346495
June 1999       33256   27054     11586   11164410    622264    764894
May 1999        26890   22085     12405   12463441    684650    833597
April 1999      21140   17099     12261    9925351    512990    634224
March 1999       8406    6911      6292    3560629    214255    260610
February 1999    2944    2433      2246     844536     68133     82453
January 1999     1563    1353      1153      92014     35202     40641
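
The daily averages and monthly totals in the table above can be cross-checked against each other. The Python sketch below assumes that the first numeric column is the daily average of the quantity totalled in the last numeric column; the table as reproduced here does not label its sub-columns, so that pairing is an assumption, and the function name is illustrative. Dividing each total by its daily average estimates the number of days of logging behind each row.

```python
# (month, first numeric column, last numeric column) copied from the table.
# Assumption: the first column is the daily average of the last column.
ROWS = [
    ("August 1999", 63768, 1976818),
    ("July 1999", 75693, 2346495),
    ("June 1999", 33256, 764894),
    ("May 1999", 26890, 833597),
    ("April 1999", 21140, 634224),
    ("March 1999", 8406, 260610),
    ("February 1999", 2944, 82453),
    ("January 1999", 1563, 40641),
]


def days_covered(daily_average: int, monthly_total: int) -> int:
    """Estimate how many days of logging a monthly total represents."""
    return round(monthly_total / daily_average)


coverage = {month: days_covered(avg, total) for month, avg, total in ROWS}
for month, days in coverage.items():
    print(f"{month}: about {days} days")
```

Under this assumption most rows correspond to full calendar months, while June 1999 and January 1999 come out at about 23 and 26 days, which suggests that those monthly totals cover partial months.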


References

Pearson, W. R. & Lipman, D. J. (1988). Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 2444–2448.
