International Tables for Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G. ch. 5.3, pp. 523-525

Section 5.3.8.3. mmLib: a Python toolkit for bioinformatics applications

B. McMahon*

International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
Correspondence e-mail: bm@iucr.org

While the libraries developed for use within the Protein Data Bank provide powerful functionality, their very size and complexity make them inappropriate for some applications. Indeed, considerable effort may be needed to compile the C++ code on nonstandard platforms. The mmLib toolkit (Painter & Merritt, 2004) addresses this by supplying a library of object-oriented routines, implemented in Python (van Rossum, 1991), that are designed to integrate easily with existing or new applications.

The objective of mmLib is to build a support platform to handle the increasingly rich data about macromolecular structure available to structural biologists. Applications not only need to handle atomic positions and build appropriate three-dimensional structure representations; individual program systems are also expected to provide links to, and integration with, information on sequence, homologous structures, and biochemical, genetic and medical form and function. Since many of these data are available from external databases in a variety of formats, mmLib is not intended to be restricted to handling files in a single format. Its initial release provides support for mmCIF, for the PDB format files that historically have been used to represent macromolecular structures (Westbrook & Fitzgerald, 2003), and for the MTZ format used by the CCP4 program suite (Collaborative Computational Project, Number 4, 1994).

Table 5.3.8.1 lists the main modules in the current release. mmLib.mmCIF and mmLib.PDB are read/write parsers for mmCIF and PDB format files, respectively; they handle file input and output in these formats and support inspection or modification of such files. They are typically used in conjunction with the mmLib.FileLoader component to populate the mmLib.Structure internal representation of the macromolecular structure. This high-level abstraction allows very succinct programs. Fig. 5.3.8.3 illustrates this with a program snippet that (apart from the necessary system calls for file management) converts an mmCIF input file to a PDB format representation. The conversion is sufficiently robust and lightweight to act as an input filter for software already designed to handle PDB format files.

Table 5.3.8.1. The modules provided by the mmLib toolkit

Module                          Description
mmLib.mmCIF                     mmCIF parser
mmLib.PDB                       PDB format parser
mmLib.Library                   Base chemical library
mmLib.Extensions.CCP4Library    Data retrieval from CCP4 monomer library
mmLib.Elements                  Chemical data for elements
mmLib.AminoAcids                Chemical data for amino acids
mmLib.NucleicAcids              Chemical data for nucleic acids
mmLib.Structure                 Macromolecular structure model
mmLib.GLViewer                  OpenGL visualizer

Figure 5.3.8.3. A snippet of code illustrating mmCIF/PDB file format conversion with the mmLib toolkit.
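
To make the idea of an mmCIF-to-PDB input filter concrete, the following is a minimal sketch in plain Python. It does not reproduce the code of Fig. 5.3.8.3 and does not use the mmLib API; the function names (parse_atom_site, to_pdb_line, cif_to_pdb) and the example data are hypothetical. It assumes that the _atom_site category is given as a loop with one whitespace-separated row per line and no quoted values, and it writes only ATOM/HETATM records in the fixed-column PDB layout.

```python
def parse_atom_site(cif_text):
    """Return one dict per row of the first looped _atom_site category."""
    tags, rows = [], []
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line.startswith("_atom_site.") and not rows:
            # Loop header line: keep the item name after the dot.
            tags.append(line.split(".", 1)[1])
        elif tags:
            if not line or line.startswith(("_", "#", "loop_", "data_")):
                break  # end of the _atom_site loop
            rows.append(dict(zip(tags, line.split())))
    return rows


def _value(row, *keys, default=""):
    """First usable value among keys; '.' and '?' mean 'no value' in CIF."""
    for key in keys:
        value = row.get(key, ".")
        if value not in (".", "?"):
            return value
    return default


def to_pdb_line(row):
    """Format one atom row as a fixed-column PDB ATOM/HETATM record."""
    element = _value(row, "type_symbol").upper()
    name = _value(row, "auth_atom_id", "label_atom_id")
    # PDB convention: names of one-letter elements start in column 14.
    if len(element) == 1 and len(name) < 4:
        name = " " + name
    return (
        "{:<6s}{:>5d} {:<4s}{:1s}{:>3s} {:1s}{:>4d}{:1s}   "
        "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2s}"
    ).format(
        _value(row, "group_PDB", default="ATOM"),
        int(_value(row, "id", default="0")),
        name[:4],
        _value(row, "label_alt_id", default=" ")[:1],
        _value(row, "auth_comp_id", "label_comp_id")[:3],
        _value(row, "auth_asym_id", "label_asym_id", default=" ")[:1],
        int(_value(row, "auth_seq_id", "label_seq_id", default="0")),
        _value(row, "pdbx_PDB_ins_code", default=" ")[:1],
        float(_value(row, "Cartn_x", default="0")),
        float(_value(row, "Cartn_y", default="0")),
        float(_value(row, "Cartn_z", default="0")),
        float(_value(row, "occupancy", default="1")),
        float(_value(row, "B_iso_or_equiv", default="0")),
        element[:2],
    )


def cif_to_pdb(cif_text):
    """Convert the atom records of an mmCIF text to PDB format text."""
    lines = [to_pdb_line(row) for row in parse_atom_site(cif_text)]
    lines.append("END")
    return "\n".join(lines) + "\n"


# Hypothetical two-atom input, invented for illustration only.
EXAMPLE = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
ATOM 1 N N  . MET A 1 27.340 24.430 2.614 1.00 9.67
ATOM 2 C CA . MET A 1 26.266 25.413 2.842 1.00 10.38
"""

if __name__ == "__main__":
    print(cif_to_pdb(EXAMPLE), end="")
```

A toolkit such as mmLib performs this conversion through a full structure model rather than a row-by-row rewrite, which is what allows it to handle quoting, multi-line values and the other categories that this sketch ignores.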

mmLib.Structure provides the internal representation of a molecular structure and is implemented as an object hierarchy with four basic object classes: Structure, Chain, Fragment and Atom. The Fragment class has subclasses AminoAcidResidue and NucleicAcidResidue. In order to build a complete representation of a structure, the toolkit may need to load data from an input mmCIF or PDB format file, and also from standard data sets of properties of individual monomers and chemical elements; these standard libraries of chemical properties are provided by the mmLib.Library module. The core mmLib source includes a limited library of such chemical properties (accessible through the subclasses mmLib.Elements, mmLib.AminoAcids and mmLib.NucleicAcids) and also provides support for the extensive CCP4 monomer library through the mmLib.Extensions.CCP4Library. The naming of this class expresses the intention that other standard data sources should be made accessible in the same way.
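
The four-level hierarchy described above can be sketched with plain Python dataclasses. This is an illustration of the containment pattern only, not mmLib's implementation: the class names follow the text, but every attribute name (chain_id, res_name, seq_id, position), the iter_atoms method and the make_fragment helper are assumptions, and the small name sets stand in for the chemical libraries that a real toolkit would consult.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Stand-ins for a chemical library: standard residue names only.
AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
NUCLEOTIDES = {"A", "C", "G", "U", "DA", "DC", "DG", "DT"}


@dataclass
class Atom:
    name: str
    element: str
    position: Tuple[float, float, float]


@dataclass
class Fragment:
    """A monomer of any kind (residue, ligand, water)."""
    res_name: str
    seq_id: int
    atoms: List[Atom] = field(default_factory=list)


class AminoAcidResidue(Fragment):
    """Fragment known to be an amino-acid residue."""


class NucleicAcidResidue(Fragment):
    """Fragment known to be a nucleotide."""


@dataclass
class Chain:
    chain_id: str
    fragments: List[Fragment] = field(default_factory=list)


@dataclass
class Structure:
    chains: List[Chain] = field(default_factory=list)

    def iter_atoms(self) -> Iterator[Atom]:
        """Walk the hierarchy Structure -> Chain -> Fragment -> Atom."""
        for chain in self.chains:
            for fragment in chain.fragments:
                yield from fragment.atoms


def make_fragment(res_name: str, seq_id: int) -> Fragment:
    """Pick the Fragment subclass from the residue name (hypothetical helper)."""
    if res_name in AMINO_ACIDS:
        return AminoAcidResidue(res_name, seq_id)
    if res_name in NUCLEOTIDES:
        return NucleicAcidResidue(res_name, seq_id)
    return Fragment(res_name, seq_id)
```

Choosing the subclass from a lookup table mirrors the role the text assigns to the chemical libraries: the file supplies names and coordinates, while knowledge of what a monomer is comes from a separate data source.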

The CCP4 monomer library is in fact included with the software as a directory tree of small files in mmCIF format, which are loaded into the Structure object through the normal use of the toolkit's mmCIF parser.
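
A library stored as a directory tree of small files lends itself to lazy loading: index the file names once, then read a monomer only when it is first requested. The sketch below illustrates that pattern with the standard library alone. The MonomerLibrary class, the one-subdirectory-per-initial-letter layout built by the demo, and the placeholder file contents are all assumptions made for illustration; the sketch returns raw file text where a toolkit would pass the file to its mmCIF parser.

```python
import tempfile
from pathlib import Path


class MonomerLibrary:
    """Index a directory tree of per-monomer .cif files (hypothetical class)."""

    def __init__(self, root):
        self.root = Path(root)
        # Map monomer name (file stem, upper case) to its path.
        self._index = {
            path.stem.upper(): path for path in sorted(self.root.rglob("*.cif"))
        }
        self._cache = {}

    def names(self):
        """Sorted names of all monomers found under the root."""
        return sorted(self._index)

    def load(self, name):
        """Return the file text for a monomer, reading it at most once."""
        key = name.upper()
        if key not in self._cache:
            self._cache[key] = self._index[key].read_text()
        return self._cache[key]


def demo():
    """Build a tiny placeholder tree and query it."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("ALA", "GLY"):
            folder = Path(tmp) / name[0].lower()
            folder.mkdir(parents=True, exist_ok=True)
            # Placeholder content, not a real library entry.
            (folder / (name + ".cif")).write_text("data_comp_" + name + "\n")
        library = MonomerLibrary(tmp)
        return library.names(), library.load("ala")


if __name__ == "__main__":
    print(demo())
```

Requesting a name that is not in the index raises KeyError, which leaves the caller free to fall back to another data source.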

mmLib.GLViewer is a module provided to support visualization programs using the OpenGL graphics environment. Although it does not by itself provide a stand-alone viewer, it can be incorporated into many common graphics application building environments. A molecular viewer, mmView, is provided with the distribution as an example of an application using GTK, a graphical user interface toolkit that is popular on Linux.

References

Collaborative Computational Project, Number 4 (1994). The CCP4 suite: programs for protein crystallography. Acta Cryst. D50, 760–763.
Painter, J. & Merritt, E. A. (2004). mmLib Python toolkit for manipulating annotated structural models of biological macromolecules. J. Appl. Cryst. 37, 174–178.
Rossum, G. van (1991). Python programming language. http://www.python.org
Westbrook, J. & Fitzgerald, P. (2003). The PDB format, mmCIF formats and other data formats. Structural bioinformatics, edited by P. E. Bourne & H. Weissig, pp. 161–179. Hoboken, NJ: John Wiley & Sons, Inc.