International Tables for Crystallography Volume F: Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F. ch. 23.2, pp. 579-587
https://doi.org/10.1107/97809553602060000715
Chapter 23.2. Protein–ligand interactions
Protein–ligand interactions are described and several of the unique interactions observed between proteins and other molecules are illustrated. The chapter covers protein–carbohydrate interactions, metal ions associated with proteins, protein–nucleic acid interactions, and sulfate and phosphate interactions.
Keywords: carbohydrate-binding proteins; DNA; hydrogen bonding; nucleic acids; phosphate-binding protein; protein function; protein–carbohydrate recognition; protein–ligand interactions; protein–nucleic acid interactions; RNA; transfer RNA.
There are currently over a thousand unique protein-bound ligands described in the Protein Data Bank, illustrating the enormous variety of small molecules with which proteins interact. These ligands serve as cofactors in protein-mediated reactions, as substrates in these reactions, and as elements that maintain or alter protein structure or macromolecular assembly. The specific binding of small molecules to proteins is a primary means by which living systems interact and exchange information with their environment. The atomic details of protein–ligand interactions are often quite similar to the intramolecular interactions observed within a protein molecule. Examples of all the various non-covalent interactions described in Parts 20 and 22, such as hydrogen bonds, van der Waals forces and other electrostatic phenomena, are observed between proteins and their small-molecule ligands.
Across the diverse interactions observed between proteins and their ligands, a few fundamental patterns of recognition emerge. In general, ligand binding requires that the protein partially or fully sequester the ligand from the solvent. The interaction energy between the protein and the ligand must therefore be large enough to overcome the interactions of both species with the solvent, as well as the translational and rotational entropy lost on fixing the position and orientation of the ligand relative to the protein. The protein achieves this level of interaction by presenting a binding site that is complementary to the ligand in both shape and electrostatic functionality. Beyond this generalization, each ligand has its own unique and complicated story. Rather than attempt to summarize the enormous subject of protein–ligand interactions comprehensively, we will instead illustrate several of the unique interactions observed between proteins and other molecules.
The interaction between proteins and carbohydrates provides a prototypic example of how proteins specifically recognize small organic ligands. Protein-mediated recognition of carbohydrates is crucial in a diverse array of processes, including the transport, biosynthesis and storage of carbohydrates as an energy source, signal transduction through carbohydrate messengers, and cell–cell recognition and adhesion (Rademacher et al., 1988). Many aspects of protein–carbohydrate recognition are also observed in the interactions of proteins with other small-molecule ligands. For example, carbohydrate recognition is largely conferred through a combination of hydrogen bonding and van der Waals interactions with the protein (Quiocho, 1986). These interactions are generally presented in a binding site that is highly complementary to the target ligand in shape and functionality. Carbohydrate-binding proteins also occasionally employ ordered water molecules and bound metal ions to facilitate ligand recognition (Quiocho et al., 1989; Vyas, 1991; Weis & Drickamer, 1996). There are also many examples of `induced fit' recognition of carbohydrates, where the binding of the ligand induces a conformational change in the protein. This phenomenon of ligand-induced conformational change was observed in one of the first protein–ligand interactions visualized by X-ray crystallography, namely the binding of a carbohydrate substrate to lysozyme (Blake et al., 1967). Since these early studies, a wide variety of carbohydrate–protein structures have been determined, and a number of general themes have emerged from this continually active field (Vyas, 1991). These themes also apply more broadly to the study of protein–ligand recognition.
Two general classes of carbohydrate-binding modes have been observed in protein complexes (Quiocho, 1986; Vyas, 1991). The first group of proteins completely sequesters the carbohydrate from the surrounding solvent. These proteins, including the periplasmic binding proteins and the catalytic site of glycogen phosphorylase, tend to have a high affinity for their ligand (Kd ≃ 10−7 to 10−6 M). The high affinity of these proteins can be attributed to the entropic gain of desolvating the protein and carbohydrate surfaces, as well as to the exceptionally high degree of functional complementarity in the ligand-binding site. The periplasmic proteins nearly saturate the hydrogen-bonding potential of their carbohydrate ligands (Quiocho, 1986). The proteins of the second group bind their ligands in shallow clefts in the solvent-exposed protein surface. This mode of binding, observed in the lectins, lysozyme and the storage site of glycogen phosphorylase, generally has a lower affinity than the first class (Kd ≃ 10−6 to 10−3 M). Some members of this second class, such as the lectins, can increase the affinity for their ligands dramatically by clustering a number of low-affinity sites through oligomerization of the polypeptides (Weis & Drickamer, 1996). The specific arrangement of lectin oligomers allows them to discriminate between large cell-surface arrays of polysaccharides with high affinity and selectivity.
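The affinity ranges quoted above can be put on an energy scale with the standard relation ΔG° = RT ln(Kd/c°). The short Python sketch below assumes a 1 M standard state (c°) and a temperature of 298.15 K. The function name binding_free_energy is hypothetical and chosen for illustration.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1
C_STANDARD = 1.0       # assumed 1 M standard state


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard free energy of binding (kJ/mol) from a dissociation constant (M).

    dG = RT ln(Kd / c0): a smaller Kd (tighter binding) gives a more negative dG.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KJ * temperature_k * math.log(kd_molar / C_STANDARD)


# Endpoints of the Kd ranges quoted in the text for the two binding classes.
for label, kd in [("sequestered site", 1e-7), ("sequestered site", 1e-6),
                  ("surface cleft", 1e-3)]:
    print(f"{label:16s} Kd = {kd:.0e} M  ->  dG = {binding_free_energy(kd):6.1f} kJ/mol")
```

Under these assumptions, each tenfold change in Kd corresponds to about RT ln 10 ≈ 5.7 kJ mol−1. The fully sequestered class therefore lies at roughly −34 to −40 kJ mol−1, compared with about −17 kJ mol−1 for the weakest surface-cleft sites.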
An early review of protein–carbohydrate interactions revealed several atomic level interactions that continue to appear ubiquitously in the structures of protein–carbohydrate complexes (Quiocho, 1986). A generic cyclic sugar in mono- or oligosaccharides appears to be recognized as a disc that displays two flat non-polar surfaces surrounded by a ring of polar hydroxyls. Proteins recognize these features by hydrogen bonding to the ring of polar hydroxyls while stacking flat aromatic side chains against the non-polar disc faces.
Cooperative hydrogen bonding, where the hydroxyl group of a carbohydrate acts as both a donor and an acceptor of hydrogen bonds, is often observed in the direct interactions between proteins and carbohydrate ligands (Fig. 23.2.2.1). The sp3-hybridized oxygen atom of a carbohydrate hydroxyl can accept two hydrogen bonds through its two lone pairs of electrons while also donating a single hydrogen bond. Cooperative hydrogen bonding generally follows a simple pattern in which the carbohydrate hydroxyl accepts a hydrogen bond from a protein amide group while simultaneously donating a hydrogen bond to a protein carbonyl oxygen. Hydrogen bonding to protein hydroxyl groups is observed only infrequently. This pattern is thought to result, in part, from the entropic cost of fixing a freely rotating protein hydroxyl group while simultaneously fixing the ligand hydroxyl group. Amide and carbonyl groups are usually held in a fixed planar geometry, so they lose less entropy on ligand binding and require less binding energy to compensate.
The vicinal hydroxyl groups of carbohydrates provide an ideal geometry for the formation of `bidentate' hydrogen bonds, where the pair of hydroxyls interacts with two functional groups of a single amino-acid side chain or the main-chain amide groups of two consecutive residues (Fig. 23.2.2.1). These interactions occur when the adjacent carbohydrate hydroxyls are either both equatorial, or one is equatorial and the other axial. The interatomic distance for the carbohydrate hydroxyl oxygens is ∼2.8 Å in these cases, allowing for a bidentate interaction with the planar side chains of aspartate, asparagine, glutamate, glutamine and arginine. Bidentate hydrogen bonds have not been observed for consecutive axial hydroxyls where the oxygen–oxygen distance increases to ∼3.7 Å.
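The distance criterion for bidentate hydrogen bonding can be expressed as a simple geometric test on the coordinates of the hydroxyl oxygens. The Python sketch below is illustrative only. The 3.2 Å cutoff, chosen to fall between the ∼2.8 Å and ∼3.7 Å separations quoted above, is an assumption, as are the coordinates. Neither is taken from the structures discussed here.

```python
import math

# Approximate vicinal O...O separations quoted in the text (angstroms).
EQ_EQ_OR_EQ_AX = 2.8  # both equatorial, or one equatorial and one axial
AX_AX = 3.7           # both axial


def distance(a, b):
    """Euclidean distance between two 3D points, in angstroms."""
    return math.dist(a, b)


def bidentate_compatible(o1, o2, cutoff=3.2):
    """True if two hydroxyl oxygens are close enough to bridge the two planar
    hydrogen-bonding atoms of one Asp/Asn/Glu/Gln/Arg side chain (or two
    consecutive backbone amides).

    The 3.2 A cutoff is a hypothetical value between 2.8 and 3.7 A.
    """
    return distance(o1, o2) <= cutoff


origin = (0.0, 0.0, 0.0)
print(bidentate_compatible(origin, (EQ_EQ_OR_EQ_AX, 0.0, 0.0)))  # True
print(bidentate_compatible(origin, (0.0, 0.0, AX_AX)))           # False
```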
Carbohydrates often present a disc-like face of non-polar aliphatic hydrogen atoms, which proteins recognize through aromatic side chains. The protein aromatic groups are `stacked' on the flat face of the carbohydrate, generating both specificity and binding energy through van der Waals interactions. Tryptophan, the aromatic amino acid with the largest surface area and highest electronegativity, is the side chain most commonly employed in van der Waals `stacking' with carbohydrates. The infrequent use of aliphatic groups to bind the non-polar carbohydrate faces suggests that the aromatic moieties are employed in a specific manner. The electron-rich π clouds of the aromatic side chains may form a strong electrostatic interaction with the aliphatic carbohydrate protons that protein aliphatic groups could not match. This partially anionic character of aromatic side chains is also observed in a number of intramolecular protein interactions (Chapter 22.2) and protein–ligand interactions (see below).
Metal ions provide a number of important functions in their diverse and ubiquitous interactions with proteins. The most common function of a protein-bound metal ion is to stabilize and orient the protein tertiary structure through coordination to specific protein functional groups. In addition to this structural role, metal ions are often directly involved in enzyme catalysis and protein function. Examples include redox reactions, the activation of chemical bonds and the binding of specific ligands. Myoglobin, the first protein structure determined by X-ray crystallography, specifically binds molecular oxygen through the iron ion of a haem cofactor. It is a prototypic example of a protein and a metal ion combining to provide a unique and specific functionality.
A number of metals are relatively abundant and available in living systems (Table 23.2.3.1) (Glusker, 1991). The most common ions include sodium, potassium, magnesium and calcium. Along with these ions, a large variety of trace metals are also found coordinated to proteins. The structures of protein complexes with some of these trace ions, including iron, zinc and copper, have been studied extensively for some time (Glusker, 1991). More recently, the structures of protein complexes with more unusual ions, such as nickel, vanadium and tungsten, have been determined (Volbeda et al., 1996).
Specificity in the interactions between proteins and metal ions is conferred by each ion's preference for particular coordinating atoms and binding-site geometries. All four of the more common metals, i.e. sodium, potassium, magnesium and calcium, are classified as `hard' metals, a term that refers to the low polarizability of the ion's electron cloud. The nucleus of a hard metal holds its surrounding electrons relatively tightly, and these ions lack easily excitable unshared electrons. The interactions between these metals and their ligands therefore tend to have an ionic character, rather than the more covalent character preferred by the `soft' metals. In general, the hard metals, acting as Lewis acids, prefer to coordinate with hard bases, such as the oxygen atoms of hydroxyls, carbonyls and carboxylates.
The soft metals have a high polarizability, a large ionic radius and several unshared valence electrons. They generally prefer to coordinate with soft bases, such as the thiol and thioether groups of cysteine and methionine. The loosely held valence electrons of soft metals favour partially covalent π-bonding with their coordinated ligands, since these outer-shell electrons can be donated into empty outer orbitals of the ligand atom. The partially covalent nature of these bonds yields more stable complexes than the ionic complexes of the hard metals. The partial covalent bond also polarizes the coordinated ligand and can thus activate adjacent atoms towards nucleophilic attack.
A large number of the transition metals, including zinc and iron, form ions whose polarizability is intermediate between that of the hard and the soft metals. These ions mainly prefer nitrogen ligands, such as the imidazole side chain of histidine or the central nitrogens of the haem cofactor.
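The hard, intermediate and soft preferences described above amount to a lookup from metal class to favoured donor atom. The Python sketch below encodes only the ions named in this section, with two assumptions. Fe2+ stands in for "iron", and Cu+ is an assumed soft-metal example taken from standard hard–soft acid–base tables rather than from this chapter. The names METAL_CLASS and preferred_donors are hypothetical. Real metal sites deviate from these tendencies, so the sketch is a summary of the text, not a predictive tool.

```python
# Classes follow the text; Fe2+ stands in for "iron" and Cu+ is an assumed
# soft-metal example from standard HSAB tables (not named in this section).
METAL_CLASS = {
    "Na+": "hard", "K+": "hard", "Mg2+": "hard", "Ca2+": "hard",
    "Zn2+": "intermediate", "Fe2+": "intermediate",
    "Cu+": "soft",
}

# Donor atoms each class tends to favour, as described in the text.
PREFERRED_DONORS = {
    "hard": ["O (hydroxyl)", "O (carbonyl)", "O (carboxylate)"],
    "intermediate": ["N (His imidazole)", "N (haem)"],
    "soft": ["S (Cys thiol)", "S (Met thioether)"],
}


def preferred_donors(ion: str) -> list:
    """Return the donor atoms an ion is expected to favour."""
    if ion not in METAL_CLASS:
        raise KeyError(f"no hard/soft class recorded for {ion!r}")
    return PREFERRED_DONORS[METAL_CLASS[ion]]


print(preferred_donors("Ca2+"))
print(preferred_donors("Zn2+"))
```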
The geometry of the metal-binding site in a protein depends on a combination of the ionic radius and the polarizability of the metal. The number of coordinating ligands around the metal is primarily correlated with the relative size of the ion, since as many anions as possible are packed around the cationic metal without leaving any cavities (Orgel, 1966). This leads to a relatively simple correlation between the coordination number and the ratio of the cation and anion radii (r_cation/r_anion). Beyond this simple geometric constraint, the coordination number is also influenced by the repulsion between the closely packed anionic ligands. This repulsion can be tempered by distortions in the cation's electron cloud, which makes the coordination number depend on the polarizability of the metal ion as well. Table 23.2.3.1 gives the most common coordination numbers and geometries for the listed metal ions. For a more comprehensive description of possible coordination geometries, see Glusker (1991).
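The radius-ratio correlation can be sketched using the classical limiting ratios for ionic packing: 0.155, 0.225, 0.414, 0.732 and 1.0 for coordination numbers 3, 4, 6, 8 and 12. These thresholds come from standard inorganic chemistry, not from Table 23.2.3.1. The approximate ionic radii in the example (Mg2+ ≈ 0.72 Å, O2− ≈ 1.40 Å) are also assumptions. The sketch ignores the polarizability effects discussed above.

```python
# Classical limiting radius ratios (r_cation / r_anion) for ionic packing.
# Values are standard textbook thresholds, assumed here for illustration.
RADIUS_RATIO_LIMITS = [
    (1.000, 12, "close-packed"),
    (0.732, 8, "cubic"),
    (0.414, 6, "octahedral"),
    (0.225, 4, "tetrahedral"),
    (0.155, 3, "trigonal planar"),
]


def predicted_coordination(r_cation: float, r_anion: float):
    """Predict (coordination number, geometry) from the radius ratio alone.

    Ignores polarizability and ligand-ligand repulsion, which the text notes
    also shape real metal sites.
    """
    ratio = r_cation / r_anion
    for lower_limit, cn, geometry in RADIUS_RATIO_LIMITS:
        if ratio >= lower_limit:
            return cn, geometry
    return 2, "linear"


# Approximate ionic radii in angstroms (assumed): Mg2+ ~0.72, O2- ~1.40.
print(predicted_coordination(0.72, 1.40))
```

For Mg2+ with oxygen ligands the ratio is about 0.51, which falls in the octahedral range. This is consistent with the sixfold coordination typically seen for Mg2+. Because the rule treats ligands as hard spheres, it gives a first estimate at best, particularly for the more polarizable ions.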
A short example of the diversity of metal functions in protein complexes is found in a comparison between the calcium-binding proteins calmodulin and staphylococcal nuclease. Calmodulin functions in signal transduction by binding to a wide variety of proteins in a calcium-dependent manner. In the absence of calcium, calmodulin adopts a conformation in which two loosely folded domains are connected by a flexible α-helix, like two balls tied together by a string. In the presence of Ca2+, each of the two domains of calmodulin binds two metal ions. The binding of Ca2+ to the two calmodulin domains induces a large conformational change in the protein, which confers a high affinity for peptide ligands. Crystallographic studies show that the two calcium-bound domains form a clamp that closes on the target peptide ligand (Meador et al., 1995). Thus, in this case, the metal ion plays an indirect role as a structural element in the protein function.
In the case of staphylococcal nuclease, calcium binding appears to play a more direct role in the catalytic function of the protein. A Ca2+ ion binds at the active site and coordinates with protein side chains, water molecules and the substrate phosphate group. The addition of calcium affects the nuclease reaction both in the binding of the substrate and directly in the catalytic step. Although calcium increases the Km of the nucleic acid substrate, this effect can be reproduced with a large number of other metal ions (Tucker et al., 1979). The effect on catalysis, however, is specific to Ca2+ ions. In a proposed mechanism, Ca2+ directly contributes to catalysis by activating a water-derived hydroxide ion for nucleophilic attack on the phosphorus atom of the nucleic acid backbone (Cotton et al., 1979).
DNA provides one of the more compelling protein `ligands' for biophysical study, as the sequence-specific binding of proteins to the DNA double helix mediates the interaction between the environment surrounding the living cell and the information `programmed' into the cell's genome. A classic example of such a process is the response of the bacterium Escherichia coli to nutrients in the surrounding medium through the regulation of gene expression. A simple case of this interaction is found in the biosynthesis of the amino acid tryptophan. Transcription of the genes necessary for the synthesis of tryptophan is suppressed when tryptophan is present in the environment. This process is mediated by the tryptophan-dependent, sequence-specific binding of the trp repressor protein to the operator of the trp operon, which controls the genes encoding the biosynthetic enzymes (Joachimiak et al., 1983). In the absence of tryptophan, the affinity of the aporepressor for the trp operator is dramatically reduced. Thus, when tryptophan is not available in the environment, transcription of the biosynthetic genes proceeds. In mammalian cells, an analogous process is observed in the activation of gene expression by hormones, cytokines and other stimuli.
Although DNA has often been considered to be a long, nearly featureless cylindrical double helix, proteins have evolved with exquisite specificity for their cognate DNA sequences. This apparent contradiction can be reconciled with the acknowledgement of two recently appreciated properties of DNA (Harrington & Winicov, 1994). First, the local structure of DNA is actually highly variable and dependent on the specific sequence of the base pairs in the helical ladder. Second, the DNA double helix is a relatively soft structure that is easily deformed into concerted bends, kinks and other distortions. DNA-binding proteins thus recognize their cognate sequences both by utilizing the unique local structure of the double helix and by inducing distortions into the helix which facilitate recognition.
The most intuitive features of the double helix that are important in sequence-specific recognition are the unique surfaces presented by the bases in the helix grooves. DNA is primarily found as a B-form helix that presents a wide, accessible major groove and a deep, narrow minor groove. An analysis of the arrangement of hydrogen-bonding functional groups presented by DNA bases (Fig. 23.2.4.1) suggests that sequence-specific recognition of the DNA helix is best achieved through the major groove. There, each of the four possible base-pair combinations presents a unique hydrogen-bonding pattern (Steitz, 1990). The majority of sequence-specific DNA-binding proteins of known structure appear to use this direct readout of the major groove. They insert a portion of an α-helix, a two-stranded β-hairpin or even a peptide coil that presents complementary hydrogen-bonding arrangements to the DNA bases (Pabo & Sauer, 1992; Steitz, 1990). The narrow surface of the minor groove presents some characteristic hydrogen-bonding patterns; however, the identity of each base pair is represented ambiguously in these patterns (Fig. 23.2.4.1). The similar positions of hydrogen-bonding groups in the minor groove would make it hard to distinguish AT base pairs from TA base pairs, and GC base pairs from CG base pairs. Although some proteins, such as the TATA-box binding protein, recognize DNA through the minor groove, recognition of their target is completed by dramatic distortion of the DNA helix through intercalation (see below).
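The argument for major-groove readout can be made concrete by writing each base pair's groove edge as a string of symbols: hydrogen-bond acceptors (A), donors (D), thymine methyl groups (M) and non-polar hydrogens (H). The encoding below is the commonly used textbook simplification, read across each pair in a fixed direction. It is an assumed representation rather than a transcription of Fig. 23.2.4.1. Counting the distinct patterns shows that all four pairs differ in the major groove, whereas the minor groove separates only A·T-type from G·C-type pairs.

```python
# Groove-edge signatures of Watson-Crick pairs, read across each pair in a
# fixed direction. A = acceptor, D = donor, M = thymine methyl,
# H = non-polar C-H. This encoding is an assumed textbook simplification.
MAJOR_GROOVE = {
    "A:T": "ADAM",  # A N7, A N6-H, T O4, T C5-methyl
    "T:A": "MADA",
    "G:C": "AADH",  # G N7, G O6, C N4-H, C C5-H
    "C:G": "HDAA",
}
MINOR_GROOVE = {
    "A:T": "AHA",   # A N3, A C2-H, T O2
    "T:A": "AHA",
    "G:C": "ADA",   # G N3, G N2-H, C O2
    "C:G": "ADA",
}


def distinguishable_pairs(signatures: dict) -> int:
    """Number of distinct patterns a protein reading this groove could see."""
    return len(set(signatures.values()))


print("major groove:", distinguishable_pairs(MAJOR_GROOVE))  # 4
print("minor groove:", distinguishable_pairs(MINOR_GROOVE))  # 2
```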
α-Helices are the most frequently observed structural motif for recognition in the major groove of DNA (Pabo & Sauer, 1992). The overall shape and dimensions of the α-helix are geometrically suited to binding in the major groove of a B-DNA helix (Fig. 23.2.4.2). The exact orientations of helices in various protein–DNA complexes are quite variable. Most helices bind in the major groove at an angle of approximately 30 ± 15° to the plane perpendicular to the DNA helical axis (Fig. 23.2.4.3). There are, however, numerous exceptions to this rule. One is the trp repressor/operator complex, in which only the N-terminal end of the `recognition' helix is inserted into the major groove (Otwinowski et al., 1988a,b). Interactions observed between these inserted elements and the DNA bases include the common direct hydrogen bond between a protein side chain and a base, the less common hydrogen bond between the protein backbone and a base, indirect but specific hydrogen bonding through water molecules, and hydrophobic interactions.
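The helix orientation quoted above is the angle between a line (the recognition-helix axis) and a plane (the plane perpendicular to the DNA axis). It equals arcsin(|u·n|/(|u||n|)), where u is the helix axis and n is taken along the DNA axis. The Python sketch below assumes that both axis vectors have already been fitted, for example from Cα and phosphate coordinates. The fitting step and the function name are hypothetical and not described in this chapter.

```python
import math


def angle_to_normal_plane(helix_axis, dna_axis):
    """Angle in degrees between a helix axis and the plane perpendicular
    to the DNA axis.

    For a line with direction u and a plane with normal n, the angle is
    asin(|u.n| / (|u||n|)); here n is the DNA helical axis.
    """
    dot = sum(h * d for h, d in zip(helix_axis, dna_axis))
    norm = math.hypot(*helix_axis) * math.hypot(*dna_axis)
    if norm == 0:
        raise ValueError("axis vectors must be non-zero")
    # Clamp to guard against rounding slightly above 1.
    return math.degrees(math.asin(min(1.0, abs(dot) / norm)))


dna_axis = (0.0, 0.0, 1.0)
tilt = math.radians(30)
helix_axis = (math.cos(tilt), 0.0, math.sin(tilt))
print(f"{angle_to_normal_plane(helix_axis, dna_axis):.1f} degrees")
```

Here a helix axis tilted 30° out of the perpendicular plane, against a DNA axis along z, prints "30.0 degrees". A helix lying flat in that plane would give 0°.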
There appears to be no simple correlation between the primary sequence of the peptide segments that make specific base contacts and the DNA sequence that those segments recognize (Pabo & Sauer, 1992; Steitz, 1990). Every polar protein side chain has been observed forming specific hydrogen bonds with DNA bases, but no amino acid shows a consistent preference for any one base. What is observed is that conserved residues within families of DNA-binding proteins tend to make conserved base-specific interactions in DNA–protein complexes. Strikingly, this conserved subset of interactions includes cooperative hydrogen bonding reminiscent of the pairs of hydrogen bonds often observed in carbohydrate–protein complexes. These interactions, which include the pairing of arginine with guanine and of glutamine or asparagine with adenine, were predicted early on by Seeman et al. (1976).
Although the elements of protein structure in direct contact with the DNA bases play a prominent role in sequence specificity, these elements are not sufficient to impart the specificity of the DNA-binding protein. This statement is supported by the variety of orientations in which the `recognition' helices bind to the major groove. The structural context of the recognition elements and the overall docking of the protein to the DNA helix play as important a role in specificity as the direct base interactions.
The contacts between the protein and the deoxyribose–phosphate backbone of the DNA appear to be one of the more important aspects of `indirect readout' of the DNA sequence (Pabo & Sauer, 1992). On average, more than half of the protein–DNA interactions in complex structures involve the backbone of the DNA helix, and this sheer number suggests that these contacts serve an important function in recognition. Several of the observed protein–backbone contacts involve salt bridges between the phosphates and basic protein side chains. These interactions are, however, not as highly represented as one might expect, possibly because of the high flexibility of the long arginine and lysine side chains. Instead, phosphate-backbone contacts have been observed for every basic residue, for every neutral residue with some hydrogen-bonding potential, and occasionally even for acidic residues. These contacts may contribute to specificity through two mechanisms. First, they can establish the exact orientation of the base-specific contacts relative to the `rungs' of the phosphate backbone. Second, they may read the base sequence indirectly through sequence-specific backbone distortions or flexibility. There are numerous examples of DNA–protein complexes with highly distorted DNA helices, and there is evidence that certain DNA sequences inherently confer bends within the B-form helix. It is therefore conceivable that protein interactions with the DNA backbone confer specificity by selecting for a specific distorted conformation of the helix.
The most dramatic distortions of the DNA helix have been observed in DNA–protein complexes in which the protein induces a kink or bend by intercalating into the DNA helix at the minor groove (Werner et al., 1996a,b). Intercalation involves the insertion of a hydrophobic protein side chain into the helix, disrupting the stacking of two adjacent base pairs; in some cases, the side chain itself then stacks with one of the base pairs. Examples of this mode of binding include the complexes of the TATA-box binding protein (TBP), the PurR repressor and the human oncogene product ETS1 with their cognate DNA partners (Werner et al., 1996a,b). The ETS1–DNA complex provides the only current example of complete intercalation of the DNA extending from the minor groove to the major groove. A tryptophan side chain extends into the helix from the minor groove and stacks with one of the displaced base pairs. The remaining base pair packs against the edge of the tryptophan ring system, forming a pseudo-hydrogen bond between the indole hydrogens and the π systems of the DNA bases. In ETS1, the deformation resulting from protein intercalation kinks the helical axis by 45° to about 60°.
Examples of protein intercalation of the DNA helix from the major groove are found in proteins, such as the methyltransferases, that perform chemistry on the bases of the DNA. To perform their enzymatic function, these proteins must extract the target base from the DNA helix and `flip' the base out into the enzyme active site (Cheng, 1995). The resulting void in the DNA is then filled by protein side chains that partially satisfy the hydrogen-bonding and van der Waals interactions that were broken when the target base was flipped. Although there are only a few known structures of DNA–protein complexes with extra-helical bases, base flipping is thought to be a relatively common feature of DNA-modifying enzymes.
There have been a few reports of single-stranded DNA–protein complex structures, all of which involve the sequence-nonspecific recognition of DNA. In the binding of a tetranucleotide to the exonuclease active site of the DNA polymerase I Klenow fragment (Freemont et al., 1988), extensive hydrogen-bonding interactions between the sugar–phosphate backbone and the protein are observed. This provides the most intuitive mechanism for sequence-nonspecific nucleic acid binding, where the protein simply recognizes the phosphate backbone of a single-stranded coil. The protein also appears to form a few hydrophobic interactions with the DNA bases; however, these interactions, which include the partial intercalation between two bases, are thought to be nonspecific.
The structure of replication protein A complexed with single-stranded DNA does not exhibit the intuitive nonspecific mechanism of recognition found in the Klenow fragment (Bochkarev et al., 1997). In this structure, the DNA is extended with its bases splayed out over the surface of the protein. The bases form several pairwise stacking interactions that are interrupted by intercalating protein side chains. Despite the sequence-nonspecific nature of the binding, numerous hydrogen bonds are found between the protein and the bases of the DNA strand. These base-dependent contacts imply that the protein–DNA interactions must be flexible and plastic enough to accommodate different base sequences.
Although RNA and DNA are chemically similar, RNA presents a much greater variety of shapes and surfaces than the relatively simple B-form helix of DNA. RNA is generally single-stranded and often forms secondary structures driven by base pairing between complementary stretches of sequence within the same strand. The formation of base-paired regions can result in stem loops, bulges and helices, which can further assemble into more complicated tertiary structures, such as that observed for transfer RNAs. Protein-mediated recognition of RNA often depends as much on the three-dimensional structure presented by these secondary structures as on the specific identity of the base sequence.
Very little information is currently available on the structural details of protein–RNA interactions (Nagai, 1996). Only a handful of protein–RNA complex structures have been determined. These fall into three basic categories, depending on the secondary structure of the RNA: four tRNA–protein complexes, two stem-loop–protein complexes and a capped single-stranded RNA–protein complex.
In the four known structures of tRNA bound to their aminoacyl-tRNA synthetases (Cusack et al., 1996a,b; Goldgur et al., 1997; Rould et al., 1991), the effects of RNA's preference for A-form helices on recognition are immediately apparent. The proteins make numerous contacts in the shallow and exposed minor grooves of the RNA helices, in contrast to the extensive use of the major groove in the recognition of B-form DNA helices. Beyond this generalization, the details of tRNA recognition differ in each specific case. Comparison of the protein-bound tRNA with the structure of free tRNA reveals that the proteins tend to distort the RNA conformation and partially unwind the helices near the anticodon loop. In one case, namely the structure of glutaminyl-tRNA synthetase (Rould et al., 1991), the terminal base pair of the acceptor stem of the tRNA is broken, and the CCA end makes a dramatic hairpin turn into the enzyme active site.
One fascinating observation from the structures of RNA-binding proteins, even in the absence of RNA, is that, aside from the tRNA-binding synthetases, they all appear to have evolved from or towards a very similar general fold (Burd & Dreyfuss, 1994). This fold, exemplified by the RNP domain found in numerous RNA-binding proteins, consists of a β-sheet packed against α-helices on one side and solvent-exposed on the opposite face. This general folding architecture is found in RNP domains, ribosomal proteins, K-homology (KH) domains, double-stranded RNA-binding domains and cold shock proteins. Although each of these subsets of RNA-binding domains has a different topology and most probably binds RNA with a different surface, they all appear to share this α–β–solvent architecture.
Two proteins with this architecture have been co-crystallized with their specific RNA stem-loop ligands (Nagai et al., 1995; van den Worm et al., 1998). In both cases, the loop of the RNA binds to the open face of the β-sheet, where solvent-exposed aromatic amino-acid side chains stack with the extrahelical bases of the RNA. Unpaired bases from the RNA also form numerous specific hydrogen bonds with protein side chains and polar backbone groups, imparting sequence specificity to the interaction. These structures suggest that the flat, open face of a β-sheet provides a good surface for RNA binding, where the extrahelical bases can make extensive and specific contacts with the protein.
There is a single example of a sequence-nonspecific single-stranded RNA–protein complex. The structure of the vaccinia RNA methyltransferase VP39 bound to a 5′m7G-capped RNA hexamer reveals a mechanism of nonspecific recognition reminiscent of the Klenow fragment–DNA tetranucleotide complex (Hodel et al., 1998). The RNA forms two short single-stranded helices of three bases each. The first of these helices binds in the active site of VP39 solely through hydrogen bonds between the protein and the ribose–phosphate backbone. The bases of the RNA strand stack together as trimers but do not form any interactions with the protein (Fig. 23.2.4.4). Like the Klenow–DNA complex, this observation suggests an intuitive mechanism for sequence-nonspecific nucleic acid binding, in which the single-stranded RNA forms short transient helices driven by intramolecular stacking interactions. The protein then recognizes and stabilizes the helical backbone conformation formed by this transient stacking without interacting with the bases themselves.
The complex of VP39 with capped RNA also illustrates a final example of the diversity of protein–ligand interactions: the specific recognition of the 7-methylguanosine (m7G) cap. Methylation of guanosine at the N7 position introduces a positive charge into the π-ring system of the base. Eukaryotic cells use an N7-methylated guanosine as a tag, or cap, at the 5′ end of messenger RNA. The m7G5′ppp mRNA cap is specifically recognized in the splicing of the first intron in nascent transcripts, in the transport of mRNA through the nuclear envelope and in the translation of the message by the ribosome (Varani, 1997). Two structures of specific m7G-binding proteins are now known: VP39 and the translation-initiation cap-binding protein IF-4E (Hodel et al., 1997; Marcotrigiano et al., 1997). Each structure offers clues as to how the proteins can discriminate between the charged, methylated m7G base and the unmodified guanosine base. The m7G base is stacked between aromatic protein side chains and hydrogen bonded to acidic protein residues (Fig. 23.2.4.5). One long-held hypothesis is that IF-4E, with its two tryptophan residues, binds specifically to the positively charged form of the base through a charge-transfer complex (Ueda, Iyo, Doi, Inoue & Ishida, 1991). The formation of a charge-transfer complex is evident in small-molecule studies and in spectroscopic studies with IF-4E (Ueda, Iyo, Doi, Inoue, Ishida et al., 1991). However, VP39 performs the same discrimination with the much less electronegative phenylalanine and tyrosine side chains (Hodel et al., 1997), and so far no charge-transfer complex has been observed in VP39.
The recognition of charged methylated bases is important not only in mRNA processing, but also in the repair and recognition of DNA damaged by alkylating carcinogens. The charged m7G base is probably recognized in much the same way as other positively charged bases, such as 3-methyladenosine, O2-methylcytosine and O2-methylthymidine. Indeed, the E. coli DNA repair enzyme AlkA, a DNA glycosylase, catalyses the excision of all of these bases by cleaving the glycosidic bond (Lindahl, 1982). The structure of AlkA is known, but only in the absence of substrate (Labahn et al., 1996). In this structure, a number of solvent-exposed tryptophan residues are found at the putative active site, suggesting that AlkA may recognize positively charged bases through an aromatic 'sandwich' much like those found in IF-4E and VP39.
Novel features of the molecular recognition and electrostatic interactions of the two tetrahedral oxyanions phosphate and sulfate have emerged from our crystallographic and functional studies of the phosphate-binding protein (PBP) and the sulfate-binding protein (SBP). These proteins serve as extremely specific initial receptors for ATP-binding cassette (ABC)-type active transport systems, or permeases, in bacterial cells, and their ligand complexes have Kd values in the low-µM range. Although phosphate and sulfate are structurally similar, PBP and SBP show no overlap in specificity at physiological pH (Medveczky & Rosenberg, 1971; Pardee, 1966; Jacobson & Quiocho, 1988). This stringent specificity prevents either tetrahedral oxyanion nutrient from becoming an inhibitor of transport of the other. The specificity of the PBP-dependent phosphate transport system is also shared by other phosphate transport systems, including those of eukaryotic cells, brush-border membranes and mitochondria.
As described below, discrimination between the two anions is based solely on the protonation state of the ligand. Sulfate, the conjugate base of a strong acid, is completely ionized at pH values above about 3, whereas phosphate, the conjugate base of a weak acid, remains at least partly protonated up to pH 13.
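The protonation argument can be checked with a standard acid–base speciation calculation. The sketch below is illustrative only: it assumes approximate textbook pKa values at 25 °C (2.15, 7.20 and 12.35 for phosphoric acid; 1.99 for the second dissociation of sulfuric acid, with the strong first step set arbitrarily to −3), ideal solution behaviour, and hypothetical helper names.

```python
# Illustrative acid-base speciation of phosphate and sulfate versus pH.
# ASSUMPTION: approximate textbook pKa values at 25 degrees C and ideal solution
# behaviour (activities = concentrations); these numbers are not from the chapter.
PHOSPHORIC_ACID_PKAS = [2.15, 7.20, 12.35]  # H3PO4 -> H2PO4- -> HPO4(2-) -> PO4(3-)
SULFURIC_ACID_PKAS = [-3.0, 1.99]           # H2SO4 -> HSO4- -> SO4(2-); first step strong


def speciation(ph, pkas):
    """Mole fractions of each protonation state, fully protonated first.

    For an acid H_nA with stepwise constants Ka1..Kan, the species that has
    lost i protons is proportional to (Ka1 * ... * Kai) * [H+] ** (n - i).
    """
    h = 10.0 ** (-ph)
    n = len(pkas)
    terms = []
    cumulative_ka = 1.0
    for i in range(n + 1):
        if i > 0:
            cumulative_ka *= 10.0 ** (-pkas[i - 1])
        terms.append(cumulative_ka * h ** (n - i))
    total = sum(terms)
    return [t / total for t in terms]


def fraction_with_proton(ph, pkas):
    """Fraction of the anion population that still carries at least one proton."""
    return 1.0 - speciation(ph, pkas)[-1]


if __name__ == "__main__":
    for ph in (3.0, 7.4, 13.0):
        phosphate = fraction_with_proton(ph, PHOSPHORIC_ACID_PKAS)
        sulfate = fraction_with_proton(ph, SULFURIC_ACID_PKAS)
        print(f"pH {ph:4.1f}: phosphate with >=1 H: {phosphate:.6f}; "
              f"sulfate with >=1 H: {sulfate:.6f}")
```

Under these assumptions, essentially all phosphate at pH 7.4 carries at least one proton (mostly as a mixture of dihydrogen and monohydrogen phosphate), whereas more than 99.99% of sulfate is the bare dianion; this is the difference that PBP and SBP exploit.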
The structure of the PBP–phosphate complex was initially determined at 1.7 Å resolution (Luecke & Quiocho, 1990). The resolution has since been extended to an ultra-high 0.98 Å, the first such resolution reported for a ligand-bound protein as large as 34 kDa (Wang et al., 1997). The bound phosphate is completely desolvated and sequestered in the cleft between the two domains of the protein. It makes 12 hydrogen bonds with the protein (11 with donor groups and one with an acceptor group), as well as one salt link to an Arg residue that is in turn salt-linked to an Asp residue (Fig. 23.2.5.1). In the ultra-high-resolution structure, the lengths of these 12 hydrogen bonds range from 2.432 to 2.906 Å (Wang et al., 1997). The Asp56 carboxylate, the lone acceptor group, plays two key roles in conferring the exquisite specificity of PBP: through its hydrogen bond it recognizes a proton on the phosphate, and by charge repulsion it presumably prevents the binding of a fully ionized sulfate dianion (Luecke & Quiocho, 1990).
The SBP binding-site cleft is likewise tailor-made for sulfate (Pflugrath & Quiocho, 1985). In keeping with the stringent specificity of SBP for fully ionized tetrahedral oxyanions (Pardee, 1966; Jacobson & Quiocho, 1988), the bound sulfate, which is also completely dehydrated and buried, is held in place by seven hydrogen bonds, all made with donor groups from uncharged polar residues of the protein (Fig. 23.2.5.2) (Pflugrath & Quiocho, 1985). The absence of a hydrogen-bond acceptor group accounts for the inability of SBP to bind phosphate. Interestingly, although SBP makes no salt link and five fewer hydrogen bonds with the bound sulfate (Fig. 23.2.5.2b) than PBP makes with the bound phosphate (Fig. 23.2.5.1b), the SBP–sulfate complex is no weaker than the PBP–phosphate complex. In fact, sulfate binds 10–20 times more tightly to SBP than phosphate does to PBP (Pardee, 1966; Jacobson & Quiocho, 1988). The hydration energies of the two anions are also likely to be similar, so the difference in affinity cannot simply be attributed to a difference in desolvation cost.
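The 10–20-fold difference in affinity can be put on a free-energy scale. The worked conversion below is a back-of-the-envelope sketch, not a calculation from the original studies: it assumes 25 °C (RT ≈ 0.593 kcal mol−1), treats the quoted fold difference as a ratio of dissociation constants, and uses ΔΔG to denote how much more favourable sulfate binding to SBP is than phosphate binding to PBP.

```latex
\begin{aligned}
\Delta\Delta G &= RT\,\ln\frac{K_d(\mathrm{PBP\cdot phosphate})}{K_d(\mathrm{SBP\cdot sulfate})}
  = RT\,\ln(10\ \text{to}\ 20)\\
RT\ln 10 &\approx 0.593 \times 2.303 \approx 1.37\ \mathrm{kcal\,mol^{-1}}\\
RT\ln 20 &\approx 0.593 \times 2.996 \approx 1.78\ \mathrm{kcal\,mol^{-1}}
\end{aligned}
```

On these assumptions, the SBP–sulfate complex is more stable by roughly 1.4–1.8 kcal mol−1, despite forming five fewer hydrogen bonds and no salt link.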
The ability of PBP and SBP to distinguish their oxyanion ligands by the presence or absence of protons represents an extremely high level of sophistication in molecular recognition. PBP and SBP also demonstrate powerfully the importance of complete hydrogen bonding in the recognition of buried ligands. Because sulfate is fully ionized (i.e. carries no proton at physiological pH), it is repelled specifically by Asp56 of PBP. Conversely, SBP is unable to bind phosphate because its binding site contains no hydrogen-bond acceptor. Significantly, despite the potential for a large number of matched hydrogen-bonding pairs, a single mismatched hydrogen bond (e.g. a fully ionized sulfate providing no proton for interaction with Asp56 of PBP, or the lack of an acceptor group in SBP for a phosphate proton) represents a binding energy barrier of 6–7 kcal mol−1 (1 kcal mol−1 = 4.184 kJ mol−1).
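One way to gauge the size of this 6–7 kcal mol−1 barrier is to convert it into a fold change in Kd, using ΔΔG = RT ln(fold), and to compare it with the total binding free energy of a low-micromolar complex. The sketch below assumes 25 °C, a 1 M standard state and an illustrative Kd of 1 µM (a hypothetical round number, not a measured value); the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant, kcal mol^-1 K^-1
KJ_PER_KCAL = 4.184   # conversion factor quoted in the text


def affinity_fold_loss(penalty_kcal, temperature_k=298.15):
    """Factor by which Kd increases for a free-energy penalty given in kcal/mol."""
    return math.exp(penalty_kcal / (R_KCAL * temperature_k))


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from Kd, assuming a 1 M standard state."""
    return R_KCAL * temperature_k * math.log(kd_molar)


if __name__ == "__main__":
    kd_assumed = 1e-6  # hypothetical 'low micromolar' Kd, not a measured value
    print(f"dG_bind at Kd = 1 uM: {binding_free_energy(kd_assumed):.2f} kcal/mol")
    for penalty in (6.0, 7.0):
        print(f"{penalty:.0f} kcal/mol = {penalty * KJ_PER_KCAL:.1f} kJ/mol -> "
              f"Kd weaker by ~{affinity_fold_loss(penalty):.1e}-fold")
```

Under these assumptions, a 6–7 kcal mol−1 penalty corresponds to roughly a 2.5 × 10^4- to 1.4 × 10^5-fold loss of affinity, a large fraction of the roughly −8.2 kcal mol−1 binding free energy of a 1 µM complex. This helps explain why a single mismatch is enough to prevent cross-binding.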
A further finding with wide implications concerns how the isolated charges of the protein-bound phosphate and sulfate are stabilized. No counter-charged residues or cations are associated with the sulfate completely buried in SBP. Although a salt link involving Arg135 is formed with the phosphate bound to PBP, this arginine is also salt-linked to an Asp residue (Fig. 23.2.5.1b). Moreover, site-directed mutagenesis studies indicate that phosphate binding is quite insensitive to modulation of the salt link (Yao et al., 1996). These findings demonstrate powerfully that a protein can stabilize buried charges by means other than salt links. Experimental and computational studies indicate that the local dipoles immediately surrounding the sulfate and phosphate, including the hydrogen-bonding groups and the backbone NH groups from the first turn of helices, are responsible for charge stabilization (Pflugrath & Quiocho, 1985; Quiocho et al., 1987; Åqvist et al., 1991; He & Quiocho, 1993; Yao et al., 1996; Ledvina et al., 1996). Helix macrodipoles play little or no role in charge stabilization of the anions. The same principle of charge stabilization by local dipoles also applies to the following buried, uncompensated ionic groups: Arg151 of the arabinose-binding protein (Quiocho et al., 1987), the zwitterionic leucine ligand bound to the leucine/isoleucine/valine-binding protein (Quiocho et al., 1987), the potassium ions in the pore of the potassium channel (Doyle et al., 1998) and Arg56 of synaptobrevin-II in a SNARE complex (Sutton et al., 1998).
The refined ultra-high-resolution structure of the PBP–phosphate complex was the first to reveal an extremely short hydrogen bond (2.432 Å), formed between the Asp56 carboxylate of PBP and the phosphate. Although this bond length falls within the proposed range for low-barrier hydrogen bonds, whose energies have been estimated at 12–24 kcal mol−1 (Hibbert & Emsley, 1990), its contribution to phosphate-binding affinity has been assessed to be no greater than that of a normal hydrogen bond (Wang et al., 1997). Thus, whether short hydrogen bonds play a unique role in biological systems, for example in enzyme catalysis (Gerlt & Gassman, 1993; Cleland & Kreevoy, 1994), remains controversial.
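Flagging unusually short donor–acceptor contacts is a routine first step when inspecting a refined model, although, as the PBP case shows, a short distance by itself says nothing about a bond's energetic contribution. The sketch below assumes atom coordinates have already been extracted from a model, uses a 2.5 Å O···O cutoff as an assumed rule of thumb rather than a value from this chapter, and uses hypothetical function names.

```python
import math

# ASSUMPTION: donor-acceptor (O...O) distances below about 2.5 A are labelled
# "short". This is a common rule of thumb, not a value from the chapter, and
# geometry alone does not establish that a hydrogen bond is low-barrier.
SHORT_HBOND_CUTOFF_A = 2.5


def classify_distance(d_angstrom, cutoff=SHORT_HBOND_CUTOFF_A):
    """Label a donor-acceptor distance as 'short' or 'normal'."""
    return "short" if d_angstrom < cutoff else "normal"


def classify_hbond(donor_xyz, acceptor_xyz, cutoff=SHORT_HBOND_CUTOFF_A):
    """Return the distance (A) between two atoms given as (x, y, z) tuples, and its label."""
    d = math.dist(donor_xyz, acceptor_xyz)  # Euclidean distance (Python 3.8+)
    return d, classify_distance(d, cutoff)


if __name__ == "__main__":
    # End points of the range reported for the 12 PBP-phosphate hydrogen bonds.
    for d in (2.432, 2.906):
        print(f"{d:.3f} A -> {classify_distance(d)}")
    # Hypothetical toy coordinates placing two oxygens 2.432 A apart along x.
    print(classify_hbond((0.0, 0.0, 0.0), (2.432, 0.0, 0.0)))
```

With this assumed cutoff, the 2.432 Å Asp56–phosphate contact is flagged as short, whereas 2.906 Å, the upper end of the reported range, is a conventional hydrogen-bond length.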
23.2.5.3. Non-complementary negative electrostatic surface potential of protein sites specific for anions
The uncompensated, negatively charged Asp56 observed in PBP is unusual for an anion-binding site. A related finding with far-reaching implications is that the binding-cleft region of PBP has an intense negative electrostatic surface potential (Fig. 23.2.5.3a) (Ledvina et al., 1996). Such non-complementarity between the surface potential of a binding region and an anionic ligand is not unique to PBP: we have reported similar findings for SBP, for a DNA-binding protein and, even more dramatically, for the redox protein flavodoxin (Fig. 23.2.5.3b) (Ledvina et al., 1996). Evidently, for proteins such as these, which rely on hydrogen-bonding interactions with only uncharged polar residues for anion binding and electrostatic balance, a non-complementary surface potential is not a barrier to binding. This conclusion is supported by recent fast-kinetic studies of phosphate binding to PBP and of the effect of ionic strength on binding (Ledvina et al., 1998).
Acknowledgements
FAQ is an HHMI Investigator. The work carried out in his laboratory is supported in part by grants from NIH and the Welch Foundation.
References