Computational Methods for Studying Transmembrane alpha-Helices
Membrane proteins play key roles in biological systems as pores, ion channels and receptors. Because they are so important in cellular communication and coordination, they may serve as good drug targets: altering the function of signaling proteins may help correct the signaling defects that underlie many diseases. Understanding their structure is critical to our understanding of signal transduction and may aid structure-based drug design.
It is estimated that 20-30% of the genes in most genomes encode membrane proteins [Engelman, personal communication]; however, of the thousands of solved protein structures, only a handful are membrane proteins. This discrepancy is attributed to the relative difficulty of studying membrane proteins. Because they are hydrophobic, they do not dissolve easily in water and are difficult to crystallize. Thus, the conventional methods of structure determination, solution NMR and X-ray crystallography, are not easily applied to membrane proteins. For this reason, theoretical and computational tools may be especially useful for studying membrane proteins. In this paper, I will briefly review the various ways in which computational efforts may aid in membrane protein structure prediction. Methods used today include predicting secondary structure from molecular sequence data and predicting tertiary structure from secondary structural information. Computational methods are also used to study the correlation between structure and function and to model protein dynamics.
Many believe that the protein folding problem may be easier to solve for membrane proteins [Adams and Brunger, 1997] because of the restraints imposed by the lipid environment. In the hydrophobic interior of the bilayer, we would expect the polar groups of the peptide backbone to have their hydrogen bonds satisfied: inserting a peptide into the membrane would be highly unfavorable if the hydrogen bonds it gave up to water were not re-established after bilayer entry. Thus, well-defined secondary structure, in which backbone groups hydrogen-bond to one another, is expected in membrane proteins [Engelman et al, 1986]. Many groups of membrane proteins, including ion channels, toxins, antibiotics, and receptors, have alpha-helical secondary structure [Dieckmann and DeGrado, 1997]. Helices are believed to be the most common motif in membrane proteins [Cohen and Parry, 1990].
This motif appears to be relatively easy to spot in a sequence; transmembrane alpha-helices have been successfully predicted from molecular sequence data alone [Engelman et al, 1986]. The Chou-Fasman rules usually perform poorly at predicting helices in the bilayer [Wallace et al, 1986] because they are derived from soluble protein structures, but transmembrane sequences can be recognized by analyzing the hydropathy of the amino acids in the sequence. Some believe that the most important factor determining helicity in membranes is hydrophobicity, and that packing considerations are less important in the bilayer than in globular proteins [Shun-Cheng and Deber, 1994]. Transmembrane alpha-helical sequences are characterized by a largely, if not completely, hydrophobic stretch of around 20 amino acids; however, predictions of which sequences fold into transmembrane helices may vary slightly depending on which polarity scale is chosen. Engelman et al stress that many scales are based on side chains partitioning between an aqueous environment and a protein interior, which is a very different case from partitioning between aqueous and lipid environments. The dielectric constant in the lipid environment is low and relatively uniform, whereas that in a protein interior is difficult to predict and highly variable [Engelman et al, 1986]. Engelman et al developed the Goldman, Engelman, Steitz (GES) hydrophobicity scale, based on experimental and theoretical considerations of how readily each amino acid would enter the lipid bilayer from an aqueous environment [Engelman et al, 1982]. In their treatment, they note that serine and threonine are polar, but that both can satisfy their hydrogen-bond donors and acceptors by hydrogen bonding to the backbone carbonyls of the peptide, making partitioning into the membrane more favorable than otherwise predicted [Engelman et al, 1986]. In addition, they recognize that, although the lysine side chain is largely hydrocarbon, its amino group has a high pKa and is therefore charged. They conclude that lysine may not move into the bilayer well, and their polarity scale reflects this. The GES scale seems to predict known membrane helices better than other scales do, especially helices known to contain some polar residues [Engelman et al, 1986]. Their method also accounts for the possibility of polar groups interacting with one another in the bilayer, which would increase the chance of recognizing a transmembrane sequence containing a few polar groups.
Once a scale is chosen, the total hydrophobicity of a sequence segment can be calculated, and from this total, the segment's propensity to form a transmembrane helix can be predicted. This method has been used extensively to identify transmembrane helices. Besides the choice of hydrophobicity scale, another factor to consider when analyzing molecular sequence data for transmembrane helices is how long a segment to analyze. This length is termed the window length, and an optimal window length would match the number of amino acids in a transmembrane alpha-helix. The length of such a helix is determined by the thickness of the lipid bilayer the helix spans and the angle of the helix with respect to the membrane normal [Engelman et al, 1986]. A strongly tilted helix must be longer (and contain more amino acids) than a helix parallel to the membrane normal to span a bilayer of the same thickness. Engelman et al estimate that 21 residues are required for a helix to span an average bilayer, while Cohen and Parry estimate 23 [Cohen and Parry, 1990]. Analysis of solved membrane proteins reveals that this length ranges from 14 to 36 amino acids [Bowie, 1997a]. Because of these discrepancies, and because lipids are dynamic and bilayer thickness can vary greatly, it is recommended that many window lengths be tried [Engelman et al, 1986] to predict transmembrane helices successfully.
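A minimal sketch of the two steps above, in Python. residues_to_span estimates a window length from bilayer thickness and tilt, assuming an ideal alpha-helix that rises 1.5 Å per residue; hydropathy_profile and candidate_segments then run a sliding-window scan. The scan uses the Kyte-Doolittle scale rather than the GES scale discussed in the text, simply because those values are reproduced here; GES values could be substituted in the dictionary. The 19-residue window with a 1.6 cutoff is the pairing commonly used with the Kyte-Doolittle scale and should be treated as an assumption. All function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (positive = hydrophobic). Swapped in for
# illustration only; GES values could replace these entries.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

RISE_PER_RESIDUE = 1.5  # Angstrom per residue along an ideal alpha-helix (assumed)


def residues_to_span(thickness, tilt_deg=0.0):
    """Residues an ideal helix needs to cross a slab `thickness` Angstrom thick
    when tilted `tilt_deg` degrees away from the membrane normal."""
    path = thickness / math.cos(math.radians(tilt_deg))  # a tilted path is longer
    return math.ceil(path / RISE_PER_RESIDUE)


def hydropathy_profile(seq, window):
    """Mean hydropathy of each window, indexed by the window's first residue (0-based)."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_segments(seq, window=19, threshold=1.6):
    """Window starts whose mean hydropathy exceeds `threshold` (cutoff is an assumption)."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]


# Usage: compare window lengths, then scan a toy sequence.
print(residues_to_span(30), residues_to_span(30, 30))
test_seq = "D" * 10 + "L" * 20 + "D" * 10
print(candidate_segments(test_seq, window=19))
```

Here residues_to_span(30) gives 20 and residues_to_span(30, 30) gives 24, showing how tilt lengthens the required window. For the toy sequence of 20 leucines flanked by aspartates, only windows lying mostly within the leucine stretch are flagged; rerunning the scan with several window lengths is how the recommendation above would be put into practice.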
As discussed above, secondary structure can often be determined from a sequence. Cohen and Parry believe that some tertiary folds, in particular coiled coils, can also be predicted from the sequence alone [Cohen and Parry, 1990]. Coiled coils are alpha-helices wound around one another into a supercoil; Crick first described the coiled-coil motif in 1953 [Crick, 1953]. They are characterized by a repeating pattern of seven amino acids (a heptad), whose positions are labeled a through g, with apolar residues occupying positions a and d. Residues at positions a and d are expected to lie at the interface between two helices. At this interface, the side chains from different helices are expected to pack so that the ‘knobs’ of one helix fit into the ‘holes’ of another [Crick, 1953]. Because transmembrane sequences are almost exclusively hydrophobic, they necessarily satisfy the heptad requirement of apolar residues at a and d [Cohen and Parry, 1990], so hydrophobicity alone cannot reveal which positions form the interface. Cohen and Parry postulate that the a and d positions would be more conserved than the remaining positions, which would face the bilayer, and analysis of the bacterial photosynthetic reaction center, a solved membrane protein structure, supports this hypothesis [Cohen and Parry, 1990]. The greater conservation of residues at the helix-helix interface supports the commonly held view that helices pack more closely with each other than with the lipids [Dieckmann and DeGrado, 1997].
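The heptad bookkeeping can be sketched as follows. The code assigns positions a through g to each residue for a chosen starting register and scores that register by the fraction of a and d positions occupied by apolar residues. The apolar residue set and all function names are illustrative assumptions. As the paragraph above notes, an almost entirely hydrophobic transmembrane segment scores well in every register, which is why Cohen and Parry turned to conservation at a and d instead; replacing the membership test with a per-position conservation score would be one way to reflect that.

```python
HEPTAD = "abcdefg"
APOLAR = set("AVLIMFWC")  # illustrative choice of apolar residues (assumption)


def heptad_positions(length, register):
    """Heptad letter for each residue when residue 0 sits at position `register`."""
    start = HEPTAD.index(register)
    return [HEPTAD[(start + i) % 7] for i in range(length)]


def register_score(seq, register):
    """Fraction of a and d positions occupied by apolar residues for this register."""
    positions = heptad_positions(len(seq), register)
    core = [aa for aa, pos in zip(seq.upper(), positions) if pos in "ad"]
    if not core:
        return 0.0
    return sum(aa in APOLAR for aa in core) / len(core)


def best_register(seq):
    """Register (a-g) whose a and d positions are most apolar; ties go to the earliest letter."""
    return max(HEPTAD, key=lambda reg: register_score(seq, reg))


soluble_like = "LEELKKE" * 4   # apolar residues only at a and d
membrane_like = "LLALLVL" * 4  # apolar at every position
print(best_register(soluble_like), register_score(soluble_like, "a"))
print([register_score(membrane_like, reg) for reg in HEPTAD])
```

The soluble-like sequence picks out a single register, while the membrane-like sequence scores 1.0 in all seven, illustrating why hydrophobicity alone cannot locate the interface of a transmembrane helix.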
The advantage of analyzing sequences for heptad repeats is that amphipathic helices, as well as entirely hydrophobic sequences, can be identified [Cohen and Parry, 1990]; hydropathy profiles are usually poor at finding amphipathic transmembrane helices. To date, neither approach gives very detailed information about the tertiary structure of membrane proteins, although algorithms have been written to identify different oligomeric states of coiled coils [reviewed in Lupas, 1997]. While heptad repeat analysis may identify a bundle of helices, the specific packing interactions of more than two helices would be difficult to predict from molecular sequence data alone.
Another method for predicting tertiary structure involves determining local secondary structure and then finding how the various secondary structural elements may pack. This method assumes that proteins first fold into local, independent secondary structures that then interact with each other to form tertiary structure. Popot and Engelman have proposed exactly this model for membrane protein folding; they suggest that many membrane proteins first form helices across the lipid bilayer, which then pack with each other to form bundles, or oligomers, of helices [Popot and Engelman, 1990]. Given this assumption, once we know the secondary structure, we can begin to estimate the tertiary structure. We can make educated guesses about local secondary structure by studying molecular sequence data, as described above. In addition, low-resolution spectroscopies, such as circular dichroism (CD) and infrared spectroscopy (IR), can be used to determine secondary structure experimentally [Adams and Brunger, 1997]. CD and IR estimate the fraction of the protein in extended beta structure or in alpha-helices. Compared with obtaining high-resolution structural information, identifying the predominant secondary structure of a membrane protein is relatively easy.
Many theoretical attempts have been made to predict how secondary structural elements, especially alpha-helices, may pack with each other [Crick, 1953; Richmond and Richards, 1978; Chothia et al, 1981]. Chothia et al describe how the side chains of a helix form ridges and grooves on its surface [Chothia et al, 1981]. Side chains of residues spaced i,i+1, i,i+3, and i,i+4 apart in sequence each line up into ridges on the helix surface, and the spaces between ridges form grooves. Chothia et al propose that helices maximize their interactions with each other when the ridges of one helix pack into the grooves of the other at the interface. They expect the most common helix-helix interactions to involve the ridges formed by i,i+4 residues and the least common to involve the ridges formed by i,i+1 side chains.
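To see why residues spaced 1, 3 and 4 apart line up into ridges, it helps to compute where residue i+k sits around the helix relative to residue i. The sketch below assumes an ideal alpha-helix with 3.6 residues per turn (100° per residue) and a rise of 1.5 Å per residue; the function name is hypothetical. Residues i+3 and i+4 come out within 60° of residue i, so their side chains fall on roughly the same face, whereas i+2 sits 160° away on the far side of the helix.

```python
DEG_PER_RESIDUE = 100.0  # 360 degrees / 3.6 residues per turn (ideal helix, assumed)
RISE_PER_RESIDUE = 1.5   # Angstrom per residue along the helix axis (assumed)


def relative_position(k):
    """Azimuth (degrees, folded into (-180, 180]) and axial rise (Angstrom)
    of residue i+k relative to residue i on an ideal alpha-helix."""
    azimuth = (k * DEG_PER_RESIDUE) % 360.0
    if azimuth > 180.0:
        azimuth -= 360.0
    return azimuth, k * RISE_PER_RESIDUE


for k in (1, 2, 3, 4):
    azimuth, rise = relative_position(k)
    print(f"i+{k}: {azimuth:+.0f} deg around the axis, {rise:.1f} A along it")
```

The i+1 neighbor is 100° around but only 1.5 Å up, which gives the steep i,i+1 ridge, while i+4 is 40° around and 6 Å up, giving the shallower i,i+4 ridge.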
Attempts have been made to use computational methods to find optimal or probable ways to pack secondary structural elements together, both in membrane proteins and in soluble ones, and many of these efforts have been directed at helix bundles. Brunger et al have developed a method that finds the most probable conformations of helical bundles [Adams et al, 1995; Adams et al, 1996]. Their search algorithm places two alpha-helices a user-specified distance apart at a specified tilt angle and incrementally rotates the two helices relative to each other about their helical axes. At each increment, the dimer is relaxed. This relaxation involves molecular dynamics, simulated annealing, and energy minimization, and its purpose is to find an energetically favorable structure of the dimer at or near that rotation. Helical constraints and interhelical constraints are applied during the relaxation to maintain helical structure and to keep the two helices together. The rotation angles about the helix axes are not fixed, so during the relaxation the dimer may migrate towards a more favorable conformation. The initial rotation angles of the helices therefore often differ from their final rotation angles, because certain dimeric conformations are favored over others. The most favored conformations will be the most populated by relaxed structures, and the most populated conformations are taken to be the most probable structures for the dimer. In this manner, the search algorithm identifies the most likely helix-helix interfaces. The most likely structures are not always those of lowest energy [Adams et al, 1995]. It appears more sound to look for convergence of structures in space than to compare the energies of the relaxed structures [Engelman, personal communication]: the relaxed structures are too close in energy to rank reliably, and such small energy differences can arise from minor structural differences between them. Population differences, on which the search algorithm relies, are more significant.
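The population idea in the paragraph above can be illustrated with a deliberately small toy. In the actual search, each starting rotation is relaxed by molecular dynamics, simulated annealing and minimization of an atomic model; here, as an assumption for illustration only, the dimer is reduced to a single rotation angle phi, the energy is an invented function E(phi) = -cos(2 phi) - 0.3 cos(phi), and relax() is plain gradient descent. What carries over is the bookkeeping: start from evenly spaced rotations, let each run drift freely, then count how many runs end up in each region of angle space instead of ranking individual runs by energy. All function names are hypothetical.

```python
import math
from collections import Counter


def toy_energy_gradient(phi):
    """dE/dphi for the invented energy E(phi) = -cos(2*phi) - 0.3*cos(phi) (hypothetical)."""
    return 2.0 * math.sin(2.0 * phi) + 0.3 * math.sin(phi)


def relax(phi, step=0.05, n_steps=500):
    """Stand-in for MD / annealing / minimization: gradient descent with phi free to move."""
    for _ in range(n_steps):
        phi -= step * toy_energy_gradient(phi)
    return phi % (2.0 * math.pi)


def population_search(increment_deg=10, bin_deg=30):
    """Relax from evenly spaced starting rotations and count how many runs end in each bin."""
    counts = Counter()
    for start_deg in range(0, 360, increment_deg):
        final_deg = math.degrees(relax(math.radians(start_deg)))
        counts[round(final_deg / bin_deg) * bin_deg % 360] += 1
    return counts.most_common()  # most populated rotation first


print(population_search())
```

With the defaults, 19 of the 36 starting rotations end near 0° and 17 near 180°, so population_search() returns [(0, 19), (180, 17)]. In this toy the most populated basin also happens to be the deepest one, which, as noted above, need not hold for real dimers.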
Using this method, Brunger et al successfully predicted the helix-helix interface of glycophorin A, a transmembrane homodimer from human erythrocytes [Adams et al, 1996]. The likely structures identified by the search algorithm were evaluated against mutagenesis data, and the prediction was later confirmed by NMR [MacKenzie et al, 1997]. The success with glycophorin A suggests that the combination of low-resolution spectroscopy, computational searches, and mutagenesis experiments may be powerful in studying membrane protein structure [Adams and Brunger, 1997].
Higher-order oligomers can also be predicted using this method, provided that the protein is a homo-oligomer. Because the helices have identical amino acid sequences, the structure, and thus the helix-helix interactions, are assumed to be symmetric. The symmetry reduces the computational effort required; searching a helical dimer alone takes a good deal of time. For a bundle of more than two helices, the search is performed only on a dimer, likely interfaces are identified, and then, assuming these interfaces are the ones in the higher-order oligomer, the program predicts probable structures for the entire protein. This method has been used to propose a model for the five-helix transmembrane bundle of phospholamban, a pentameric regulator of calcium transport in the cardiac sarcoplasmic reticulum that has been proposed to act as a calcium channel [Adams et al, 1995].
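A rough sketch of how symmetry shrinks the problem, assuming a C_n-symmetric bundle with parallel helix axes in which every helix takes the same rotation about its own axis. Under that assumption the number of rotation combinations to sample drops from steps**n to steps, and the interhelical distance found in the dimer search fixes the bundle radius through r = d / (2 sin(pi/n)). Real bundles are tilted, and the 10° increment, the 10 Å spacing and the function names are illustrative assumptions, not parameters taken from Adams et al.

```python
import math


def bundle_radius(n_helices, interhelix_distance):
    """Radius of a regular C_n bundle whose neighbouring axes are `interhelix_distance` apart."""
    return interhelix_distance / (2.0 * math.sin(math.pi / n_helices))


def axis_positions(n_helices, radius):
    """(x, y) of each helix axis in a C_n-symmetric bundle, all axes parallel to z (assumed)."""
    return [(radius * math.cos(2.0 * math.pi * k / n_helices),
             radius * math.sin(2.0 * math.pi * k / n_helices))
            for k in range(n_helices)]


def search_sizes(n_helices, increment_deg=10):
    """Rotation combinations to sample: every helix independent vs one shared symmetric angle."""
    steps = 360 // increment_deg
    return steps ** n_helices, steps


print(search_sizes(5))       # pentamer sampled every 10 degrees
r = bundle_radius(5, 10.0)   # 10 Angstrom neighbour spacing is an illustrative value
print(round(r, 2), axis_positions(5, r)[:2])
```

For a pentamer sampled every 10°, search_sizes(5) returns (60466176, 36): about sixty million independent combinations versus 36 symmetric ones, which is the saving the paragraph above describes.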
One disadvantage of the helical search is that lipids are not included during the relaxation; the hydrophobic environment of the bilayer is modeled only by a low dielectric constant. Although the helices are predicted to pack more closely with each other than with the lipids, details of the environment may affect the tertiary structure of the protein. Adding the lipids to the calculations would be difficult and computationally very expensive. However, omitting the lipids when modeling a membrane protein seems to be less of a problem than omitting water molecules when modeling a soluble protein, and including explicit water is also computationally very costly. Thus, computational methods are, at present, in some ways better suited to predicting the structures of membrane proteins than those of soluble ones.
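The effect of a low dielectric constant can be seen from Coulomb's law, E = 332 q1 q2 / (ε r) kcal/mol, with charges in units of e and distance in Å. The sketch below compares an ion pair 4 Å apart at ε = 2, a value often used for a hydrocarbon interior, and at ε = 80, roughly that of water. These ε values are illustrative assumptions, not the settings used in the helical search, and a uniform dielectric is itself the simplification the paragraph above describes.

```python
COULOMB_KCAL = 332.06  # Coulomb's constant in kcal*Angstrom/(mol*e^2)


def coulomb_energy(q1, q2, r, dielectric):
    """Energy (kcal/mol) of charges q1, q2 (in e) at r Angstrom in a uniform dielectric."""
    return COULOMB_KCAL * q1 * q2 / (dielectric * r)


# Illustrative dielectric constants (assumed): ~2 for a hydrocarbon core, ~80 for water.
for label, eps in (("bilayer core", 2.0), ("water", 80.0)):
    print(label, coulomb_energy(+1, -1, 4.0, eps))
```

The ion pair is stabilized by roughly -41.5 kcal/mol at ε = 2 but only about -1.0 kcal/mol at ε = 80, a 40-fold difference, which is why the choice of dielectric model matters for polar groups buried in the bilayer.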
In order to make educated predictions, however, one needs information with which to guide them. There is much more information available about soluble protein structure than about membrane protein structure because very few membrane protein structures have been solved. From the information available on membrane proteins, it is clear that transmembrane helices may differ significantly from helices in soluble proteins. Transmembrane helices may display a unique curvature [Cohen and Parry, 1990]. Theoretical considerations suggest that some amino acids have different helical propensities in the bilayer and in solution. For example, isoleucine has a higher helical propensity in the bilayer than outside the bilayer [Shun-Cheng and Deber, 1994].
As the previous examples indicate, there have been many efforts to find structural motifs in membrane proteins. These motifs serve as guidelines for future predictions. Extracting information from known protein structures is yet another way in which computational efforts aid our understanding of membrane proteins. Studying known structures has, for example, revealed that aromatic residues are often found at the lipid-water interface, possibly anchoring the transmembrane helix in the bilayer [Pawagi et al, 1994]. This information is useful for determining where transmembrane helices start and stop in a given amino acid sequence, as is the average length of a transmembrane helix, which is around 26 amino acids [Bowie, 1997a]. The structure of cytochrome c oxidase from bovine mitochondria, a proton pump, has recently been solved [Tsukihara et al, 1995], and because the protein is so large, much information about membrane proteins has recently come from studying its 28 transmembrane helices. This information has refined our understanding of membrane proteins. For instance, the interior of cytochrome c oxidase is more hydrophobic than the surface exposed to lipid [Wallin et al, 1997]. This finding suggests that the inside-out model for membrane proteins, which states that membrane proteins are inside-out soluble proteins with hydrophilic residues on the inside and hydrophobic residues on the outside [Engelman and Zaccai, 1980], is not entirely correct. Studying membrane proteins has also given a better understanding of their geometries. The most favored helix packing angles in soluble proteins are about -35°, but analysis of known structures shows that the most frequent packing angle in membrane proteins is +20° [Bowie, 1997a]. Solved membrane protein structures have also shown that helical propensities are different in the membrane. Glycine and proline, which are thought to be helix breakers in soluble proteins, occur in the transmembrane helices of cytochrome c oxidase [Tsukihara et al, 1995]. Moreover, glycine may play an important role in helix-helix interactions in the bilayer: in cytochrome c oxidase, glycine is often found at the helix-helix interface and at the point of closest approach between two helices [Javadapour and Smith, unpublished data]. More research in this area will further increase our knowledge of membrane protein folding, so that structures can be predicted with greater success.
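The packing angle between two helices can be computed from their axes as the dihedral angle about their line of closest approach, in the spirit of the analysis by Chothia et al. The sketch below takes each axis as a point plus a direction assumed to run from N- to C-terminus. The sign convention chosen here is an assumption and may not match the one used by Bowie, so signed values should be checked against a known case before comparing them with the -35° and +20° figures above. All helper names are hypothetical.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return [x / norm for x in a]


def crossing_angle(p1, u1, p2, u2):
    """Signed angle (degrees) between helix axis 1 (point p1, direction u1) and
    axis 2 (p2, u2), measured about their common perpendicular (sign convention assumed)."""
    u1, u2 = _unit(u1), _unit(u2)
    normal = _cross(u1, u2)               # direction of the common perpendicular
    if _dot(normal, normal) < 1e-12:      # parallel or antiparallel axes
        return 0.0 if _dot(u1, u2) > 0 else 180.0
    if _dot(normal, _sub(p2, p1)) < 0:    # orient it from axis 1 toward axis 2
        normal = [-x for x in normal]
    sin_part = _dot(_cross(u1, u2), _unit(normal))
    return math.degrees(math.atan2(sin_part, _dot(u1, u2)))


# Usage: axis 2 is 10 Angstrom away along x and tilted 20 degrees about that line.
tilt = math.radians(20)
print(crossing_angle([0, 0, 0], [0, 0, 1], [10, 0, 0], [0, -math.sin(tilt), math.cos(tilt)]))
```

The usage line prints a value of about 20. Reversing the direction of either axis changes the result substantially, so consistent N-to-C orientation of the axes matters when tabulating packing angles from solved structures.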
Solving structures is not the final destination, however. Once a good structure is obtained, the next step is to find out how the protein performs its task, and computational tools often make this possible. We can try to locate ligand and drug binding sites using a known structure and docking programs that search for complementary surfaces between a ligand and the protein [Kuntz, 1992]. We can also use molecular dynamics simulations to explore protein motions. This approach was used to evaluate different oxygen and proton pathways in cytochrome c oxidase [Hofacker and Schulten, 1998] and is a common method for studying ion channel conductance.
Computational methods have been, and will continue to be, among the most useful tools for studying membrane protein structure and function. Scanning algorithms can predict membrane protein secondary structure and aspects of tertiary structure. Molecular modeling, molecular dynamics, simulated annealing, and energy minimization, while often used to refine structures, may also help us predict them. The same methods can help us study a protein’s function, given its structure. While theoretical calculations provide many hypotheses, it is important to remember that they do not provide definitive answers about biological systems. Computational tools will be most powerful when they are built on hard biological data and used to guide experimentation, not in its absence.
References
Adams, P.D., Arkin, I.T., Engelman, D.M., and Brunger, A.T. Computational searching and mutagenesis suggest a structure for the pentameric transmembrane domain of phospholamban. Nature Structural Biology. 1995. 2:154-162.
Adams, P.D. and Brunger, A.T. Towards prediction of membrane protein structure. In Membrane Protein Assembly. Edited by von Heijne G. Austin, TX: RG Landers Co; 1997: 251-265.
Adams, P.D., Engelman, D.M. and Brunger, A.T. Improved prediction for the structure of the dimeric transmembrane domain of glycophorin obtained through global searching. Proteins: Structure, Function, and Genetics. 1996. 26:257-261.
Bowie, J.U. Helix packing in membrane proteins. J. Mol. Biol. 1997. 272:780-789.
Bowie, J.U. Helix packing angle preferences. Nature Structural Biology. 1997. 4:915-917.
Chothia, C., Levitt, M., and Richardson, D. Helix to helix packing in proteins. J. Mol. Biol. 1981. 145:215-250.
Cohen, C. and Parry, D.A.D. alpha-Helical coiled coils - a widespread motif in proteins. TIBS. 1986. 11:245-248.
Cohen, C. and Parry, D.A.D. alpha-Helical coiled coils and bundles: how to design an alpha-helical protein. Proteins: Structure, Function, and Genetics. 1990. 7:1-15.
Cohen, C. and Parry, D.A.D. alpha-helical coiled coils: more facts and better predictions. Science. 1994. 263:488-489.
Crick, F.H.C. The packing of alpha-helices: simple coiled-coils. Acta Crystallogr. 1953. 6:689-697.
Dieckmann, G.R. and DeGrado, W.F. Modeling transmembrane helical oligomers. Current Opinion in Structural Biology. 1997. 7:486-494.
Engelman, D.M. Helix-helix interactions in membranes: a new target for drugs? Structure and Function of 7TM Receptors, Alfred Benzon Symposium 39:122-137. 1996.
Engelman, D.M. Crossing the hydrophobic barrier: insertion of membrane proteins. Science. 1996. 274:1850-1851.
Engelman, D.M., Goldman, A., and Steitz, T.A. Methods Enzymol. 1982. 88:81.
Engelman, D.M., Steitz, T.A., and Goldman, A. Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Ann. Rev. Biophys. Biophys. Chem. 1986. 15:321-53.
Engelman, D.M. and Zaccai, G. Bacteriorhodopsin is an inside-out protein. Proc. Natl. Acad. Sci. USA. 1980. 77:5894-5898.
Fleming, K.G., Ackerman, A.L., and Engelman, D.M. The effect of point mutation on the free energy of transmembrane alpha-helix dimerization. J. Mol. Biol. 1997. 272:266-275.
Hofacker, I. and Schulten, K. Oxygen and proton pathways in cytochrome c oxidase. Proteins: Structure, Function, and Genetics. 1998. 30:100-107.
Kuntz, I.D. Structure-based strategies for drug design and discovery. Science. 1992. 257:1078-1082.
Lemmon, M.A. and Engelman, D.M. Specificity and promiscuity in membrane helix interactions. FEBS Letters. 1994. 346:17-20.
Lemmon, M.A., Treutlein, H.R., Adams, P.D., Brunger, A.T., and Engelman, D.M. A dimerization motif for transmembrane alpha-helices. Nature Structural Biology. 1994. 1:157-163.
Lupas, A. Predicting coiled-coil regions in proteins. Current Opinion in Structural Biology. 1997. 7:388-393.
MacKenzie, K.R., Prestegard, J.H., and Engelman, D.M. A transmembrane helix dimer: structure and implications. Science. 1997. 276:131-133.
Pawagi, A.B., Wang, J., Silverman, M., Reithmeier, R.A.F., and Deber, C.M. Transmembrane aromatic amino acid distribution in P-glycoprotein. A functional role in broad substrate specificity. J. Mol. Biol. 1994. 235:554-564.
Popot, J.-L. and Engelman, D.M. Membrane protein folding and oligomerization: the two-stage model. Biochemistry. 1990. 29:4031-4037.
Richmond, T.J. and Richards, F.M. Packing of alpha-helices: geometrical constraints and contact areas. J. Mol. Biol. 1978. 119:537-555.
Shun-Cheng, L. and Deber, C.M. A measure of helical propensity for amino acids in membrane environments. Nature Structural Biology. 1994. 1:368-373.
Tsukihara, T., Aoyama, H., Yamashita, E., Tomizaki, T., Yamaguchi, H., Shinzawa-Itoh, K., Nakashima, R., Yaono, R., and Yoshikawa, S. The whole structure of the 13-subunit oxidized cytochrome c oxidase at 2.8 Å. Science. 1995. 269:1069-1074.
Wallace, B.A., Cascio, M., and Mielke, D.L. Evaluation of methods for the prediction of membrane protein secondary structures. Proc. Natl. Acad. Sci. USA. 1986. 83:9423-9427.
Wallin, E., Tsukihara, T., Yoshikawa, S., von Heijne, G., and Elofsson, A. Architecture of helix bundle membrane proteins: an analysis of cytochrome c oxidase from bovine mitochondria. Protein Science. 1997. 6:808-815.
Zhou, Y., Wen, J., and Bowie, J.U. A passive transmembrane helix. 1997. 4:986-990.