Sam Stubblefield
Prof. Mark Gerstein
MBB 452a
December 10, 1999


Comments on the Difficulty of Protein Tertiary Structure Prediction.

There currently exists no method for directly visualizing proteins, be they in solution, fixed, in an active conformation, or in an inactive one. We are forced to rely on the indirect methods of x-ray crystallography and NMR spectroscopy to gain tertiary structural information about proteins. Neither method is ideal: each provides structural information only after lengthy procedures, and even then only for the protein’s conformation in alien conditions. Given this situation, the ability to predict a protein’s tertiary structure in silico from its amino acid sequence is highly desired. Judged against x-ray diffraction and NMR data, attempts at predicting tertiary structure have thus far failed. When attempting to understand this lack of success, several things should be kept in mind: x-ray and NMR structures may not represent the native, active conformation of the protein; the protein likely has multiple variant conformations; and the active conformation of the protein may not be at the global potential energy minimum.

While some proteins crystallize easily, getting others to do so blurs the line between art and science, requiring a deft touch with solute and protein concentrations. In the end, after more or less work, one ends up with a crystal of the protein that hopefully leads to a solvable diffraction pattern. Massaging the protein into crystallizing necessarily forces it to change its conformation (if it could crystallize in its native conformation, one would not need to alter solvent concentrations and the like). In addition, heavy atoms, which further distort the protein’s conformation, are incorporated into proteins for crystallization to make the diffraction pattern solvable. The combination of these two facts means that one must take x-ray crystal structures with several grains of salt, knowing that they probably do not represent the native conformation of the protein.

While the determination of tertiary structure through NMR spectroscopy does not require the luck and patience that crystallizing a protein does, it shares with crystallography the placement of the protein into an alien solution for the structure determination. Its overall usefulness is further constrained because it is limited to relatively small proteins and provides relatively poor resolution.

Considering the above, it becomes clear that there currently exists no positive method for determining the native conformation of a protein. Furthermore, the structures produced by x-ray diffraction and NMR spectroscopy are inherently limited in their usefulness, because the methods used to prepare the protein prevent it from being in its native conformation at the time of data collection.

This has several implications for attempts at computer prediction of tertiary structure. For the time being, it seems reasonable to continue working towards a model that predicts tertiary structure in agreement with the conformations given by NMR or x-ray crystallography, as these are the best data available. In the future, however, the goal of predicting a protein’s native conformation should not be forgotten in a rush to predict the structure it assumes in a crystal under given conditions, or the structure that NMR will report. Because the native conformation cannot be observed directly, the prediction of experimentally measurable properties may have to serve as the test for computer-modeled predictions of native conformation.

When looking at the structures produced by x-ray crystallography and NMR, one can easily be drawn into the cognitive trap of thinking that the protein in question has one definite conformation, and that this is the conformation pictured. More accurately, the conformation pictured is an average of those found in the sample [2]. The variety of conformations a protein assumes must be limited by covalent bonds and by hydrophilic and hydrophobic considerations, but the protein can reasonably be assumed to "flop around" to some degree. It seems unlikely that the marginal change in free energy associated with every movement of the protein towards its fully folded state is so great that folding does not proceed in the reverse direction a substantial amount of the time. Further, one can imagine the protein also folding in ways outside of its main folding pathway, leading to conformations that differ significantly from its native conformation. When attempting to predict a protein structure in silico, the ability to predict the protein as a dynamic molecule rather than a static set of coordinates stands as a substantial challenge.
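The point about reverse steps can be made roughly quantitative with a two-state Boltzmann estimate. The sketch below is only an illustration under stated assumptions: it treats a single folding step as an equilibrium between a "less folded" and a "more folded" state, and the function name, the free energy values, and the temperature are hypothetical choices rather than measurements for any real protein.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def reverse_fraction(delta_g_kcal, temp_k=298.0):
    """Equilibrium fraction of molecules in the less folded of two states.

    delta_g_kcal is the free energy of the less folded state minus that of
    the more folded state (positive when the forward step is favourable).
    This is the standard two-state Boltzmann population, nothing more.
    """
    return 1.0 / (1.0 + math.exp(delta_g_kcal / (R_KCAL * temp_k)))


if __name__ == "__main__":
    # Hypothetical free energy gaps for a single folding step (assumed values).
    for dg in (0.5, 1.0, 3.0):
        print(f"dG = {dg} kcal/mol -> fraction reversed = {reverse_fraction(dg):.3f}")
```

Under these assumptions, a step worth only a fraction of a kcal/mol leaves a large share of molecules in the less folded state, while a gap of several kcal/mol leaves very few, which is the sense in which small marginal free energy changes imply a "floppy" protein.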

Given the above, and considering the multiplicity of conformations a protein may adopt, one may question a central, long-standing assumption in tertiary structure prediction, namely that a protein’s native conformation is the one with the lowest global potential energy. This assumption dates back nearly 20 years and is central to attempts at prediction, though it does not seem necessary. There are examples of structures or organisms that have reached local maxima in fitness in both macro- and microevolution, and one can easily imagine that this is the case for some proteins as well (a local maximum in a fitness landscape is the point of greatest fitness closest to an organism’s entry into the landscape and the one toward which selective pressure pushes it, even if this is not the highest peak). Acknowledging this possibility and accommodating it in the design of tertiary conformation prediction programs may allow for greater accuracy in the conformation predictions.
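The distinction between a local and a global optimum can be shown with a toy example. The following is a minimal sketch, assuming a made-up one-dimensional "energy landscape" with two wells; the function, its coefficients, and the names `energy` and `descend` are hypothetical and do not correspond to any real force field or prediction program. Energy minima here play the role that fitness maxima play in the evolutionary analogy.

```python
def energy(x):
    """Toy double-well energy: a deep well near x = -1, a shallow one near x = +1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Analytical derivative of the toy energy function."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def descend(x, step=0.01, iterations=2000):
    """Plain gradient descent: always move downhill from the starting point."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


if __name__ == "__main__":
    # Two hypothetical starting conformations on either side of the barrier.
    for start in (1.5, -1.5):
        end = descend(start)
        print(f"start {start:+.1f} -> settles at x = {end:+.3f}, energy = {energy(end):+.3f}")
```

A molecule "entering the landscape" on the right settles into the shallow well and stays there, even though a deeper well exists on the left. A search that assumes the native state is always the global minimum would report the left-hand well in both cases.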

The dream of complete secondary structure prediction lies well off, and that of tertiary prediction much further still. En route to it, however, one must note that there exists no way to determine a protein’s native conformation and that current methods for determining structure are inherently limited, so the standards by which predicted models are judged are flawed. Furthermore, those same predictions must consider the various possible conformations and acknowledge that the most active native one may not be the one with the lowest global potential energy.




1 Gerstein, Mark. "Bioinformatics: Sequences." New Haven: Lectures given at Yale University, November 1999.
2 Krause, Kurt. "Protein structure determination through crystallographic methods." Houston: Lectures given at the University of Houston, June 1998.
3 Clore, G.M. and Gronenborn, A.M. "Determination of Structures of Larger Proteins in Solution by Three- and Four-dimensional Heteronuclear Magnetic Resonance Spectroscopy." NMR of Proteins. G.M. Clore and A.M. Gronenborn eds. Ann Arbor: CRC Press, 1992.
4 Levitt, Michael. "Computer Studies of Protein Molecules." Protein Folding. R. Jaenicke ed. New York: North-Holland Biomedical Press, 1980.
5 Futuyma, Douglas J. Evolutionary Biology, third edition. Massachusetts: Sinauer Associates, Inc., 1997.