Application of Conformational Space Annealing in de novo Protein Structure Prediction

 

In 1973 Christian Anfinsen established an important principle governing protein structure: the native fold adopted by a polypeptide corresponds to its free energy minimum [1]. This basic thermodynamic principle is the foundation on which current de novo structure prediction attempts are built. Given a primary amino acid sequence and an appropriate potential energy function, the goal of de novo structure prediction is to carry out successive rounds of optimization on this potential energy function in order to arrive at the global minimum energy conformation, which corresponds, at least in theory, to the native structure. This approach, based entirely on physical principles, is fundamentally different from other structure prediction methods that employ homology modeling, threading, and statistical comparisons to known crystal structures. If successful, the strategy of searching and optimizing polypeptide conformations to determine a protein's native structure would underscore a fundamental biophysical and thermodynamic principle, provide a powerful new tool for structural biologists, address issues concerning bias in the Protein Data Bank, and open the floodgates for a wide range of structural genomics projects.

One of the principal limitations precluding successful structure prediction for many years has been the lack of a sufficiently powerful computational approach for global optimization of these potential energy functions. Very recently, Harold Scheraga and coworkers have made significant advances in this area by developing a search method known as conformational space annealing (CSA) [2]. CSA has succeeded where other methods have failed, in part because it is designed to search over extremely broad ranges of conformational space, generating numerous local minima before arriving at the global minimum free energy conformation. The CSA search therefore yields many distinct groups of low-energy protein structures, one of which is presumably the native structure.

As described by Lee et al., the CSA method begins with a randomly generated set of conformations, which are energy-minimized by an appropriate algorithm (such as the Secant Unconstrained Minimization Solver) to generate an initial "bank" of conformations [3]. From the bank, one then selects a number of widely varying conformations called "seeds." The variable dihedral angles of the seeds, which dictate their three-dimensional fold, are altered in a non-random fashion to create pools of new conformations from each seed. In their applications of CSA, Lee et al. generally use ~20 seeds and obtain ~30 conformations per seed, yielding ~600 total conformations for each round of CSA, each of which must be energy-minimized [3]. This computational nightmare is resolved by the use of parallel computing with as many as 100 processors.
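The bookkeeping of one round can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the bank size of 50, the 24 dihedral angles per conformation, the random choice of seeds, and the perturbation rule (copying a contiguous block of angles from another bank member into a copy of the seed) are all assumptions made for illustration. Only the counts of ~20 seeds and ~30 conformations per seed come from the text, and the energy minimization of each trial is omitted.

```python
import random

N_SEEDS = 20          # ~20 seeds per round (from the text)
TRIALS_PER_SEED = 30  # ~30 new conformations per seed (from the text)

def make_trials(seed, bank, rng, n_trials=TRIALS_PER_SEED):
    """Build trial conformations from one seed (assumed perturbation rule).

    Each trial is a copy of the seed in which a contiguous block of dihedral
    angles is overwritten with the matching block from another bank member.
    """
    trials = []
    for _ in range(n_trials):
        donor = rng.choice(bank)                         # hypothetical donor choice
        start = rng.randrange(len(seed))                 # first angle to replace
        length = rng.randint(1, max(1, len(seed) // 4))  # block size (assumed)
        trial = list(seed)
        trial[start:start + length] = donor[start:start + length]
        trials.append(trial)
    return trials

def one_round(bank, rng, n_seeds=N_SEEDS):
    """Pick seeds and generate every trial conformation for one CSA round."""
    # Simplification: seeds are drawn at random here, whereas the text
    # describes choosing widely varying conformations.
    seeds = rng.sample(bank, n_seeds)
    return [trial for seed in seeds for trial in make_trials(seed, bank, rng)]

rng = random.Random(0)
# Toy bank: 50 conformations, each a list of 24 dihedral angles in degrees.
bank = [[rng.uniform(-180.0, 180.0) for _ in range(24)] for _ in range(50)]
trials = one_round(bank, rng)
print(len(trials))  # 20 seeds x 30 trials = 600 conformations to minimize
```

Because the 600 minimizations are independent of one another, they can be distributed across processors, which is what makes the parallel approach described above effective.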

Over the course of a CSA search one generates massive quantities of information. Rather than accumulating files for each new conformation and letting the bank grow without bound, one asks two questions of each new energy-minimized conformation. (1) Is the new conformation redundant (i.e. is it significantly similar to a conformation already in the bank?) or does it represent a distinct class of local minima? (2) If the conformation is judged to be significantly similar to an existing group, is it of lower energy than the lowest-energy representative of that group?

The first question is addressed by determining whether the distance, Dij, between two conformations, i and j, is greater than or less than some pre-defined cutoff value, Dcut [2]. Simply put, Dij is the sum of the differences between the corresponding dihedral angles of i and j. Each new energy-minimized conformation, i, is compared to each pre-existing conformation, j, found in the bank. If Dij > Dcut for every j, then i constitutes a new group. The lowest-energy representatives from each group within the bank, including the new i group, are then compared and the worst conformation (i.e. the highest-energy conformation) is discarded. Therefore, the size of the bank remains constant. If Dij < Dcut, then i is placed in the same group as j. If conformation i is a better conformation (i.e. of lower energy) than the lowest-energy representative from the j group, then i becomes the new representative of that group. Intuitively, larger Dcut values allow one to cover larger regions of conformational space, because the conformations retained in the bank must then lie far apart from one another. CSA therefore achieves its efficacy by beginning with very large Dcut values, which spread the search over essentially all possible structures, followed by a gradual decrease of Dcut to achieve annealing [2].
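The update rule described above can be sketched in Python as follows. This is a simplified illustration rather than the published algorithm: each bank entry is treated as the sole representative of its group, angle differences are taken as the smallest separation on the circle (an assumption about how periodicity is handled), and the geometric Dcut schedule is a hypothetical choice, since the text states only that Dcut decreases gradually.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def distance(conf_i, conf_j):
    """D_ij: sum of the differences between corresponding dihedral angles."""
    return sum(angle_diff(a, b) for a, b in zip(conf_i, conf_j))

def update_bank(bank, new, d_cut):
    """Apply the two questions to one new (energy, angles) conformation.

    bank is a list of (energy, angles) tuples, one representative per group.
    The bank is modified in place and keeps its original size.
    """
    new_energy, new_angles = new
    nearest = min(range(len(bank)),
                  key=lambda k: distance(new_angles, bank[k][1]))
    if distance(new_angles, bank[nearest][1]) < d_cut:
        # Same group as the nearest member: keep whichever has lower energy.
        if new_energy < bank[nearest][0]:
            bank[nearest] = new
    else:
        # New group: discard the highest-energy conformation overall,
        # which may be the new conformation itself.
        worst = max(range(len(bank)), key=lambda k: bank[k][0])
        if new_energy < bank[worst][0]:
            bank[worst] = new
    return bank

def d_cut_schedule(d_start, d_end, n_steps):
    """Hypothetical geometric decrease of Dcut (requires n_steps >= 2)."""
    ratio = (d_end / d_start) ** (1.0 / (n_steps - 1))
    return [d_start * ratio ** k for k in range(n_steps)]

# Toy bank of two conformations with two dihedral angles each.
bank = [(-10.0, [0.0, 0.0]), (-5.0, [170.0, -170.0])]
update_bank(bank, (-12.0, [5.0, -5.0]), d_cut=30.0)   # similar to entry 0, lower energy
update_bank(bank, (-7.0, [90.0, 90.0]), d_cut=30.0)   # new group, replaces the worst
print([energy for energy, _ in bank])  # [-12.0, -7.0]
```

In this toy run the first new conformation lies within Dcut of an existing entry and replaces it because its energy is lower, while the second is far from both entries and displaces the highest-energy member, so the bank stays at two conformations throughout.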

Within the past three years Scheraga and colleagues have reported promising results by using CSA to successfully predict native conformations for peptides of increasing size. Initially CSA was employed with the Empirical Conformational Energy Program for Peptides (ECEPP) potential for the 5-residue peptide Met-enkephalin [2], followed by a larger, 20-residue transmembrane segment of melittin [3]. However, more profound results were obtained within the past year, when Scheraga and coworkers successfully employed CSA to predict the native structures of a 46-residue segment of staphylococcal protein A and the 75-residue protein calbindin D9K [4]. CSA was also used in de novo structure predictions for five globular proteins ranging in size from 89 to 140 amino acids [5]. It is important to note that crystal structures did not exist for these five proteins prior to the prediction studies. However, crystal structures for two of the five, HDEA and MarA, were reported shortly afterwards. Backbone rmsd values for large portions of the predicted structures of HDEA and MarA were 4.2 and 6.0 Å, respectively, when compared to the crystal structures.

The recent progress in de novo structure prediction is due in large part to the development of the powerful method of conformational space annealing. These results, though encouraging, are just the beginning: it remains to be seen whether the CSA method will be fruitful for predicting structures of larger and more biologically interesting proteins. Nevertheless, the CSA method is still in its infancy, and continuing advances in computer technology will likely allow further improvements in prediction accuracy and time efficiency.

 

References:

[1] Anfinsen, C.B. (1973) Science 181, 223-30.

[2] Lee, J., Scheraga, H.A., Rackovsky, S. (1997) Journal of Computational Chemistry 18, 1222-32.

[3] Lee, J., Scheraga, H.A., Rackovsky, S. (1998) Biopolymers 46, 103-15.

[4] Lee, J., Liwo, A., Scheraga, H.A. (1999) Proc. Natl. Acad. Sci. USA 96, 2025-30.

[5] Liwo, A., Lee, J., Ripoll, D.R., Pillardy, J., Scheraga, H.A. (1999) Proc. Natl. Acad. Sci. USA 96, 5482-85.