Conclusions:

The methods examined here (with the obvious exception of the Markov Model) give remarkably similar results for ungapped single sequences with fixed window sizes: all score in the vicinity of 65% Q3. Several recent advances have raised this score to a maximum of over 73%. First, incorporating environmental information into scoring schemes yielded a 3-4% increase in Q3. Second, the score rises markedly when gapped alignments and variable window sizes are used, e.g. SSPAL's increase to 71.2%. Thus it would seem that accurate alignment gives access to a significant amount of secondary structural information that is contained in the local sequence but otherwise hidden. Scores are also improved by using multiple homologous sequences, rather than a single sequence, as input. Improvements range from ~6% for Rost and Sander's neural net to 2.5% (to 73.5% total) for SSPAL. So, given a set of homologous protein sequences, one can now reasonably expect to predict the secondary structure of almost three quarters of the amino acids correctly.
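Since every figure above is a Q3 score, it may help to recall what that number measures: the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed one. The function below is a minimal sketch assuming predictions and observations are given as equal-length H/E/C strings; the name `q3` is purely illustrative.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose 3-state label (H, E or C) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must be the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    # Count positions where the predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# One wrong residue out of eight gives 87.5% Q3.
print(q3("HHHCCEEC", "HHHHCEEC"))
```

When a method is benchmarked on a whole test set, the residue counts are usually pooled over all proteins before dividing, so long chains weigh more than short ones.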

Although no one technique seems intrinsically to outperform the others, the nearest neighbor method may well be the best because of its adaptability. Consider, for example, the difficulty of feeding homologous sequences into a neural net: the user must add inputs for every additional sequence and retrain the net with vastly more weights to optimize. Similarly, increasing the window size requires recomputing the information statistics, which would become very computationally intensive for long alignments.
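To make the adaptability argument concrete, here is a minimal nearest neighbor sketch, assuming a database of (sequence, observed H/E/C string) pairs. It is not any published method. The plain identity score is a hypothetical stand-in for the environment-based substitution scores real nearest neighbor methods use, and `predict_nn`, `half` (half-window size) and `k` (number of neighbors) are illustrative names. Changing the window size is only a parameter change, with no retraining and no statistics to rebuild.

```python
import heapq
from collections import Counter


def windows(seq, half):
    """Yield (center_index, window) pairs, padding the termini with '-'."""
    padded = "-" * half + seq + "-" * half
    for i in range(len(seq)):
        yield i, padded[i : i + 2 * half + 1]


def identity_score(a, b):
    # Hypothetical stand-in for a real substitution/environment score:
    # counts identical, non-padding residues at matching window offsets.
    return sum(x == y and x != "-" for x, y in zip(a, b))


def predict_nn(query, database, half=3, k=5):
    """Assign H/E/C to each query residue by voting among the k most similar windows."""
    # The reference library is just the database cut into windows, so a new
    # window size only means re-slicing the sequences.
    library = [
        (window, ss[i])
        for seq, ss in database
        for i, window in windows(seq, half)
    ]
    prediction = []
    for _, query_window in windows(query, half):
        nearest = heapq.nlargest(
            k, library, key=lambda item: identity_score(query_window, item[0])
        )
        votes = Counter(label for _, label in nearest)
        # Ties go to the label seen first among the nearest windows.
        prediction.append(votes.most_common(1)[0][0])
    return "".join(prediction)


database = [("ACDEFGHIKL", "HHHHHEEEEE")]
print(predict_nn("ACDEFGHIKL", database, half=2, k=1))
```

Adding homologous query sequences fits the same scheme: each homolog's windows can simply contribute extra votes, whereas a neural net would need new inputs and retraining.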

Future prospects for secondary structure prediction may be somewhat limited. Although no one is certain what fraction of the information about secondary structure is contained in the local structure versus the long-range tertiary interactions, it is likely that the limit is fast being approached. Regardless, it is certain that one will never be able to predict a priori the secondary structure of 100% of the amino acids in a protein. To return to the question posed at the beginning of this paper, "What will we be able to do with these secondary structure predictions?" I am afraid the answer may be "not very much." Unfortunately, even when the secondary structure is known entirely, folding a protein is still a non-trivial problem. This is true even when the phi and psi angles of helices and sheets are known exactly (see, for example, Greg Warren's work folding helical proteins from dipolar data). If one cannot recreate tertiary structures from simulated data of this quality, imagine the difficulty of folding a protein from the knowledge that, for example, residue 60 is 75% likely to lie within a zone representing 20% of the Ramachandran plot.
