
Information Theory: The GOR Method

Information-theoretic approaches are popular in secondary structure prediction and are epitomized by the GOR method, named after the authors who originally developed it (Garnier, Osguthorpe, and Robson, 1978). The theory began as a more rigorous alternative to Chou-Fasman-like propensities for calculating the probability that a given amino acid adopts a given secondary structure element. The bridge to this probability is a function called the information: $I(S, R) = \ln\left(\frac{f_{S,R}\,N}{f_R\,f_S}\right)$. In this equation, $f_{S,R}$ is the frequency of amino acid R in secondary structure S in the database, N is the number of amino acids in the database, and $f_R$ and $f_S$ are the frequencies of R and S in the database, respectively. It can be shown that the secondary structure with the highest I is the most likely one for a given amino acid. Generally, the value of $I(S, R)$ is not used directly; rather, one uses a value termed $I(\Delta S, R)$, which is the difference in I between a given S and the sum of all other possible S's.

Because the amino acids that neighbor a given residue are also a source of information about its conformation, they too are used. Ideally, this calculation would use all the information from the amino acid and its neighboring amino acids jointly (i.e., $P(S_j, R_1 \ldots R_n)$). This would require an extraordinarily large database for any reasonable window size, so the authors of the GOR method were forced to approximate it by considering only the information yielded by every pairwise combination of amino acids in the window (for a rigorous discussion of the mathematics involved in the GOR method, see Garnier et al., 1996). Unfortunately, this limitation appears to significantly impair the accuracy of the method, which has a Q3 of 64.4%. The greater flexibility of the SSPAL nearest-neighbor method apparently allows an increase in accuracy of ~7% (see below).
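As a rough illustration of the single-residue information function described above, the following Python sketch computes I(S, R) from raw counts in a toy database. It is not the published GOR implementation: the function names, the one-letter state codes (H, C), and the toy sequences are hypothetical, and the "sum of all other possible S's" is assumed here to mean pooling every other state into a single "not S" class.

```python
import math
from collections import Counter


def _counts(residues, states, state, residue):
    """Return (f_SR, f_R, f_S, N) as raw counts from aligned sequences."""
    if len(residues) != len(states):
        raise ValueError("residues and states must have the same length")
    f_sr = Counter(zip(states, residues))[(state, residue)]
    f_r = Counter(residues)[residue]
    f_s = Counter(states)[state]
    return f_sr, f_r, f_s, len(residues)


def information(residues, states, state, residue):
    """I(S, R) = ln(f_SR * N / (f_R * f_S))."""
    f_sr, f_r, f_s, n = _counts(residues, states, state, residue)
    if f_sr == 0:
        raise ValueError("pair never observed; the logarithm is undefined")
    return math.log(f_sr * n / (f_r * f_s))


def information_difference(residues, states, state, residue):
    """I(S, R) minus I(not S, R), with all other states pooled (assumption)."""
    f_sr, f_r, f_s, n = _counts(residues, states, state, residue)
    f_not_sr = f_r - f_sr   # residue R observed in any other state
    f_not_s = n - f_s       # all positions in any other state
    if min(f_sr, f_not_sr, f_not_s) == 0:
        raise ValueError("a required count is zero; the logarithm is undefined")
    return math.log(f_sr / f_not_sr) + math.log(f_not_s / f_s)


if __name__ == "__main__":
    # Toy database: one residue letter and one state letter per position.
    toy_residues = "AAAAGGGG"
    toy_states = "HHHCHCCC"
    print(information(toy_residues, toy_states, "H", "A"))
    print(information_difference(toy_residues, toy_states, "H", "A"))
```

A real database would contain unobserved residue/state pairs, so a practical version would need some policy for zero counts (for example pseudocounts); the sketch simply raises an error instead.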

Perhaps the most appealing aspect of information theory is the ability to estimate the reliability of one's predictions fairly accurately. This is because the probabilities determined for secondary structure assignments are statistically rigorous estimates of the certainty with which the assignment can be made. Given a sufficiently large number of database entries, the agreement is striking. For example, in their 1991 paper, Gibrat et al. showed that 91% of the residues predicted with a certainty between 90 and 100% were correct, as were 82% of those predicted with a certainty between 80 and 90%, and so on. From this correlation, the authors go on to conclude that they have extracted essentially all the useful information contained in the local structure. This is clearly not the case (again, see SSPAL below). Nonetheless, the ability to state correctly the probability that a prediction is correct (as well as the probability that the amino acid is in each of the other two secondary structure states) could prove invaluable in tertiary structure prediction, for example in generating an energy term to describe the conformation of the amino acid.
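The kind of check described above, comparing stated certainty against the fraction of predictions that turn out to be correct, can be sketched as a simple binning procedure. The function name, the ten equal-width bins, and the sample numbers below are illustrative assumptions and are not taken from Gibrat et al.

```python
def calibration_table(confidences, correct, n_bins=10):
    """Group predictions by stated certainty and report observed accuracy.

    confidences: predicted probability (0.0 to 1.0) of the assigned state
    correct:     True/False flags, one per prediction
    Returns a list of (low, high, count, accuracy) tuples, one per bin;
    accuracy is None for bins that received no predictions.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    totals = [0] * n_bins
    hits = [0] * n_bins
    for p, ok in zip(confidences, correct):
        if not 0.0 <= p <= 1.0:
            raise ValueError("confidence must lie between 0 and 1")
        # A certainty of exactly 1.0 is placed in the top bin.
        b = min(int(p * n_bins), n_bins - 1)
        totals[b] += 1
        hits[b] += 1 if ok else 0
    return [
        (b / n_bins, (b + 1) / n_bins, totals[b],
         hits[b] / totals[b] if totals[b] else None)
        for b in range(n_bins)
    ]


if __name__ == "__main__":
    # Made-up predictions, purely to show the shape of the output.
    sample_conf = [0.95, 0.92, 0.97, 0.91, 0.85, 0.85]
    sample_ok = [True, True, True, False, True, False]
    for low, high, count, acc in calibration_table(sample_conf, sample_ok):
        if count:
            print(f"{low:.1f}-{high:.1f}: n={count}, observed accuracy={acc:.2f}")
```

For a well-calibrated predictor, the observed accuracy in each bin should fall within that bin's certainty range, which is the pattern the figures quoted above describe.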
