Neural Networks

Neural networks have recently become a popular method for secondary structure prediction. This is true not only because of their trendiness in the Computer Science community, but also because neural nets have proven quite successful in secondary structure prediction, attaining accuracy comparable to information theory and nearest-neighbor methods (ca. 65%). In principle, the design of neural networks is quite simple. Their name derives from the fact that they were originally intended to imitate the neurons of the human brain. Like brain cells, neural nets consist of central "decision making" units which are interconnected with other units in some topology, meant to imitate the axons of neurons. Note that for brevity only feed-forward networks will be described here; I have been unable to find neural nets used for secondary structure prediction with other network topologies.

Figure 1 shows a simple neural network, designed to perform the task of determining whether the input at A is larger or smaller than the input at B. For simplicity, we will assume that the inputs to A and B are not processed in this case. The output module receives the sum of the inputs A and B, each multiplied by the weight on its connection (1 and -1 in this case, respectively), i.e. A - B. Thus, if A is larger than B, the output module receives a number greater than zero, and if the converse is true, it receives a negative number. Since we wish the output to return a yes (1) if A is greater than B, the simplest function for the output module to contain is a step function: y = 1 for x > 0, y = 0 for x < 0. In practice, it is desirable for the function to be continuous, and so a step-like sigmoidal function, such as y = 1 / (1 + exp(-x)), is generally used. This function is depicted in Figure 2.
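As a concrete illustration, here is a minimal sketch of the comparator network of Figure 1 in Python. The function names and example inputs are my own; the weights (1 and -1) and the step and sigmoid functions are those described above.

```python
import math

def step(x):
    # Hard threshold: returns 1 for x > 0, otherwise 0
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    # Smooth, step-like alternative: y = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

def is_a_greater(a, b, activation=step):
    # Weights on the connections from A and B to the output module
    w_a, w_b = 1.0, -1.0
    return activation(w_a * a + w_b * b)

print(is_a_greater(3.0, 1.0))           # 1.0: A is larger than B
print(is_a_greater(1.0, 3.0))           # 0.0: A is smaller than B
print(is_a_greater(3.0, 1.0, sigmoid))  # ~0.88: a soft "yes"
```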

Of course, the neural networks used for secondary structure prediction are much more complicated than the network in Figure 1. Their properties, however, are much the same. Each module receives the sum of the outputs of the modules which feed into it, multiplied by their weights: X_l = Sigma(i=1 to n) W_i * Y_i,l-1, where X_l is the input to a module in layer l, W_i is the weight on the connection between that module and module i of the preceding layer, and Y_i,l-1 is the output of module i in layer l-1. The module then processes this input to generate an output Y_l using the equation Y_l = 1 / (1 + exp(-X_l)), as mentioned above.
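The same computation, applied to every module in a layer, can be sketched as follows (Python again; the weights and inputs are illustrative, not taken from any published net).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(prev_outputs, weights):
    """Compute the outputs of one layer of a feed-forward net.

    prev_outputs: outputs Y_i,l-1 of the preceding layer.
    weights: weights[j][i] is the weight on the connection from
             module i of layer l-1 to module j of layer l.
    """
    outputs = []
    for w_row in weights:
        # X_l: weighted sum of the previous layer's outputs
        x = sum(w * y for w, y in zip(w_row, prev_outputs))
        # Y_l = 1 / (1 + exp(-X_l))
        outputs.append(sigmoid(x))
    return outputs

# Two modules fed by a three-module layer (illustrative numbers)
print(layer_forward([0.2, 0.9, 0.5], [[1.0, -0.5, 0.3], [0.4, 0.1, -0.8]]))
```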

This description of neural nets ignores their most interesting aspect, namely their ability to generalize, or learn. When one speaks of a neural net learning, one refers to the process of optimizing the parameters of the net on a training set, i.e. a set of inputs for which the desired output is known. For our application this means a set of protein primary sequences with corresponding secondary structure classifications. Using this set, the parameters of the net are adjusted to maximize the frequency with which the algorithm correctly predicts the secondary structure of the training set. One can imagine adjusting several parameters, including the weights on connections between modules, the functions used by the modules to generate their output, and the topology of the net. In practice, only the first of these is adjusted, using a gradient descent method. (Note that the use of a gradient descent method to minimize the net's error is the reason for the aforementioned use of a continuous, sigmoidal, switch-like function rather than a step function: the error must be differentiable with respect to the weights.)
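As a rough illustration of weight adjustment by gradient descent, the toy sketch below fits a single sigmoid unit to a few labeled examples by descending the squared-error gradient. It is not the exact procedure used by the published secondary structure nets; the error function, learning rate, and number of passes are arbitrary choices made for illustration.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_unit(examples, epochs=1000, rate=0.5):
    """Fit the weights of a single sigmoid unit by gradient descent.

    examples: list of (inputs, target) pairs with target in {0, 1}.
    Minimizes the squared error between output and target; the smooth
    sigmoid makes this error differentiable in the weights.
    """
    n = len(examples[0][0])
    weights = [random.uniform(-0.1, 0.1) for _ in range(n)]
    for _ in range(epochs):
        for inputs, target in examples:
            x = sum(w * v for w, v in zip(weights, inputs))
            y = sigmoid(x)
            # dE/dw_i for E = (y - target)^2 / 2 and y = sigmoid(x)
            delta = (y - target) * y * (1.0 - y)
            weights = [w - rate * delta * v for w, v in zip(weights, inputs)]
    return weights

# Learn the "is A greater than B?" rule of Figure 1 from examples
data = [([3.0, 1.0], 1), ([1.0, 3.0], 0), ([2.0, 0.5], 1), ([0.2, 2.5], 0)]
print(train_unit(data))
```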

Figure 3 shows a typical topology for a secondary structure predicting neural net. This net is adapted from Holley and Karplus, 1991. The input layer consists of the central amino acid, whose secondary structure we wish to predict, and the six amino acids which flank it. The input layer is actually an n x 21 array, where n is the size of the sequence window and the 21 rows correspond to each possible amino acid plus a null entry for use when the window overlaps the ends of the protein. To input a specific sequence, the entry corresponding to the amino acid at a given position in the window is set to one, and all others are set to zero. Another difference from simpler nets is the incorporation of a "hidden" layer. This is necessary to allow the net to "learn" the rules of secondary structure prediction (i.e. to utilize the pairwise correlations between the amino acids in the window and the secondary structure of the central amino acid). A final point of interest is that, although the neural net predicts the state of an amino acid as one of three possibilities (coil, helix, or sheet), only two outputs are generated: one corresponds to helix and the other to sheet. If neither is above an empirically determined cutoff, coil is assigned; otherwise the higher of the two values wins.
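The input encoding and output decoding just described might be sketched as below. The seven-residue window (central amino acid plus three flanking residues on each side), the residue ordering, and the cutoff value are illustrative assumptions; in practice the cutoff is determined empirically, as noted above.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 residues; column 20 is the null entry

def encode_window(sequence, center, half_width=3):
    """One-hot encode the window of residues around position `center`.

    Returns an n x 21 array (n = 2*half_width + 1): one row per window
    position, with a 1 in the column of the residue at that position,
    or in the null column when the window overlaps the protein's ends.
    """
    window = []
    for pos in range(center - half_width, center + half_width + 1):
        row = [0] * 21
        if 0 <= pos < len(sequence):
            row[AMINO_ACIDS.index(sequence[pos])] = 1
        else:
            row[20] = 1  # null entry for positions past the termini
        window.append(row)
    return window

def assign_state(helix_out, sheet_out, cutoff=0.5):
    """Turn the two network outputs into a three-state prediction.

    The cutoff here is an illustrative placeholder; the real value is
    chosen empirically on the training data.
    """
    if helix_out < cutoff and sheet_out < cutoff:
        return "coil"
    return "helix" if helix_out > sheet_out else "sheet"

print(encode_window("MKTAYIAK", 1))  # window overlaps the N-terminus
print(assign_state(0.62, 0.41))      # -> "helix"
```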

The network of Holley and Karplus outlined here correctly predicted 63.2% of amino acids. This agrees closely with the 65% accuracy reported by Rost and Sander for single sequences (their result rises to 70.8% when four homologous sequences are used, as will be discussed below) (Rost and Sander, 1993).
